Amazon RDS Backups over VPC for Internal Network Storage

We have our hosting and database storage on the Amazon cloud.  It is a very strong, functional platform.  However, like all remote systems, it can sometimes be painful to get local copies of the data.  When your database size is measured in gigabytes, you have very few options other than to wait when making backups.

Amazon's automated backups are a nice touch, ensuring you always have a nightly copy of your data for mission-critical cases, but they also let you build a temporary RDS instance to export your data from, removing any impact on your production environments. This solution is great, but it is time consuming.  You need to build the RDS instance from the snapshot, change a bunch of settings to make it available to your client, perform a long-winded mysqldump to get your data out, then copy it back down across the wire.  That is time you could spend more wisely on other, higher priority tasks.

So this is what we were tasked with: automate a nightly backup of the data from Amazon's RDS by leveraging their CLI tools, and get it back into the office network to allow much faster access for local development setups.

Not long ago, we went through an entire Amazon infrastructure rebuild, implementing both a public and a private VPC for our office network; this was clutch to making the following solution work.  Without this bridge in place you would still have to go through the drawn-out process of copying your data back locally, or run the risk of exposing file shares on a secured internal network to the world (which I highly suggest you don't).

WHAT THIS WILL DO FOR YOU:

This utility will perform the following for you with no interaction required from you once it is configured and running (a rough sketch of the underlying AWS CLI calls follows the list).

  1. Find the most recent automatic backup for your RDS DB Instance Group by querying the list of automated snapshots, sorting them by created date, and taking the most recent one.
  2. Mount this backup to a new RDS Instance
  3. Set the Security Group for the RDS Instance, making it available to your SQL EXPORT server
  4. Set the DB Parameter Group for this instance to allow you to build custom functions through your sanitization script
  5. Export your database using mysqldump, compressing it with gzip
  6. Copy this gzipped backup to a mapped folder on an internal server within your corporate network
  7. Sanitize the database with the supplied SQL script
  8. Export and compress the sanitized version
  9. Copy the sanitized version to a different share point across your corporate network
  10. Destroy the RDS instance that was created.
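
If you are curious what the heavy lifting looks like before digging into the repo, here is a condensed, hypothetical sketch of the AWS CLI calls that flow is built around.  All of the identifiers below (the instance name, security group, subnet group and parameter group) are placeholders you would swap for your own values; this is a sketch of the approach, not a drop-in replacement for the scripts in the repo.

#!/bin/bash
# Hypothetical sketch of the backup flow - all names below are placeholders.
SNAPSHOT_GROUP_NAME='website_production'
BACKUP_INSTANCE_NAME="auto-export-$SNAPSHOT_GROUP_NAME"
VPC_SECURITY_GROUP='sg-xxxxxxxx'          # placeholder
AVAILABILITY_ZONE='us-east-1a'            # placeholder
DB_SUBNET_GROUP_NAME='private-subnets'    # placeholder
DB_PARAMETER_GROUP='export-mysql56'       # placeholder

# 1. Find the most recent automated snapshot for the production instance.
LATEST_SNAPSHOT=$(aws rds describe-db-snapshots \
  --db-instance-identifier "$SNAPSHOT_GROUP_NAME" \
  --snapshot-type automated \
  --query 'reverse(sort_by(DBSnapshots,&SnapshotCreateTime))[0].DBSnapshotIdentifier' \
  --output text)

# 2. Restore that snapshot to a brand new, temporary RDS instance and wait for it.
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier "$BACKUP_INSTANCE_NAME" \
  --db-snapshot-identifier "$LATEST_SNAPSHOT" \
  --db-subnet-group-name "$DB_SUBNET_GROUP_NAME" \
  --availability-zone "$AVAILABILITY_ZONE"
aws rds wait db-instance-available --db-instance-identifier "$BACKUP_INSTANCE_NAME"

# 3 & 4. Attach the security group and the custom parameter group.
aws rds modify-db-instance \
  --db-instance-identifier "$BACKUP_INSTANCE_NAME" \
  --vpc-security-group-ids "$VPC_SECURITY_GROUP" \
  --db-parameter-group-name "$DB_PARAMETER_GROUP" \
  --apply-immediately

# 5-9. mysqldump / gzip / sanitize / copy to the mapped shares happen here.

# 10. Destroy the temporary instance once everything has been copied off.
aws rds delete-db-instance \
  --db-instance-identifier "$BACKUP_INSTANCE_NAME" \
  --skip-final-snapshot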

ASSUMPTIONS:

What I will walk you through below is the configuration and scripts to make your life MUCH simpler with your backups.  I am going to write this as if you have a somewhat sound understanding of Amazon's infrastructure and VPC configurations.  I will walk you through the steps needed to configure a working nightly backup that will also allow you to clean and sanitize any data that will be available to your developers.  The source for this can be found here (https://github.com/jbrodie/aws-backup) for your reference / usage to get this setup running for your day-to-day needs.

A couple things I am going to assume before we start this:

  1. You have an Amazon account setup and know how to get into it.
  2. You have set up a public and a private VPC in your Amazon infrastructure.  If you haven't done this yet, Amazon has great documentation on how to go about setting this up and configuring it from start to finish.  You can find their guide here (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Introduction.html).
  3. You have your internal network set up within your private VPC.
  4. You have a database you want to export that has automatic backups configured.
  5. You have created a "backup" user on the database you wish to export, with the following grants:
    • show grants for 'backup'@'[host ip]';
    • grant SELECT, UPDATE, DROP, LOCK TABLES, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, EXECUTE ON `[db_name]`.* TO 'backup'@'[host ip]';

GETTING STARTED:

First off, let's get you set up and configured with an EC2 instance for your exporting needs.  This doesn't have to be a super large instance, although something with some power isn't a bad move; I believe we went with a "t2.medium" and it seems to handle the workload without too much of a problem.  Install the instance with the standard Ubuntu 14.04 image available on Amazon.  This instance should exist in your PRIVATE VPC, without a PUBLIC IP address.  This box needs to be secure as it will be connecting to your internal network.

Next you will need to install a NAT instance into your public VPC and use it to allow outbound connections for updates from your PRIVATE SQL EXPORT box.  Amazon has a great walk-through on this that can be found here (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html).  Once you have this box set up and installed with the recommended image, you can use it to provide access for your PRIVATE SQL EXPORT box.

Moving back to your SQL EXPORT box, you will need to install the following packages by running these commands:

  1. sudo apt-get update
  2. sudo apt-get install mysql-client-5.5
  3. sudo apt-get install cifs-utils
  4. sudo apt-get install awscli

You should also go ahead and create a .my.cnf file in the home directory of the ubuntu account on the system.  In this file you will need to include your backup user's credentials so the mysql commands can run from the command line without you having to enter login credentials, and without having to put them in the command itself and commit them to GitHub, because we all know a username and password in GitHub is a bad thing.  The file should contain the following:

[client]
user="backup_user_account_name"
password="[password]"
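
Since this file holds credentials in plain text, an extra step I would suggest (though the scripts do not require it) is locking the permissions down so only the ubuntu account can read it:

chmod 600 /home/ubuntu/.my.cnf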

Once you have these packages installed, you will need to configure your Amazon access with your Access Key ID and Secret Access Key.  Again, there is a great walk-through on this located here: (http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html).

When you have your keys set up you will need to enter the following to configure your Amazon CLI access.
aws configure

This will prompt you for the following information.  Enter the credentials from the previous step.
AWS Access Key ID [None]: XXXX123456789XXXX
AWS Secret Access Key [None]: XXXXABCDEFGHIJXXXX
Default region name [None]: [Your hosting region for your VPC]
Default output format [None]: table

Now you will have access to your instances through the Amazon CLI, and you will be able to run the scripts against Amazon with the control you need.
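
A quick way to confirm the CLI is wired up correctly is to list your RDS instances; if your production DB instance shows up in the table, you are good to go.  (This is just a sanity check, not part of the scripts.)

aws rds describe-db-instances --query 'DBInstances[].DBInstanceIdentifier' --output table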

You should now proceed to mapping your drive to your internal server.  Again, create a local credentials file in your home directory to avoid having to store these details in GitHub.

Create a file named “.credentials” in your home directory and enter:

username=[Your network user with access to your shared drive for read and write]
password=[password]
domain=[Your domain]

This will allow us to point at this file when mapping the network drive.  In your home directory, create a folder named "RDSBackups".  Once you have this created, run the following command to map the directory to the share on the server you want to copy your backups to.

sudo mount //[Server IP]/RDSBackups$ /home/ubuntu/RDSBackups -t cifs -o credentials=/home/ubuntu/.credentials,uid=1000,gid=1000

This will map the folder share for this session.  Ensure you can actually copy, create, and delete files on this share.  Once that is confirmed, you can move on to adding this record to your fstab to ensure that the drive path is mapped after each reboot.

sudo pico /etc/fstab

Then add the following line:

//[Server IP]/RDSBackups$ /home/ubuntu/RDSBackups cifs credentials=/home/ubuntu/.credentials,uid=1000,gid=1000 0 0

Reboot your server now and confirm that the mapping is actually working and is re-established after a reboot.  This will be necessary if you ever move this process to OpsWorks as a time-based server that starts and stops automatically.

TIP: If you are planning on doing a sanitized version of your database, then you will need to create another drive mapping to a different folder that you can make accessible to the developers, restricting access to the primary folder that will hold the confidential data.  I would suggest a mapping for RDSBackupDevs$ as a share point, with the corresponding folder on your SQL EXPORT machine mapped through the same user in the fstab file.
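
For reference, with both shares in place your two fstab entries might look something like this (the server IP, share names, and mount points below are examples; adjust them to match your own setup):

//10.0.1.25/RDSBackups$ /home/ubuntu/RDSBackups cifs credentials=/home/ubuntu/.credentials,uid=1000,gid=1000 0 0
//10.0.1.25/RDSBackupDevs$ /home/ubuntu/RDSBackupDevs cifs credentials=/home/ubuntu/.credentials,uid=1000,gid=1000 0 0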

We can now go ahead and clone the repo from GitHub and start configuring the scripts so the backups run automatically.

git clone git@github.com:jbrodie/aws-backup.git rds-exporter

In this folder you will see several partial include files as well as the primary file; in this case we will copy 'database_name.sh' and use it as our working template to set up your first export.

Copy the 'database_name.sh' file and rename it to match the DB instance it will be acting on, for clarity.  For example, if your RDS DB instance is named 'website', then use that name.  Once you have the file copied, be sure to open it and change the instance name at the top of the file accordingly:

export SNAPSHOT_GROUP_NAME='website_production'

Be sure to update the save-folder paths for where you want the exported data copied on this machine (these are the folders mapped to the shares on your internal server).

mkdir -p ~/RDSBackups/${CURRENT_DATE}
export SAVE_PATH=~/RDSBackups/${CURRENT_DATE}
mkdir -p ~/RDSBackupDevs/${CURRENT_DATE}
export SAVE_PATH_DEVS=~/RDSBackupDevs/${CURRENT_DATE}
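
For context, the export and copy steps (5 and 6 in the list above) boil down to a mysqldump piped through gzip and written straight into the mapped folder.  A rough sketch of what that step might look like follows; the endpoint and database name are placeholders, the credentials come from the .my.cnf file created earlier, and the actual script in the repo may differ in its exact flags:

# Rough sketch - RDS_ENDPOINT is the endpoint of the temporary RDS instance,
# DB_NAME is your database; both are placeholders.  Credentials come from ~/.my.cnf.
mysqldump --host="$RDS_ENDPOINT" --single-transaction --routines "$DB_NAME" \
  | gzip > "${SAVE_PATH}/${DB_NAME}_${CURRENT_DATE}.sql.gz"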

If you wish to create a sanitized version of the data for your developers, you can also copy the 'database_name.sql' file in the sanitizer folder and name it to match your DB instance name, e.g. 'website.sql'.  Within this file you can write your custom SQL to clean and sanitize the data for your developers, ensuring that secure information isn't left out in the wild.

You will need to update the information in the '_shared_config.sh' file before going forward.  This information is what will be used for setting up and assigning the proper settings to your RDS instance to allow the export to work properly.  Most of these should be self-explanatory; by default all of them ARE REQUIRED unless you edit the source scripts to remove the options that are not applicable to you.

export CURRENT_DATE=$(date +"%Y_%m_%d")
export VPC_SECURITY_GROUP=""
export AVAILABILITY_ZONE=""
export DB_SUBNET_GROUP_NAME=""
export BACKUP_INSTANCE_NAME="auto-export-$SNAPSHOT_GROUP_NAME"
export DB_PARAMETER_GROUP=""
export LOG_FILE="$(dirname $0)/logs/${CURRENT_DATE}-${SNAPSHOT_GROUP_NAME}.log"

IMPORTANT: One of the issues I came across when building this was that Amazon RDS doesn't allow user accounts without SUPER rights to create FUNCTIONS within their databases.  This is a security thing, and it makes sense.  That being said, the process of sanitizing the data before exporting it relies on such functions to ensure the sensitive information is removed.  You will need to go into your Amazon account -> RDS -> Parameter Groups, and clone the existing parameter group you are using in your production environment.  Once you have this cloned, edit 'log_bin_trust_function_creators' and set its value to 1.  This will allow users of an RDS instance on this parameter group to create the functions we will need to sanitize the data.
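
If you prefer to do this from the CLI instead of the console, the equivalent calls would look roughly like the following; the group names here ('production-mysql56' and 'export-mysql56') are placeholders for your own production and cloned parameter groups:

aws rds copy-db-parameter-group \
  --source-db-parameter-group-identifier production-mysql56 \
  --target-db-parameter-group-identifier export-mysql56 \
  --target-db-parameter-group-description "Clone for automated exports"
aws rds modify-db-parameter-group \
  --db-parameter-group-name export-mysql56 \
  --parameters "ParameterName=log_bin_trust_function_creators,ParameterValue=1,ApplyMethod=immediate"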

You can now test run your setup by running the following command:

/home/ubuntu/rds-exporter/[db_name].sh

You will see the output in the console, and the contents will also be logged to the /logs folder, with the log files named with the date and instance name.

Once you have this running successfully, go ahead and check the time your instance is set to run its automated backups.  Depending on the size of your database, you can then schedule this script to run at an appropriate time through the crontab.
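
As an example, if your automated RDS backups finish by 2:00 AM, a crontab entry along these lines (the time and paths are just an illustration) would kick the export off at 3:00 AM every night:

# m h dom mon dow   command
0 3 * * * /home/ubuntu/rds-exporter/website.sh >> /home/ubuntu/rds-exporter/logs/cron.log 2>&1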

This is still very much a work in progress; for any changes I make, I will do my best to ensure the code base and this article are updated to reflect the most current version.

Please feel free to submit any suggestions / comments below or on the GitHub repository.

Info-Tech Innovation Days

Most teams and departments are so busy with the day-to-day running of the business that they never get the time to think about the bigger picture, but giving people the freedom to build their own solutions changes the way they work, and creates better products for the business.

Innovation Days at Info-Tech changed the tone in many ways and was the first step in making our business see us as creative individuals who could deliver more than they were asking, rather than mere implementers of specs. Many of our innovation day projects have made significant contributions to our bottom line.

Innovation Day was originally inspired by the book “Drive” by Daniel Pink, in which he talks about FedEx days at Atlassian (so named because everything is “shipped” in one day). There is a great recap of this on YouTube, drawn by the folks at RSA.

At Atlassian, the rules for FedEx days are:

“For this 24 hour period you can work on whatever you want, with whomever you want, wherever you want. All we ask is that you show the results to the company at the end of the 24 hours.”

It worked pretty well for them, so we have the same rules.

Why do we innovate?

Innovation Day is our chance to tap into our own internal resources and create things that inspire us. The only rule we have is that it should be something that in some way benefits Info-Tech. It’s an opportunity for the IT department to drive the rest of the business.

This is how we innovate:

We self-select into teams. What makes a team a team? You need a group of people who share a similar mindset. If you want to work on a data-based project, you probably shouldn’t join up with an entirely print-based team.  You probably need multi-disciplinary people, and you’ll need someone comfortable with presenting.

Who can participate?

Everyone in IT will participate. Recently, we asked for a show of hands on whether people prefer self-organized or pre-organized teams. Self-organized won out, but we want to be considerate of everyone: teams will self-organize, and if an individual wants to be placed on a team, they only have to email the organizer and they will be taken care of.

If a team wants to include someone in a department other than IT to be on their team, the team will need to get permission from the person’s manager.

That’s it. At the end of the 24 hours, we get together as a department, have some beer, snacks, and perhaps a lovely fruit tray, invite some of the executive team over, and listen to the presenters.

Executives can sponsor any ideas that come out of Innovation Days, which then get dedicated time for work, while the department votes on what they think is the “best in show”.

The winners of Best in Show get their names immortalized on the Innovation Day trophy, and much rejoicing is had.

Top Ten Tips for a Successful Hackathon

Here’s Info-Tech’s Top 10 Tips for a successful hackathon.

  1. Get building faster by using vendors and mentors. Vendors and their mentors offer a great opportunity for learning new skills, getting new perspectives, and sometimes even offering new tools (such as APIs) that can help you get your project up and running quickly. Info-Tech’s team of mentors includes Business Analysts, Design and UX experts, and experienced Developers who live and breathe product development day in and day out.
  2. Make something valuable by creating something YOU would want. Every reputable source on Innovation says that fully understanding your target user, their problems, and their use cases is essential to building a successful product. One easy way to make this happen is to build a product for someone like you. Otherwise, recruit a few relevant target users to have available to bounce ideas off of to ensure the problem you are solving is compelling and valuable.
  3. Start from “The Compelling Pitch” to focus efforts on impactful work. A common pitfall of any hackathon is to assume the product will “speak for itself”. It won’t. When drawing up plans think through how the creativity, technical difficulty, polish, and usefulness of the project will be communicated, as this will determine the judges’ final score. Building out a fully featured product with elegant code is important in the real world, but in this short timeframe, it’s more important to be able to elegantly communicate the user experience, the primary design components, and most importantly the business value of the product. If you decide to continue building post-event, you can add all the bells and whistles later.
  4. Create a simple, valuable, working product by reining in scope. Complexity is your enemy. When determining all the desired features of the product, organize them into product releases. What is the minimum viable product that you can build into a working product? Start with that. There’s nothing worse than having a beautiful design that does nothing. In a typical organization, there are 3 things you can control on a project: Time, Resources and Scope. Unfortunately, you don’t have that luxury, so be aggressive in keeping the features to a manageable number.
  5. Eliminate last-minute failure by getting to production fast and iterating. Get your production environment up and running early. Code always runs differently on a local machine than it does in production. In addition, knowing that you have a working demo will keep stress levels down as the deadline approaches. In a perfect world, the last few hours would be spent polishing design and ironing out bugs. But let’s be real: there’s always the temptation to do a little more in the way of features at the end, and this just isn’t possible if you haven’t yet promoted your code. Use a versioning tool, like GitHub, and deploy and smoke test regularly.
  6. Convincingly communicate your coolness by practicing the perfect pitch and eliminating dependencies. Arguably the most important element of the project, this often gets left to the last minute. By thinking about it up front, you can start shaping what should go into your pitch. When your main product components are completed, start putting together your messaging and practice your demo. Be careful not to rely on wifi, as there may be many people demoing at once.
  7. Here is a list of things to consider in your pitch:
    • The hook: Share a story or scenario in which the user of your product experienced some kind of pain or desired something better.
    • The basics: Who are the target users? What is the goal of the product? What is the broader pain point you are solving? Don’t assume everyone understands the problem you are trying to solve, as there are many teams and the judges may not be experts in your area.
    • The walkthrough: How does it work? What are main features? What are the standout features? Why is it better than comparable products? Are there any cool, sophisticated or elegant components to your design or architecture?
    • The tagline: What do you want to leave them with? Why would they want to share or talk about your product to a colleague or the broader public? Don’t feel the need to go beyond your key message, as this will water it down.
  8. Get the most value out of the event by getting your priorities straight. Six months from now, what will you have gained from this event? Hackathons are about having fun, meeting new people, learning new things, and getting new ideas.
  9. Don’t over commit.  We want you to be able to accomplish what is reasonable in your dedicated time.

What will be more valuable when the event is over: new skills learned, new connections, or a fun new toy? The new toy will be fun for a while, but that skill or that connection may change your life.

Happy Hacking!

 

Agile and Biting Off What You Can Chew

We’ve been fine-tuning our agility at Info-Tech, and even before we formally adopted agile and Scrum as our methodology, we’d been moving towards an Agile approach for a couple of years.

One of the interesting things about Agile is the abstraction of the concept of “units of work,” and taking that abstract idea and putting it to something that the team holds itself accountable towards.

For example, the concept of velocity is basically that a person can get a set amount of work done in a week. We level this out so that while some people can do more work in that amount of time and others can do less, the average ends up being a number that we can all commit to. Over time, we’ve decided that the number is 20.

How we got to that number is interesting though, and required us to think a little bit differently. Getting people to understand abstracted time made me feel a little bit like Jennifer Aniston’s character in The Break Up; “I don’t want you to do the dishes…I want you to WANT to do the dishes!” We had to make people not think about time, but think about the concept of effort. Don’t think about how much time it took you to do that task, think about how much EFFORT it took you to do that task.

To figure out the smallest amount of effort, we took a mundane chore that has a measurable outcome. In this case, cutting a lawn. Before: unruly, out of control grass. After: golf green-like manicured lawn which we can boast about to our neighbours.

To make this effective though, we needed to abstract the concept of “cutting the grass” into some sort of measurable effort. While it takes me about twenty minutes to cut the grass at my house, it may take someone else on my team two hours to cut the grass at their house.

That might be because I have a smaller yard, better tools, or I’m just better at cutting grass than the other person, but the point is the task was the same – cut the grass – and the effort was likely the same. We got behind a lawnmower and pushed for a period of time.

With that in mind, we got a few other people’s opinions on how long it takes them to cut the grass, and we ended up with an average of about an hour.

Then we looked at some other tasks that are roughly the same “shape” as cutting the grass. Let’s say “vacuuming the house,” and “doing the dishes.”

Knowing those are all roughly somewhere between 20 minutes and two hours, we were able to establish a baseline for our lowest level of task. We call this a 1 on our point chart.
We then figured out how much effort we can get into a day, two days, and a week.

We call these 5, 10, and 20 points.

So we know that a five-point story is roughly one person working on it for one day. The math isn’t perfect; in theory you can fit eight one-point tasks in a day, but the reality is it’s closer to five.
With that wrapped up, we started using the point system for a little while, tweaking it as we needed to. One of the interesting things we found was that the larger the story was, the worse we were at estimating its points.

My group found that we were pretty good at estimating anything that was a one or a five, but much over a five, and we started to get a little less accurate.

What tends to happen is that we underestimate the things that are contained in those larger stories.
When you’re dealing with a five or below, it’s usually a discrete enough task with a few small dependencies. Something like “style a specific thing,” or “make a portion of a page work”.

Again, making this abstract, a 20 might be “paint the house,” and there’s a lot of detail inside of “paint the house.” Does that mean all of the bathrooms, all of the bedrooms, all of the hallways? Just doing the master bedroom might be a day and a half. What about the basement? That could be a full two days on top of the bedrooms, bathrooms, kitchen, dining room, living room, and family room. What about the trim? Trim alone might be a 20-point story!

That week quickly becomes two or three weeks (and that doesn’t take into account all of the blockers like “work,” and “taking the kids to gymnastics”).

So we started breaking the tens down into fives and ones, and then we introduced the number three, which represented a half day’s work. It was amazing how quickly a ten would become five threes or four fives. The problem is that people are horrible at estimating time.

However, once we could break things down into discrete tasks and look inside those tasks to figure out accurate estimates, things got much better, and we ended up being able to accurately estimate how much work we could get done in a reasonable amount of time.

All because we decided to bite off only as much as we could chew in each mouthful.

How it works – IT Person of the Month

The IT Person of the Month (or ITPOTM) initiative is intended to let us recognize people in the department who go above and beyond by working outside their role, to help others, to better the company or department, or people who epitomize Info-Tech’s core values and are inspirations for others. People who deserve recognition.
Each month, we have 5 nominees from within the department, broken down across function.

  • Infrastructure and Support
  • Application Development/Designers – 2 nominations
  • Managers

Nominations come from our IT staff – although we do take the odd nomination from outside of our group, the primary purpose of ITPOTM is to recognize and reward peer contributions. We’ve varied the process a bit from time to time, but basically you can nominate anyone, or even several anyones. But you need a story. Why is this person getting your kind words and support? What have they done to deserve your precious recognition? It is this compelling story that will make a finalist into a winner. If for some reason you’d like your words to be anonymous, tell your manager, and they will strip your name off your nomination before sharing it with the team.

Once all of the nominations are in, you need to do a bit more work: each group looks at all the nominations it receives for people in that group and selects a finalist (or 2 for AppDev/Design). Congrats! With a finalist selected, you’ve done your part and can rest.

All of the finalist names go to the management team and they select the IT Person Of The Month, who is announced with great fanfare at the department meeting. At that meeting, all of the stories for each nominee will be read out loud, preferably by the person who wrote the story, but they are published without names, and if no one speaks up to read the story (that is, you wish to remain anonymous, or are really shy), we will read it or find a volunteer.

Outside of the praise and adulation of the entire Info-Tech IT department and 30 days of bragging rights, what’s in it for the nominees? Each finalist gets to go for a “CIO Roundtable” lunch with William, our VP of IT, and the ITPOTM winner receives a shiny new Something (each finalist selects their potential Something when they are selected as a finalist).

That’s how it works. Feel free to start nominating.