Amazon RDS Backups over VPC for Internal Network Storage

We have our hosting and database storage on the Amazon cloud.  It is a very strong, functional platform.  However, like all remote systems, it can sometimes be painful to get local copies of the data.  When your database size is measured in gigabytes, you have very few options other than waiting when making backups.

Amazon's automated backups are a nice touch, ensuring you always have a nightly version of your data for mission-critical cases, and they also let you build a temporary RDS instance to export your data from, removing any impact on your production environment. This solution is great, but it is time consuming.  You need to build your RDS instance from the snapshot, change a bunch of settings to make it available to your client, perform a long-winded mysqldump to get your data out, then copy it back down across the wire.  That is time you could spend more wisely on other higher-priority tasks.

So this is what we were tasked with: automate a nightly backup of the data from Amazon RDS by leveraging the AWS CLI tools, and get it back into the office network to allow much faster access for local development setups.

Not long ago, we went through an entire Amazon infrastructure rebuild, implementing both a public and a private VPC for our office network; this was crucial to making the following solution work.  Without this bridge in place you would still have to go through the drawn-out process of copying your data back locally, or run the risk of exposing file shares on a secured internal network to the world (which I highly suggest you don't).

WHAT THIS WILL DO FOR YOU:

This utility will perform the following for you with no interaction required from you once it is configured and running.

  1. Find the most recent automated backup for your RDS DB instance by querying the list of automated snapshots, sorting them by creation date, and taking the most recent one (see the sketch after this list).
  2. Restore this backup to a new RDS instance.
  3. Set the security group for the RDS instance, making it available to your SQL EXPORT server.
  4. Set the DB parameter group for this instance to allow you to build custom functions through your sanitization script.
  5. Export your database using mysqldump and compress it with gzip.
  6. Copy this gzipped backup to a mapped folder on an internal server within your corporate network.
  7. Sanitize the database with the supplied SQL script.
  8. Export and compress the sanitized version.
  9. Copy the sanitized version to a different share point across your corporate network.
  10. Destroy the RDS instance that was created.
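
As a reference for step 1, the snapshot lookup can be done with the AWS CLI roughly like this.  This is only a minimal sketch: it assumes the CLI is already configured (covered below) and uses 'website_production' as a stand-in for your DB instance identifier.

aws rds describe-db-snapshots --db-instance-identifier website_production --snapshot-type automated --query 'reverse(sort_by(DBSnapshots, &SnapshotCreateTime))[0].DBSnapshotIdentifier' --output text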

ASSUMPTIONS:

What I will walk you through below are the contents and scripts to make your life MUCH simpler with your backups.  I am going to write this as if you have a reasonably sound understanding of Amazon's infrastructure and VPC configuration.  I will walk you through the steps needed to configure a working nightly backup that also allows you to clean and sanitize any data that will be available to your developers.  The source for this can be found here (https://github.com/jbrodie/aws-backup) for your reference / usage to get this setup running for your day-to-day needs.

A couple things I am going to assume before we start this:

  1. You have an Amazon account set up and know how to get into it.
  2. You have set up a public and a private VPC in your Amazon infrastructure.  If you haven't done this yet, Amazon has great documentation on setting this up and configuring it from start to finish.  You can find their guide here (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Introduction.html).
  3. You have your internal network set up within your private VPC.
  4. You have a database you want to export that has automated backups configured.
  5. You have created a "backup" user on the database you wish to export with the following grant (the first statement lets you verify it; see the sketch after this list if the account does not exist yet):
    • show grants for 'backup'@'[host ip]';
    • grant SELECT, UPDATE, DROP, LOCK TABLES, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, EXECUTE ON `[db_name]`.* TO 'backup'@'[host ip]';
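
If that backup account does not exist yet, a minimal sketch of creating it looks like the following; the host IP, database name and password are placeholders, and you will need to run this as a user that has GRANT rights on the database.

CREATE USER 'backup'@'[host ip]' IDENTIFIED BY '[password]';
GRANT SELECT, UPDATE, DROP, LOCK TABLES, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, EXECUTE ON `[db_name]`.* TO 'backup'@'[host ip]';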

GETTING STARTED:

First off, let's get you set up and configured on an EC2 instance for your exporting needs.  This doesn't have to be a super large instance, although something with some power isn't a bad move; I believe we went with a "t2.medium" and it seems to handle the workload without too much of a problem.  Build the instance with the standard Ubuntu 14.04 image available on Amazon.  This instance should exist in your PRIVATE VPC, without a PUBLIC IP address.  This box needs to be secure as it will be connecting to your internal network.

Next you will need to install a NAT instance into your public VPC and use it to allow outbound connections for updates from your PRIVATE SQL EXPORT box.  Amazon has a great walkthrough on this that can be found here (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html).  Once you have this box set up and installed with the recommended image, you can use it to provide access for your PRIVATE SQL EXPORT box.

Moving back to your SQL EXPORT box, you will need to install the following packages by running these commands:

  1. sudo apt-get update
  2. sudo apt-get install mysql-client-5.5
  3. sudo apt-get install cifs-utils
  4. sudo apt-get install awscli

You should also create a .my.cnf file in the home directory of the ubuntu account on the system.  In this file you will include your backup user's credentials, which lets the mysql commands run from the command line without you having to enter your login credentials, and keeps them out of the command itself and out of anything committed to GitHub, because we all know a username and password in GitHub is a bad thing.  The file should contain the following:

[client]
user="backup_user_account_name"
password="[password]"
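
Since this file holds credentials, it is worth restricting it to the ubuntu user only:

chmod 600 /home/ubuntu/.my.cnf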

Once you have these packages installed, you will need to configure your Amazon access with your Access Key ID and Secret Access Key.  Again, there is a great walkthrough on this located here: (http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html).

When you have your keys set up you will need to enter the following to configure your Amazon CLI access.
aws configure

This will prompt you for the following information.  Enter the credentials from the previous step.
AWS Access Key ID [None]: XXXX123456789XXXX
AWS Secret Access Key [None]: XXXXABCDEFGHIJXXXX
Default region name [None]: [Your hosting region for your VPC]
Default output format [None]: table

Now you will have access to your instances through the Amazon CLI, and you will be able to run the scripts against Amazon with the control you need.
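
A quick, read-only way to confirm the CLI is pointed at the right account and region is to list your RDS instances:

aws rds describe-db-instances --query 'DBInstances[].DBInstanceIdentifier' --output table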

You should now proceed to mapping your drive to your internal server.  Again, create a local credentials file in your home directory to avoid having to store these details in GitHub.

Create a file named “.credentials” in your home directory and enter:

username=[Your network user with access to your shared drive for read and write]
password=[password]
domain=[Your domain]
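
As with the .my.cnf file, it is a good idea to lock the permissions down on this one as well:

chmod 600 /home/ubuntu/.credentials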

This will allow us to point at this file when mapping the network drive.  In your home directory, create a folder named "RDSBackups".  Once you have this created, run the following command to map the directory to the share on the server you want to copy your backups to.

sudo mount //[Server IP]/RDSBackups$ /home/ubuntu/RDSBackups -t cifs -o credentials=/home/ubuntu/.credentials,uid=1000,gid=1000

This will map the folder share for the current session.  Ensure you can actually copy, create and delete files on this share.  Once that is confirmed you can move on to adding this record to your fstab to ensure that the drive path is mapped after each reboot.
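
A quick sanity check before you touch fstab:

touch /home/ubuntu/RDSBackups/write_test && rm /home/ubuntu/RDSBackups/write_test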

sudo pico /etc/fstab

Then add the following line to the end of the file:

//[Server IP]/RDSBackups$ /home/ubuntu/RDSBackups cifs credentials=/home/ubuntu/.credentials,uid=1000,gid=1000 0 0

Reboot your server now and confirm that the mapping actually works and is recreated after a reboot.  This will be necessary if you ever move this process to OpsWorks as a time-based instance that starts and stops automatically.

TIP: If you are planning on producing a sanitized version of your database, then you will need to create another drive mapping to a different folder that you can make accessible to the developers, restricting access to the primary folder that holds the confidential data.  I would suggest an RDSBackupDevs$ share point, with the corresponding folder on your SQL EXPORT machine mapped through the same user in the fstab file, as sketched below.
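
As a rough sketch, assuming the second share is named RDSBackupDevs$ on the same server, the extra mount mirrors the first one:

mkdir -p /home/ubuntu/RDSBackupDevs
sudo mount //[Server IP]/RDSBackupDevs$ /home/ubuntu/RDSBackupDevs -t cifs -o credentials=/home/ubuntu/.credentials,uid=1000,gid=1000

And the matching fstab entry:

//[Server IP]/RDSBackupDevs$ /home/ubuntu/RDSBackupDevs cifs credentials=/home/ubuntu/.credentials,uid=1000,gid=1000 0 0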

We can now go ahead and clone the repo from GitHub and start configuring the scripts so the backups run automatically.

git clone git@github.com:jbrodie/aws-backup.git rds-exporter

In this folder you will see several partials that get included, as well as the primary file, 'database_name.sh', which we will copy and use as the working template to set up your first export.

Copy the 'database_name.sh' file and rename it to match the DB instance name it will be acting on, for clarity.  For example, if your RDS DB instance is named 'website', use that name.  Once you have the file copied, be sure to open it and change the instance name at the top of the file accordingly:

export SNAPSHOT_GROUP_NAME='website_production'

Be sure to update the mappings for the save folders where you want the exported data copied on your local machine (which is the folder mapped to your server on your VPC).

mkdir -p ~/RDSBackups/${CURRENT_DATE}
export SAVE_PATH=~/RDSBackups/${CURRENT_DATE}
mkdir -p ~/RDSBackupDevs/${CURRENT_DATE}
export SAVE_PATH_DEVS=~/RDSBackupDevs/${CURRENT_DATE}

If you wish to create a sanitized version of the data for your developers, you can also copy the 'database_name.sql' file in the sanitizer folder and rename it to match your DB instance name, e.g. 'website.sql'.  Within this file you can write your custom SQL to clean and sanitize the data for your developers, allowing you to ensure that sensitive information isn't left out in the wild.
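
What goes in that file depends entirely on your schema.  Purely as a hypothetical illustration (the table and column names here are made up), a sanitizer might do something like:

-- hypothetical tables and columns; adapt to your own schema
UPDATE users SET email = CONCAT('user', id, '@example.com');
TRUNCATE TABLE audit_logs;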

You will need to update the information in the '_shared_config.sh' file before going forward.  This information is what will be used to set up and assign the proper settings to your RDS instance so the export works properly.  Most of these should be self-explanatory; by default all of these ARE REQUIRED unless you edit the source scripts to remove the options that are not applicable to you.

export CURRENT_DATE=$(date +"%Y_%m_%d")
export VPC_SECURITY_GROUP=""
export AVAILABILITY_ZONE=""
export DB_SUBNET_GROUP_NAME=""
export BACKUP_INSTANCE_NAME="auto-export-$SNAPSHOT_GROUP_NAME"
export DB_PARAMETER_GROUP=""
export LOG_FILE="$(dirname $0)/logs/${CURRENT_DATE}-${SNAPSHOT_GROUP_NAME}.log"
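
For reference, a filled-in version might look something like this.  Every value below is a placeholder for your own VPC settings, and the parameter group should be the cloned one described in the IMPORTANT note that follows:

# example values only -- replace with your own IDs and names
export VPC_SECURITY_GROUP="sg-0123abcd"
export AVAILABILITY_ZONE="us-east-1a"
export DB_SUBNET_GROUP_NAME="private-db-subnet-group"
export DB_PARAMETER_GROUP="export-allow-functions"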

IMPORTANT: One of the issues I came across when building this was that Amazon RDS doesn't allow user accounts without SUPER rights to create FUNCTIONS within their databases.  This is a security thing, and it makes sense.  That said, to sanitize the data from the database before exporting it, we need to ensure that the sensitive information is removed.  You will need to go into your Amazon account -> RDS -> Parameter Groups, and clone the existing parameter group you are using in your production environment.  Once you have it cloned, edit 'log_bin_trust_function_creators' and set this value to 1.  This will allow users of any RDS instance on this parameter group to create the function we will need to sanitize the data.
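
If you would rather do this from the CLI than the console, something along these lines should work; the group names here are placeholders:

# placeholder group names -- substitute your own
aws rds copy-db-parameter-group --source-db-parameter-group-identifier [your production parameter group] --target-db-parameter-group-identifier export-allow-functions --target-db-parameter-group-description "Export group with function creation enabled"
aws rds modify-db-parameter-group --db-parameter-group-name export-allow-functions --parameters "ParameterName=log_bin_trust_function_creators,ParameterValue=1,ApplyMethod=immediate"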

You can now test run your setup by running the following command:

/home/ubuntu/rds-exporter/[db_name].sh

What you will see is the output in the console, and the contents will also be logged to the /logs folder, with log files named by instance and time stamp.

Once you have this running successfully, go ahead and check the time your instance is set to run its automated backups.  Depending on the size of your database, you can then schedule this script to run at an appropriate time through the crontab.
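
For example, a nightly run at 3:00 AM server time (assuming you named your script website.sh as above) could be added with crontab -e:

# example schedule -- adjust the time and script name to suit
0 3 * * * /home/ubuntu/rds-exporter/website.sh >> /home/ubuntu/rds-exporter/logs/cron.log 2>&1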

This is still very much a work in progress; for any changes that I make, I will do my best to ensure the code base and this article are updated to reflect the most current version.

Please feel free to submit any suggestions / comments below or on the GitHub repository.