Improving backup to Amazon S3

I use the s3.exe command-line utility to back up photos to Amazon Simple Storage Service (S3), and it works great.

I have lots of photos to back up: 900,000 files in 4,000 folders.

Obviously, copying every file on every backup is impractical; it would take forever. s3.exe has exactly the feature I need:

/sync is used with the put command and only uploads files that do not exist on S3 or have been modified since last being uploaded, based on the timestamp. It can be used alone or in conjunction with the /sub option for a fast incremental backup of a whole directory.

Here’s the command line I use for backups:

s3.exe put mybucket C:\photos\ /sub:withdelete /sync /acl:public-read /yes /nogui

Charges

When I started using S3, however, I was surprised by the bill from Amazon.

Let’s open the source code to see how the backup is done. Here are the relevant bits:

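foreach (string file in Sub.GetFiles(directory, filename, sub))
{
    // ... (elided in the original excerpt)

    // Ask S3 for the stored copy's last-modified time: one HEAD request per file.
    DateTime? lastModified = svc.getLastModified(bucket, key);

    // Skip the upload when the stored copy is newer than the local file.
    if (lastModified.HasValue && lastModified.Value > File.GetLastWriteTimeUtc(file))
    {
        Progress.reportProgress(key, 0, 0);
        continue;
    }

    // ... (rest of the loop body elided)
}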

For each file, s3.exe gets the last-modified date of the stored copy from S3. If it is later than the local file’s last-write time, the file is skipped; otherwise it is uploaded.

Let’s do some calculations. To get the last-modified date, s3.exe sends a HEAD request for every file. Amazon charges $0.01 per 10,000 HEAD requests, so with 900,000 files I would end up paying $0.90 every time I run a backup. If I back up every day, that’s $27 per month.
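Worked through for 900,000 files, with one HEAD request per file, the price quoted above, and a 30-day month (the same month length behind the $1.20 LIST figure below), the request charges come to:

```latex
\begin{aligned}
\text{HEAD cost per backup} &= \frac{900{,}000}{10{,}000} \times \$0.01 = \$0.90 \\
\text{HEAD cost per month}  &= 30 \times \$0.90 = \$27.00
\end{aligned}
```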

Let’s try to optimize it. How about this: for every local folder, get the list of the corresponding files in storage. The LIST response already includes each file’s last-modified date. Now we issue about 4,000 LIST requests, one per folder (a few more for folders holding over 1,000 files, since S3 returns at most 1,000 keys per LIST response). At $0.01 per 1,000 LIST requests, that is $0.04 per backup, or $1.20 per month: a saving of $25.80, a big win 🙂
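The listing idea can be sketched in C# roughly as follows. This is a minimal sketch under stated assumptions, not the patch submitted to CodePlex: IRemoteListing, ListFolder, ListBasedSync, FilesToUpload and FakeRemote are hypothetical names, and FakeRemote is an in-memory stand-in for S3. A real ListFolder would issue a LIST request with the folder as prefix and "/" as delimiter, paging through results when a folder holds more than 1,000 keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction (not part of s3.exe or the AWS SDK): each call
// stands for one S3 LIST request on a "folder" prefix and returns
// key -> last-modified time (UTC) for the objects directly in that folder.
public interface IRemoteListing
{
    IReadOnlyDictionary<string, DateTime> ListFolder(string prefix);
}

public static class ListBasedSync
{
    // Returns the keys that need uploading, sorted ordinally.
    // localFiles maps each local file's S3 key to its last-write time (UTC).
    // A file is skipped only when the stored copy is strictly newer than the
    // local file, mirroring the per-file HEAD check in s3.exe.
    public static List<string> FilesToUpload(
        string folderPrefix,
        IReadOnlyDictionary<string, DateTime> localFiles,
        IRemoteListing remote)
    {
        // One LIST request for the whole folder instead of one HEAD per file.
        IReadOnlyDictionary<string, DateTime> remoteFiles = remote.ListFolder(folderPrefix);

        var toUpload = new List<string>();
        foreach (KeyValuePair<string, DateTime> local in localFiles)
        {
            bool storedCopyIsNewer =
                remoteFiles.TryGetValue(local.Key, out DateTime remoteModified)
                && remoteModified > local.Value;
            if (!storedCopyIsNewer)
            {
                // Missing on S3, or modified locally since it was uploaded.
                toUpload.Add(local.Key);
            }
        }
        toUpload.Sort(StringComparer.Ordinal);
        return toUpload;
    }
}

// Hypothetical in-memory stand-in for S3, for demonstration only.
// It counts calls so the number of simulated LIST requests can be checked.
public sealed class FakeRemote : IRemoteListing
{
    private readonly Dictionary<string, DateTime> _objects;

    public FakeRemote(Dictionary<string, DateTime> objects) => _objects = objects;

    public int ListCalls { get; private set; }

    public IReadOnlyDictionary<string, DateTime> ListFolder(string prefix)
    {
        ListCalls++;
        // Keep only keys directly under the prefix, like a LIST with delimiter "/".
        return _objects
            .Where(kv => kv.Key.StartsWith(prefix, StringComparison.Ordinal)
                         && kv.Key.IndexOf('/', prefix.Length) < 0)
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

Applied to each of the 4,000 folders, this makes one LIST call per folder instead of one HEAD call per file. The timestamp comparison happens locally, so it adds no requests of its own.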

Making the Patch

I created a patch by following the steps in Scott Hanselman’s blog post.

I have submitted my patch to CodePlex; let’s see whether the project owner decides it’s good enough to apply.

4 thoughts on “Improving backup to Amazon S3”

  1. Awesome… but could you tell me the inner story of how backup is done in Amazon S3?

  2. @dips Not sure what you’re asking. Do you want to see how I implemented backup to Amazon S3?

  3. I appreciate the insight into how s3.exe works and the link to the svn. However, $.09 a backup times 30 days is $2.70. That, of course, is just the list cost, and any way to reduce that cost is appreciated.
