“Amazon DynamoDB is a fully managed NoSQL database service […]”. That’s what Amazon states in the DynamoDB FAQs. However, their definition is not quite what I would understand as fully managed: it’s missing counter-measures against human error. For that reason, we need a DynamoDB backup and recovery solution, our DynamoDB safety net…

DynamoDB Threat Scenario

Imagine your entire AWS infrastructure, including your DynamoDB tables, is defined and set up via AWS CloudFormation (btw, make sure to read why I hate CloudFormation) or, even worse, by hand. Further imagine you actually use your tables to store more or less important data. And finally imagine someone deleting your tables and/or stacks by accident, or your software behaving unexpectedly and dropping tables. See?! DynamoDB does not ship with built-in backup and recovery implementations, so we had to set up our own protective strategy.

The solution we came up with basically orchestrates DynamoDB with AWS Data Pipeline, S3 and AWS Lambda to establish not just a daily backup mechanism, but also simple staging (production to testing environment) and recovery capabilities. This is how it works:

DynamoDB w/ AWS Data Pipeline & Lambda

In addition to the table definitions, our CloudFormation-based infrastructure setup also describes Data Pipelines for the tables in our production environment, which back up the DynamoDB tables to S3 every night. The ObjectCreated events these backups emit on the S3 bucket then trigger a small AWS Lambda function we also have in place, which is responsible for deleting and recreating the corresponding tables in our testing environment. Offset in time, additional Data Pipelines refill these empty tables with the production backups created earlier.
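To give you an idea, here is a minimal sketch of what such a Lambda function could look like, assuming Python with boto3. The bucket layout (backup keys prefixed with the table name), the `-testing` suffix and the handler name are made up for illustration, not our actual naming:

```python
import boto3

dynamodb = boto3.client('dynamodb')

def handler(event, context):
    """Triggered by S3 ObjectCreated events for freshly written backup files.

    Drops and recreates the corresponding testing table, so a subsequent
    Data Pipeline run can refill it from the backup.
    """
    for record in event['Records']:
        # Hypothetical bucket layout: backups land under <table-name>/...
        key = record['s3']['object']['key']
        table = key.split('/')[0] + '-testing'  # hypothetical naming scheme

        # Capture the table's schema before dropping it, so it can be
        # recreated identically (secondary indexes omitted for brevity).
        description = dynamodb.describe_table(TableName=table)['Table']

        dynamodb.delete_table(TableName=table)
        dynamodb.get_waiter('table_not_exists').wait(TableName=table)

        dynamodb.create_table(
            TableName=table,
            AttributeDefinitions=description['AttributeDefinitions'],
            KeySchema=description['KeySchema'],
            ProvisionedThroughput={
                'ReadCapacityUnits': description['ProvisionedThroughput']['ReadCapacityUnits'],
                'WriteCapacityUnits': description['ProvisionedThroughput']['WriteCapacityUnits'],
            },
        )
        dynamodb.get_waiter('table_exists').wait(TableName=table)
```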

In that way, we established a battle-proven backup and staging solution for AWS DynamoDB. But is this of any help when we have to deal with a real DynamoDB disaster, e.g. accidental table deletion or data corruption? Yes, it is:

Due to the infrastructure-as-code approach, we can easily recreate entire stacks of DynamoDB tables and Data Pipelines (yes, we run tables and pipelines in separate stacks) and use our latest backups to restore a stable state. All of that with just two or three commands.
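As a rough sketch of that recovery flow, again with boto3; the stack name, template file and pipeline id below are placeholders, not our actual setup:

```python
import boto3

cloudformation = boto3.client('cloudformation')
datapipeline = boto3.client('datapipeline')

# 1. Recreate the DynamoDB tables from their CloudFormation template.
with open('tables.template.json') as template:  # hypothetical template file
    cloudformation.create_stack(
        StackName='dynamodb-tables',  # hypothetical stack name
        TemplateBody=template.read(),
    )
cloudformation.get_waiter('stack_create_complete').wait(StackName='dynamodb-tables')

# 2. Recreate the pipeline stack the same way (omitted for brevity), then
#    activate the import pipeline that refills the tables from the latest
#    S3 backup.
datapipeline.activate_pipeline(pipelineId='df-EXAMPLE1234567')  # placeholder id
```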

However, we don’t have any automation for disaster recognition yet, as in my eyes it’s quite hard to identify possible disasters and (especially) to distinguish them from intentional table changes. Also, with this setup we have no way to properly react to DynamoDB availability issues (region, AZ, etc. not reachable).

Further downsides: our approach is heavily dependent on Data Pipelines (I already have a ‘Why I hate AWS Data Pipeline’ post in my pipeline ;D), and due to the orchestration of four Amazon Web Services it is neither as simple nor as cost-efficient as I would want a proper solution for this common problem to be.

Hope you got some inspiration for your future DynamoDB setup. If you have any questions or improvement suggestions, make sure to leave a comment or contact me directly.

Jan Brennenstuhl

Jan is a software engineer, web security enthusiast & clean code artist. He lives in Berlin, is currently working for Zalando SE and loves his bonsai trees, coffee & road cycling.

