Encrypting Sensitive Data in a Multi-Region AWS Solution using KMS
Author: Stephen Gaughan
Keeping Data Safe
Failure of an AWS Region (or of a critical AWS service in a region) is a rare event, but if we want to minimise the risk of disruption to our services, we must architect our services to expect such a failure, using a multi-region architecture. In the unlikely event that a cloud region fails, our service can continue in another region with clients experiencing minimal interruption.
Additionally, we will often need to store personally identifiable information (PII) or other sensitive data which is subject to data security regulations and standards. Our customers have entrusted us with their data, and we can't ever allow their data to get into the wrong hands.
When storing PII (or similarly sensitive data), encryption of individual records or fields is usually required, so that no-one (not even employees) can see the data directly, even if they have access to the database. All access to such data must be authorised, controlled and audited. Therefore, any ability to read the PII directly from a database, backup, or filesystem must be prevented. It must not be possible to decrypt large numbers of records from an entire database en masse.
Hence, the application service that is deemed responsible for this data needs to encrypt the relevant fields before storing them at rest, and decrypt those fields only when they are needed.
AWS Encryption Tools and Services
The AWS Encryption SDK uses envelope encryption. AWS Key Management Service (KMS) helps us to safely store our master keys (CMKs), and the Encryption SDK encrypts our data by first asking KMS to generate a data key under one of our CMKs. That data key is used to encrypt a piece of data, and an encrypted copy of the data key is stored alongside the encrypted data.
When we need to decrypt that data, the Encryption SDK asks KMS to first decrypt the data key, and it then uses the decrypted data key to decrypt the actual data. The Encryption SDK makes this process pretty seamless.
So in summary:
- KMS allows us to manage and protect our CMKs
- AWS Encryption SDK uses KMS as its master key provider, to generate and decrypt data keys using a chosen CMK
- AWS Encryption SDK encrypts and decrypts data using our generated data keys
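The envelope flow summarised above can be sketched in a few lines of Python. This is a minimal illustration only, not the Encryption SDK's own API: `FakeKms`, `toy_cipher`, `encrypt_field` and `decrypt_field` are hypothetical names, the in-memory `FakeKms` stands in for a real KMS CMK, and the XOR keystream stands in for the authenticated cipher a real implementation would use.

```python
import hashlib
import os
from typing import Dict, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Toy stand-in for a real cipher; NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeKms:
    """Hypothetical in-memory stand-in for KMS holding a single CMK."""

    def __init__(self) -> None:
        self._cmk = os.urandom(32)  # like a real CMK, this never leaves the service

    def generate_data_key(self) -> Tuple[bytes, bytes]:
        """Return a new data key in two forms: plaintext and encrypted under the CMK."""
        plaintext_key = os.urandom(32)
        return plaintext_key, toy_cipher(self._cmk, plaintext_key)

    def decrypt(self, encrypted_key: bytes) -> bytes:
        return toy_cipher(self._cmk, encrypted_key)


def encrypt_field(kms: FakeKms, plaintext: bytes) -> Dict[str, bytes]:
    data_key, encrypted_key = kms.generate_data_key()
    # Only the encrypted copy of the data key is stored alongside the ciphertext.
    return {
        "encrypted_data_key": encrypted_key,
        "ciphertext": toy_cipher(data_key, plaintext),
    }


def decrypt_field(kms: FakeKms, message: Dict[str, bytes]) -> bytes:
    # First ask KMS to decrypt the data key, then use it to decrypt the data.
    data_key = kms.decrypt(message["encrypted_data_key"])
    return toy_cipher(data_key, message["ciphertext"])
```

Note that the stored record never contains the plaintext data key, so reading the database alone is not enough to recover the field.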
AWS KMS is a regional service. A standard (single-region) CMK cannot be moved or copied to another AWS region, meaning that a data key encrypted under such a CMK can only be decrypted by KMS in the same region in which it was encrypted.
This means that careful design consideration is needed to make sure that when we replicate such encrypted data between regions, we can actually decrypt our data in each region.
The cleanest multi-region architecture typically has two or more live regions sharing the load. There's no standby infrastructure: all regions are live and receiving traffic. If our service fails in one region, the others will scale up to handle their increased share of the load and our service clients encounter minimal disruption.
To achieve this model, our data must be synchronised across our chosen AWS regions. Updates stored in one region are propagated almost instantly to the databases in the other regions, keeping everything in sync. AWS supports such inter-region replication for most managed data storage services, including DynamoDB and Aurora.
Here's a sample 2-region architecture, with no field-level encryption, using DynamoDB Global Tables to synchronise data between regions in near real-time:
Note: This simplistic diagram includes only the components/services relevant to this example
The web service is running in 2 regions and stores its data in DynamoDB, which is configured to sync automatically between regions. A request from a client can be routed to either region.
But what happens when we want to encrypt data using the Encryption SDK and KMS? We know that a data key encrypted by KMS in one region can't be decrypted by KMS in another region, so every region would have to call KMS in the region where the data was originally encrypted. This creates an undesirable dependency on that one region: cross-region KMS calls have cost and latency implications and, if that region fails, we won't be able to access our encrypted data until it is restored.
Data Key Copies
Fortunately, the envelope encryption of the Encryption SDK allows multiple encrypted copies of the data key to be stored in the metadata of your encrypted field. The solution is to ask KMS in each chosen region to encrypt a copy of the data key. When we need to decrypt the data in a given region, we can identify the relevant encrypted data key for that region and use that for the decryption process. This means that our data can be decrypted locally in each region using that region's KMS, no matter which region originally encrypted it. And if one AWS region falls over, we can still decrypt our data in another region and our service continues with minimal disruption.
So, our example architecture now looks like this:
When our web service running in the AWS Ireland region needs to encrypt and store some data:
- The service in the Ireland region will use the Encryption SDK library (embedded in the service) to ask Ireland KMS to generate a new data key from one of our Ireland CMKs. KMS will issue the data key in 2 forms: encrypted and plaintext (unencrypted).
- The Encryption SDK will encrypt the data using the data key.
- It will also call Virginia KMS to encrypt the data key using one of our Virginia CMKs. The Encryption SDK will then attach the two encrypted data keys to the data and return it to our service which then stores it in DynamoDB.
- DynamoDB replicates that data to Virginia.
When an instance of our service in our AWS Virginia region reads the encrypted data and needs to decrypt it:
- The Encryption SDK will first examine the set of encrypted data keys attached to the data, and can identify that one of them was encrypted using Virginia KMS, so that is the preferred data key to use for decryption.
- The Encryption SDK will call Virginia KMS to decrypt the data key, and we then can use that decrypted data key to decrypt the associated data.
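The Ireland/Virginia steps above can be sketched as follows. This is a simplified model under stated assumptions rather than the Encryption SDK's actual implementation: `FakeKms`, `toy_cipher`, `encrypt_field` and `decrypt_field` are hypothetical, each `FakeKms` instance plays the role of one region's KMS and CMK, the toy XOR cipher is not secure, and the region codes `eu-west-1` and `us-east-1` are used for Ireland and Virginia.

```python
import hashlib
import os
from typing import Any, Dict, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Toy stand-in for a real cipher; NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeKms:
    """Hypothetical stand-in for the KMS (and one CMK) of a single region."""

    def __init__(self) -> None:
        self._cmk = os.urandom(32)

    def generate_data_key(self) -> Tuple[bytes, bytes]:
        key = os.urandom(32)
        return key, toy_cipher(self._cmk, key)

    def encrypt(self, plaintext_key: bytes) -> bytes:
        return toy_cipher(self._cmk, plaintext_key)

    def decrypt(self, encrypted_key: bytes) -> bytes:
        return toy_cipher(self._cmk, encrypted_key)


def encrypt_field(plaintext: bytes, local_region: str,
                  kms_by_region: Dict[str, FakeKms]) -> Dict[str, Any]:
    # The local KMS generates the data key and returns the first encrypted copy.
    data_key, local_copy = kms_by_region[local_region].generate_data_key()
    encrypted_keys = {local_region: local_copy}
    # Every other region encrypts its own copy of the same data key.
    for region, kms in kms_by_region.items():
        if region != local_region:
            encrypted_keys[region] = kms.encrypt(data_key)
    return {
        "encrypted_data_keys": encrypted_keys,
        "ciphertext": toy_cipher(data_key, plaintext),
    }


def decrypt_field(message: Dict[str, Any], local_region: str,
                  kms_by_region: Dict[str, FakeKms]) -> bytes:
    keys = message["encrypted_data_keys"]
    # Try the copy encrypted by the local region first, then any other reachable region.
    for region in sorted(keys, key=lambda r: r != local_region):
        if region in kms_by_region:
            data_key = kms_by_region[region].decrypt(keys[region])
            return toy_cipher(data_key, message["ciphertext"])
    raise LookupError("no encrypted data key for a reachable region")
```

With both regions passed to `encrypt_field`, the stored record carries one encrypted data key per region, so a reader in Virginia only ever needs Virginia's KMS.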
Testing Resilience Against Region Failure
Having implemented our multi-region encryption solution using KMS, we will have introduced a cross-region dependency whenever we need to create a new data key, so we will need to ensure that our service still works well when KMS in another region is not responding.
Disaster resilience testing should involve not only taking our own services off-line in an AWS region, but also simulating failure of any cross-region service dependencies that we may have introduced.
If the KMS service for one of our regions can't be reached, our service must still be able to handle that gracefully, and encrypt new data using the active region(s), albeit now necessarily missing the encrypted data key for the broken region.
So, in our example, if our service in Virginia can’t reach Ireland KMS for any reason, it must still be able to encrypt and store the data using just its own KMS.
Of course, when the broken AWS region is restored, we will need a process to read and re-encrypt any data that was stored while the region was down, so that the data has the full complement of region-encrypted data keys and remains as resilient as possible.
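The degraded write and the later repair pass could look something like the sketch below. Everything here is an assumption made for illustration: `FakeKms`, `KmsUnavailable`, `toy_cipher`, `encrypt_field`, `decrypt_field` and `repair_field` are hypothetical names, the `available` flag simulates a regional outage, and the toy XOR cipher is not secure. A real service would instead catch whatever timeout or connection errors its KMS client raises.

```python
import hashlib
import os
from typing import Any, Dict, Tuple


class KmsUnavailable(Exception):
    """Hypothetical error raised when a region's KMS cannot be reached."""


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Toy stand-in for a real cipher; NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeKms:
    """Hypothetical stand-in for one region's KMS, with a switch to simulate an outage."""

    def __init__(self) -> None:
        self._cmk = os.urandom(32)
        self.available = True

    def _check(self) -> None:
        if not self.available:
            raise KmsUnavailable()

    def generate_data_key(self) -> Tuple[bytes, bytes]:
        self._check()
        key = os.urandom(32)
        return key, toy_cipher(self._cmk, key)

    def encrypt(self, plaintext_key: bytes) -> bytes:
        self._check()
        return toy_cipher(self._cmk, plaintext_key)

    def decrypt(self, encrypted_key: bytes) -> bytes:
        self._check()
        return toy_cipher(self._cmk, encrypted_key)


def encrypt_field(plaintext: bytes, local_region: str,
                  kms_by_region: Dict[str, FakeKms]) -> Dict[str, Any]:
    # A failure of the local KMS is not swallowed: without it we cannot encrypt at all.
    data_key, local_copy = kms_by_region[local_region].generate_data_key()
    encrypted_keys = {local_region: local_copy}
    for region, kms in kms_by_region.items():
        if region == local_region:
            continue
        try:
            encrypted_keys[region] = kms.encrypt(data_key)
        except KmsUnavailable:
            pass  # degrade gracefully: store the record without this region's copy
    return {
        "encrypted_data_keys": encrypted_keys,
        "ciphertext": toy_cipher(data_key, plaintext),
    }


def decrypt_field(message: Dict[str, Any], local_region: str,
                  kms_by_region: Dict[str, FakeKms]) -> bytes:
    encrypted_key = message["encrypted_data_keys"][local_region]
    data_key = kms_by_region[local_region].decrypt(encrypted_key)
    return toy_cipher(data_key, message["ciphertext"])


def repair_field(message: Dict[str, Any], local_region: str,
                 kms_by_region: Dict[str, FakeKms]) -> Dict[str, Any]:
    """Re-encrypt a record that is missing an encrypted data key for some region."""
    if set(message["encrypted_data_keys"]) >= set(kms_by_region):
        return message  # already has the full complement of region keys
    plaintext = decrypt_field(message, local_region, kms_by_region)
    return encrypt_field(plaintext, local_region, kms_by_region)
```

The repair step decrypts and re-encrypts the whole field, in line with the process described above, rather than trying to append a key to the stored record. A resilience test can flip `available` on one region, write a record, restore the region, and then check that `repair_field` returns a record with a data key for every region.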
Generating a data key and then encrypting it in one or more additional regions has a cost, in both KMS charges and latency. To reduce that cost, it is possible to re-use a data key for more than one piece of data.
Your data security policy may require you to use a unique key to encrypt each field. Or maybe your policy will permit you to re-use a key a certain number of times, or for a certain time period.
The Encryption SDK supports data key caching for this purpose. How many times you use a data key or how long you keep it in memory will be driven by your risk posture.
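To make the two limits concrete, here is a minimal sketch of a cache that re-uses a data key until it has been used a set number of times or has reached a set age. It does not use the Encryption SDK's own caching API: `CachedDataKeySource` and `new_data_key` are hypothetical, and `new_data_key` simply stands in for a call that asks KMS for a fresh data key and its encrypted copies.

```python
import os
import time
from typing import Callable, Optional, Tuple

DataKey = Tuple[bytes, bytes]  # (plaintext key, encrypted copy)


def new_data_key() -> DataKey:
    """Hypothetical stand-in for asking KMS to generate a data key."""
    key = os.urandom(32)
    return key, b"encrypted:" + key[::-1]  # placeholder, not real encryption


class CachedDataKeySource:
    """Re-use a data key until it hits a usage limit or an age limit."""

    def __init__(self, generate_key: Callable[[], DataKey], max_uses: int,
                 max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._generate_key = generate_key
        self._max_uses = max_uses
        self._max_age = max_age_seconds
        self._clock = clock
        self._entry: Optional[DataKey] = None
        self._uses = 0
        self._created_at = 0.0

    def get(self) -> DataKey:
        now = self._clock()
        expired = (
            self._entry is None
            or self._uses >= self._max_uses
            or now - self._created_at >= self._max_age
        )
        if expired:
            # Limits reached: discard the old key and pay for a new one.
            self._entry = self._generate_key()
            self._uses = 0
            self._created_at = now
        self._uses += 1
        return self._entry


# Example policy: at most 100 fields or 5 minutes per data key.
source = CachedDataKeySource(new_data_key, max_uses=100, max_age_seconds=300)
```

Setting `max_uses=1` gives a unique key per field, which corresponds to the strictest policy mentioned above; larger values trade some isolation between fields for fewer KMS calls.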
We hope that this has been useful. If you have any questions, please feel free to contact us at Protego.