Overview
This project covers one of the most demanding architectural challenges in AWS: running a production workload that is fully active in two or more AWS regions simultaneously. Both regions accept live traffic, and data is replicated bi-directionally, typically within a second.
What You Will Learn
- Global traffic routing with Route 53 latency-based and health-check policies
- DynamoDB Global Tables for multi-region, multi-master data replication
- Aurora Global Database for relational workloads spanning regions
- Cross-region Application Load Balancer failover
- Conflict resolution strategies for distributed writes
- Observability across regions with CloudWatch cross-account dashboards
Prerequisites
- Solid experience with AWS core services (EC2, RDS, VPC, IAM)
- Understanding of distributed systems concepts (CAP theorem, eventual consistency)
- Familiarity with Infrastructure as Code (Terraform or AWS CDK)
- AWS CLI configured with sufficient permissions
Architecture
┌─────────────────────────┐
│ Route 53 │
│ Latency-based routing │
│ + Health checks │
└────────┬────────┬────────┘
│ │
┌─────────────────▼─┐ ┌──▼─────────────────┐
│ us-east-1 │ │ eu-west-1 │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ ALB │ │ │ │ ALB │ │
│ └──────┬──────┘ │ │ └──────┬──────┘ │
│ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │
│ │ ECS/EKS │ │ │ │ ECS/EKS │ │
│ │ App Layer │ │ │ │ App Layer │ │
│ └──────┬──────┘ │ │ └──────┬──────┘ │
│ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │
│ │ DynamoDB │◄─┼───┼─►│ DynamoDB │ │
│ │ Global │ │ │ │ Global │ │
│ │ Table │ │ │ │ Table │ │
└──┴─────────────┴──┘ └──┴─────────────┴───┘
Steps
1. Set Up the VPCs in Each Region
Use the same CIDR structure in both regions to keep things predictable (use non-overlapping ranges if you plan to peer them):
# Terraform example - repeat for eu-west-1
resource "aws_vpc" "main" {
  provider   = aws.us_east_1
  cidr_block = "10.0.0.0/16"
  tags       = { Name = "main-us-east-1" }
}

resource "aws_subnet" "private" {
  for_each          = toset(["us-east-1a", "us-east-1b", "us-east-1c"])
  provider          = aws.us_east_1
  vpc_id            = aws_vpc.main.id
  availability_zone = each.value
  cidr_block        = cidrsubnet("10.0.0.0/16", 8, index(["us-east-1a", "us-east-1b", "us-east-1c"], each.value))
}
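The cidrsubnet() call above carves consecutive /24s out of the /16, one per availability zone. The same arithmetic can be sanity-checked with Python's ipaddress module (a verification sketch, not deployment code):

```python
import ipaddress

# Reproduce Terraform's cidrsubnet("10.0.0.0/16", 8, i):
# add 8 bits to the prefix (/16 -> /24) and take the i-th subnet.
def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    net = ipaddress.ip_network(prefix)
    return str(list(net.subnets(prefixlen_diff=newbits))[netnum])

azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
subnets = {az: cidrsubnet("10.0.0.0/16", 8, i) for i, az in enumerate(azs)}
print(subnets)
# {'us-east-1a': '10.0.0.0/24', 'us-east-1b': '10.0.1.0/24', 'us-east-1c': '10.0.2.0/24'}
```

Running this before an apply is a cheap way to confirm the per-AZ CIDRs will not collide with anything already allocated.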
2. Deploy DynamoDB Global Tables
- Create the table in us-east-1:
aws dynamodb create-table \
--table-name Orders \
--attribute-definitions AttributeName=orderId,AttributeType=S \
--key-schema AttributeName=orderId,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region us-east-1
- Add eu-west-1 as a replica:
aws dynamodb update-table \
--table-name Orders \
--replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]' \
--region us-east-1
Writes in either region are replicated asynchronously, typically within a second.
Conflict resolution: DynamoDB uses last-writer-wins (LWW) based on wall-clock time. Design your application to avoid conflicting concurrent writes to the same item, or use a version attribute with conditional expressions.
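The version-attribute pattern can be sketched in plain Python. The in-memory check below mirrors what DynamoDB enforces server-side when you pass a ConditionExpression such as version = :expected to UpdateItem; the table and item names are illustrative:

```python
# In-memory sketch of version-based optimistic locking, the same
# check DynamoDB performs when a ConditionExpression is supplied.
class ConditionalCheckFailed(Exception):
    pass

table = {}  # orderId -> {"status": ..., "version": int}

def put_order(order_id, status):
    table[order_id] = {"status": status, "version": 1}

def update_order(order_id, new_status, expected_version):
    item = table[order_id]
    if item["version"] != expected_version:
        # The write loses the race: a concurrent update already bumped
        # the version, so this request must re-read and retry.
        raise ConditionalCheckFailed(order_id)
    item["status"] = new_status
    item["version"] += 1

put_order("o-1", "PENDING")
update_order("o-1", "SHIPPED", expected_version=1)        # succeeds
try:
    update_order("o-1", "CANCELLED", expected_version=1)  # stale version
except ConditionalCheckFailed:
    print("write rejected: concurrent update detected")
```

Note that a conditional write only protects against conflicts within one region's view; cross-region writes that race inside the replication window still resolve by last-writer-wins.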
3. Set Up Aurora Global Database
- Create the global cluster in us-east-1 (this creates only the global container; the primary regional DB cluster is created separately and attached with --global-cluster-identifier, the same way as the secondary below):
aws rds create-global-cluster \
--global-cluster-identifier my-global-db \
--engine aurora-mysql \
--engine-version 8.0.mysql_aurora.3.04.0 \
--deletion-protection \
--region us-east-1
- Add a secondary region:
aws rds create-db-cluster \
--db-cluster-identifier my-db-eu-west-1 \
--engine aurora-mysql \
--global-cluster-identifier my-global-db \
--region eu-west-1
Aurora Global Database provides a single write endpoint (primary) with read replicas in secondary regions. Use it for workloads requiring strong consistency — route all writes to the primary and reads to the nearest region.
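The routing rule above (writes to the primary, reads to the nearest region) can be made explicit in application code. A sketch with placeholder endpoint hostnames; in practice these come from the cluster's writer and reader endpoints:

```python
# Sketch of read/write endpoint selection for Aurora Global Database.
# Hostnames below are illustrative placeholders, not real endpoints.
ENDPOINTS = {
    "writer": "my-global-db.cluster-XXXX.us-east-1.rds.amazonaws.com",
    "readers": {
        "us-east-1": "my-global-db.cluster-ro-XXXX.us-east-1.rds.amazonaws.com",
        "eu-west-1": "my-db-eu-west-1.cluster-ro-XXXX.eu-west-1.rds.amazonaws.com",
    },
}

def pick_endpoint(operation: str, local_region: str) -> str:
    # All writes cross to the primary region's writer endpoint;
    # reads stay on the caller's local reader endpoint.
    if operation == "write":
        return ENDPOINTS["writer"]
    return ENDPOINTS["readers"][local_region]
```

Keeping this decision in one function makes a later primary-region switch (managed failover) a one-line configuration change.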
4. Deploy Application Layer in Both Regions
Use ECS Fargate or EKS with the same container image:
# ECS Task Definition (simplified)
containerDefinitions:
  - name: api
    image: 123456789.dkr.ecr.us-east-1.amazonaws.com/api:latest
    environment:
      - name: DYNAMO_TABLE
        value: Orders
      - name: AWS_REGION
        value: us-east-1  # Override per region at deploy time
    portMappings:
      - containerPort: 3000
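The per-region override can be applied by a small deploy-time script. A sketch, assuming the task definition has been loaded as a Python dict (for example from JSON before registering it):

```python
import copy

def regionalize(task_def: dict, region: str) -> dict:
    # Return a copy of the task definition with AWS_REGION pointed
    # at the target region; the original dict is left untouched.
    td = copy.deepcopy(task_def)
    for container in td["containerDefinitions"]:
        for var in container["environment"]:
            if var["name"] == "AWS_REGION":
                var["value"] = region
    return td

base = {"containerDefinitions": [{
    "name": "api",
    "environment": [
        {"name": "DYNAMO_TABLE", "value": "Orders"},
        {"name": "AWS_REGION", "value": "us-east-1"},
    ],
}]}
eu = regionalize(base, "eu-west-1")
```

The deep copy matters: registering both regions from one mutated dict is an easy way to ship the wrong region setting.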
5. Configure Route 53 for Global Traffic Routing
# Create health checks for each regional ALB
aws route53 create-health-check \
--caller-reference "us-east-1-$(date +%s)" \
--health-check-config '{
  "Type": "HTTPS",
  "FullyQualifiedDomainName": "alb.us-east-1.example.com",
  "ResourcePath": "/health",
  "RequestInterval": 10,
  "FailureThreshold": 2
}'
# Create latency records pointing to each ALB
# (Repeat with SetIdentifier "eu-west-1" for the European endpoint)
aws route53 change-resource-record-sets \
--hosted-zone-id YOUR_ZONE_ID \
--change-batch '{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "api.example.com",
      "Type": "A",
      "SetIdentifier": "us-east-1",
      "Region": "us-east-1",
      "HealthCheckId": "HEALTH_CHECK_ID",
      "AliasTarget": {
        "HostedZoneId": "ALB_HOSTED_ZONE_ID",
        "DNSName": "alb.us-east-1.example.com",
        "EvaluateTargetHealth": true
      }
    }
  }]
}'
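As a mental model of what these records buy you: Route 53 answers each query with the healthy record whose region has the lowest measured latency from the resolver, so a failed health check diverts traffic automatically. A toy simulation of that decision:

```python
# Simplified model of latency-based routing with health checks.
# If no record is healthy, Route 53 answers as if all were healthy
# rather than returning nothing.
records = [
    {"set_id": "us-east-1", "latency_ms": 12, "healthy": True},
    {"set_id": "eu-west-1", "latency_ms": 85, "healthy": True},
]

def resolve(records):
    healthy = [r for r in records if r["healthy"]]
    if not healthy:
        healthy = records
    return min(healthy, key=lambda r: r["latency_ms"])["set_id"]

print(resolve(records))        # us-east-1
records[0]["healthy"] = False  # simulate a failed health check
print(resolve(records))        # eu-west-1
```

The latencies here are illustrative; the real measurements come from Route 53's own latency data between AWS regions and resolver locations.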
6. Observability and Failover Testing
Set up CloudWatch cross-region dashboards and alarms:
# Alarm on the regional ALB 5XX rate and notify an SNS topic
aws cloudwatch put-metric-alarm \
--alarm-name "us-east-1-error-rate" \
--metric-name HTTPCode_ELB_5XX_Count \
--namespace AWS/ApplicationELB \
--dimensions Name=LoadBalancer,Value=YOUR_ALB_ARN_SUFFIX \
--statistic Sum \
--period 60 \
--evaluation-periods 2 \
--threshold 50 \
--comparison-operator GreaterThanThreshold \
--alarm-actions arn:aws:sns:us-east-1:ACCOUNT:ops-alert
To test failover, deliberately fail the health check endpoint in one region and verify that Route 53 shifts traffic to the healthy region within the health-check detection window plus the record TTL.
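With the health-check settings above (10 s interval, failure threshold of 2) and an assumed 60 s record TTL, a back-of-envelope bound on failover time looks like this:

```python
# Back-of-envelope failover timing (seconds). Health checkers need
# FailureThreshold consecutive failed checks before the record is
# pulled; resolvers may then serve the cached answer for up to the TTL.
request_interval = 10   # seconds between health checks (from the config above)
failure_threshold = 2   # consecutive failures before "unhealthy"
record_ttl = 60         # assumed DNS TTL on the latency records

detection = request_interval * failure_threshold
worst_case = detection + record_ttl
print(f"detection ~{detection}s, worst case ~{worst_case}s")  # ~20s, ~80s
```

Measure the real number during a game day; resolvers that ignore TTLs can stretch it well past this estimate.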
Cost Considerations
Multi-region active-active is expensive. Key cost drivers:
- DynamoDB replicated write units — charged in each replica region
- Aurora Global Database — full cluster cost per region
- Data transfer — cross-region replication traffic
- Duplicate compute — full application fleet in each region
Use AWS Pricing Calculator to model costs before committing.
Clean Up
- Remove Route 53 records.
- Delete ECS services and clusters in both regions.
- Remove Aurora Global Database secondary, then primary.
- Delete DynamoDB Global Table replicas, then the table.
- Delete VPCs and associated resources.
Going Further
- Implement circuit breakers in your application layer for graceful degradation.
- Use AWS Global Accelerator instead of Route 53 for anycast IP routing.
- Explore Conflict-free Replicated Data Types (CRDTs) for complex conflict resolution.
- Add AWS X-Ray distributed tracing across regions.
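As a taste of the CRDT idea, a grow-only counter (G-counter) gives each region its own slot and merges by element-wise max. Merges are commutative, associative, and idempotent, so replicas converge no matter the order in which updates arrive. A minimal sketch:

```python
# G-counter CRDT: each region increments only its own slot; merge is
# an element-wise max, so replicas converge without coordination.
def increment(counter: dict, region: str, by: int = 1) -> dict:
    c = dict(counter)
    c[region] = c.get(region, 0) + by
    return c

def merge(a: dict, b: dict) -> dict:
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

def value(counter: dict) -> int:
    return sum(counter.values())

us = increment({}, "us-east-1")          # one order counted in us-east-1
eu = increment({}, "eu-west-1", by=2)    # two orders counted in eu-west-1
print(value(merge(us, eu)))              # 3, same in either merge order
```

This only handles grow-only counts; other CRDTs (PN-counters, OR-sets, LWW-registers) cover decrements, sets, and registers, at the cost of extra per-replica metadata.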