Introduction
I’m exploring Terraform to deploy MongoDB sharded clusters. In other words, binge-coding Terraform, Packer and Consul turned out to be a delightfully obsessive way to spend the recent holidays, and I’m learning a lot about all three along the way :). Terraform is impressive, and I’m enjoying architecting a robust system from AWS’s building blocks.
Design
The principles this system follows are self-healing and immutability.
Components:
- Packer - builds a server image for each core role (mongod shardsvr, mongod configsvr, mongos) on top of an Ubuntu LTS base image with the Consul agent pre-configured (a Packer sketch follows the list).
- Terraform - deploys, updates and deletes AWS infrastructure:
- SSH keys
- Security groups
- VPCs and Subnets
- Auto-scale groups + Launch configurations
- EBS Volumes (boot disks and db data storage)
- Consul Servers - the 3-5 servers that form the stable core of the Consul cluster.
- Consul - each mongo server runs a Consul agent set up for auto-join (i.e. `retry_join: ["provider=aws..."]`) based on AWS tagging; see the config sketch after the list.
- Used for dynamic DNS discovery in conjunction with systemd-resolved as a DNS proxy.
- Will use consul-template to update config files on servers post-launch. (I’m hoping this is an elegant solution for ASG-booted instances that need final configuration after launch, and a way to avoid rolling the cluster for every configuration change.)
- Auto-scale groups (ASG)
- Each mongo instance is an auto-scale group of 1 (Terraform sketch after the list).
- ASG monitors and replaces instances that become unhealthy
- Auto re-attach of EBS data volume
- In the event that a mongo instance becomes unhealthy, the ASG replaces the node, but the new instance initially lacks the data-bearing EBS volume.
- Recreating that EBS volume is prohibitive at large sizes, considering the restore time plus needing to `dd` the full drive to get back to normal performance.
- Instead of creating a new volume, a cron job runs every minute on each configsvr and shardsvr and tries to re-attach the EBS data volume paired to that instance, using metadata from the EC2 instance tags and the EBS volume tags (sketched in the ASG example after the list).
- The cronjob looks up the required metadata and executes aws-volume-attach.
- If the volume is currently attached, aws-volume-attach is a no-op.
- EBS Volumes
- DB data volumes are deployed separately and persist after being detached from their instance (see the volume sketch after the list).
- These will be in the terabyte size range.
- To replace a drive (corruption or performance issues):
- Provision an additional drive from a snapshot using Terraform.
- Update the metadata of that shard’s replica set member to point to the new drive’s name.
- Run `terraform apply`.
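
To make the components above a bit more concrete, here are a few rough sketches; every name, tag, path and size in them is a placeholder rather than the actual configuration. First, roughly what one of the Packer builds could look like, assuming Packer’s HCL2 template format and the stock Ubuntu 20.04 AMIs:

```hcl
# Hypothetical Packer HCL2 template for the mongod shardsvr image;
# region, AMI filter, and script names are placeholders.
locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "mongod_shardsvr" {
  region        = "us-east-1"
  instance_type = "t3.small"
  ssh_username  = "ubuntu"
  ami_name      = "mongod-shardsvr-${local.timestamp}"

  source_ami_filter {
    filters = {
      name                = "ubuntu/images/*ubuntu-focal-20.04-amd64-server-*"
      virtualization-type = "hvm"
      root-device-type    = "ebs"
    }
    owners      = ["099720109477"] # Canonical
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.mongod_shardsvr"]

  # Bake mongod and the Consul agent into the image (hypothetical scripts).
  provisioner "shell" {
    scripts = [
      "scripts/install-mongod-shardsvr.sh",
      "scripts/install-consul-agent.sh",
    ]
  }
}
```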
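The Consul side is mostly configuration baked into the image. A minimal agent config sketch, assuming the Consul servers carry a hypothetical `consul-role = server` tag:

```hcl
# /etc/consul.d/consul.hcl (placeholder values)
datacenter = "us-east-1"
data_dir   = "/opt/consul"

# Cloud auto-join: find the Consul servers by their AWS tags instead of static IPs.
retry_join = ["provider=aws tag_key=consul-role tag_value=server"]
```

consul-template’s own config is also HCL; a stanza like this could re-render a node’s mongod config from the catalog after launch and restart the service when it changes (paths and command are assumptions):

```hcl
# /etc/consul-template.d/mongod.hcl (hypothetical)
template {
  source      = "/etc/mongod.conf.ctmpl"
  destination = "/etc/mongod.conf"
  command     = "systemctl restart mongod"
}
```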
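Each “ASG of 1”, together with the re-attach cron, is a small amount of Terraform. A sketch under assumptions: the resource names, instance type, region, device name and `mongo-node` tag are invented, the referenced AMI variable, security group and subnet are defined elsewhere, and the instance profile is assumed to grant `ec2:DescribeTags`, `ec2:DescribeVolumes` and `ec2:AttachVolume`:

```hcl
resource "aws_launch_configuration" "shard0_rs0" {
  name_prefix          = "shard0-rs0-"
  image_id             = var.mongod_shardsvr_ami      # AMI baked by Packer (placeholder)
  instance_type        = "i3.large"                   # placeholder
  security_groups      = [aws_security_group.mongod.id]
  iam_instance_profile = aws_iam_instance_profile.mongod.name

  # On first boot, install the once-a-minute re-attach cron described above.
  user_data = <<-EOF
  #!/bin/bash
  cat > /usr/local/bin/reattach-data-volume.sh <<'SCRIPT'
  #!/bin/bash
  export AWS_DEFAULT_REGION=us-east-1
  INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
  NODE=$(aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=mongo-node" --query 'Tags[0].Value' --output text)
  VOLUME_ID=$(aws ec2 describe-volumes --filters "Name=tag:mongo-node,Values=$NODE" --query 'Volumes[0].VolumeId' --output text)
  STATE=$(aws ec2 describe-volumes --volume-ids "$VOLUME_ID" --query 'Volumes[0].Attachments[0].State' --output text)
  [ "$STATE" = "attached" ] && exit 0   # already attached: no-op
  aws ec2 attach-volume --volume-id "$VOLUME_ID" --instance-id "$INSTANCE_ID" --device /dev/xvdf
  SCRIPT
  chmod +x /usr/local/bin/reattach-data-volume.sh
  echo '* * * * * root /usr/local/bin/reattach-data-volume.sh' > /etc/cron.d/reattach-data-volume
  EOF

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "shard0_rs0" {
  name                 = "shard0-rs0"
  launch_configuration = aws_launch_configuration.shard0_rs0.name
  vpc_zone_identifier  = [aws_subnet.mongo_a.id]
  health_check_type    = "EC2"

  # An auto-scale group of exactly one: its only job is to replace an unhealthy node.
  min_size         = 1
  max_size         = 1
  desired_capacity = 1

  # The re-attach script filters EBS volumes on this same tag value.
  tag {
    key                 = "mongo-node"
    value               = "shard0-rs0"
    propagate_at_launch = true
  }
}
```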
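The data volumes themselves are plain Terraform resources, tagged so the re-attach script can find them. For the drive-replacement steps above, the new volume would be created from a `snapshot_id` and the node’s metadata pointed at it before the next `terraform apply`. A sketch with placeholder names and sizes:

```hcl
resource "aws_ebs_volume" "shard0_rs0_data" {
  availability_zone = "us-east-1a"   # must match the instance's AZ
  size              = 1000           # GiB; the real data volumes are in the TB range
  type              = "gp2"
  # snapshot_id     = "snap-..."     # used when rebuilding a drive from snapshot

  tags = {
    Name         = "shard0-rs0-data"
    "mongo-node" = "shard0-rs0"      # pairing key the re-attach cron filters on
  }
}
```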
My next steps are to automate forming the replica sets and then joining the shards. The open source tooling for this portion isn’t what I want; the closest is mongo ansible, which is an established tool, but I want something more declarative with a simpler model of what it will do when executed. As a result, the answer might be a custom Terraform provider to manage the internal configuration state of MongoDB. Philosophically, Terraform’s CRUD resource management and plan/apply phases match what will give me confidence using this on production clusters.
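To make that concrete, this is the kind of declarative shape I have in mind. It’s purely hypothetical HCL; the provider, resource types and attributes don’t exist and are invented only to illustrate the model (the addresses lean on Consul DNS names from the setup above):

```hcl
# Hypothetical resources from a custom MongoDB provider (not a real provider).
resource "mongodb_replicaset" "shard0" {
  name = "shard0"
  members = [
    "shard0-rs0.node.consul:27018",
    "shard0-rs1.node.consul:27018",
    "shard0-rs2.node.consul:27018",
  ]
}

resource "mongodb_shard" "shard0" {
  mongos_address = "mongos.service.consul:27017"
  replicaset     = mongodb_replicaset.shard0.name
}
```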
I’ll open source the work if it gets to a mature spot. Right now the terraforming successfully spins up all the mongod nodes, networking, VPCs, security groups, EC2 instances and EBS volumes, and they auto-join their Consul cluster.
Credit for the concept of this approach belongs to multiple different blog posts, but the original idea of ASG + EBS re-attaching came from reading about how Expedia operates their sharded clusters. Thanks!