Introduction
I'm exploring Terraform to deploy MongoDB sharded clusters. In other words, binge-coding Terraform, Packer and Consul was a delightfully obsessive way to spend the recent holidays, and I'm learning a lot about all three along the way :). Terraform is impressive, and I'm enjoying architecting a robust system from AWS's building blocks.
Design
The principles this system follows are self-healing and immutability.
Components:
- Packer - builds a server image for each core role (mongod shardsvr, mongod configsvr, mongos) on top of an Ubuntu LTS base image with the Consul agent pre-configured (a Packer sketch follows the component list).
- Terraform - deploys, updates and deletes AWS infrastructure:
- SSH keys
- Security groups
- VPCs and Subnets
- Auto-scale groups + Launch configurations
- EBS Volumes (boot disks and db data storage)
- Consul Servers - the 3-5 servers that form the stable core of the Consul cluster.
- Consul - each mongo server runs a Consul agent configured to auto-join (aka `retry_join = ["provider=aws ..."]`) based on AWS tagging (see the Consul config sketch after this list).
- Used for dynamic DNS discovery in conjunction with systemd-resolved as a DNS proxy.
- Will use consul-template to update config files on servers post-launch. (I'm hoping this is an elegant solution for ASG-booted instances that need final configuration after launch, and a way to avoid having to roll the cluster for new configurations.)
- Auto-scale groups (ASG)
- Each mongo instance is an auto-scale group of 1 (see the Terraform sketch after this list).
- The ASG monitors and replaces instances that become unhealthy.
- Auto re-attach of EBS data volume
- If a mongo instance becomes unhealthy, the ASG replaces the node, but the replacement will initially lack the data-bearing EBS volume.
- Recreating that volume from scratch is prohibitive at large sizes once you factor in the restoration time plus needing to `dd` the full drive to achieve normal performance (volumes restored from snapshots must be fully read before they deliver full performance).
- Instead of a new volume, a cronjob runs on each configsvr and shardsvr that, every minute, tries to re-attach the EBS data volume paired to that instance, using metadata from the EC2 instance tags and the EBS volume tags.
- The cronjob looks up the required metadata and executes aws-volume-attach.
- If the volume is currently attached, aws-volume-attach is a no-op.
- EBS Volumes
- DB data volumes are deployed separately and persist after being detached from their instance.
- These will be in the terabyte size range.
- To replace a drive (corruption/performance issues):
  - Provision an additional drive from a snapshot using Terraform.
  - Update the metadata of that shard's replicaset member to point to the new drive's name.
  - Run `terraform apply`.
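To make these pieces more concrete, here's a minimal sketch of the Packer build in Packer's HCL2 template format. The AMI name, source filter and script paths are illustrative assumptions, and the real templates carry more provisioning than this:

```hcl
# Minimal sketch of the mongod shardsvr image build; names and paths are illustrative.
locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "mongod_shardsvr" {
  region        = "us-east-1"
  instance_type = "t3.medium"
  ssh_username  = "ubuntu"
  ami_name      = "mongod-shardsvr-${local.timestamp}"

  # Latest Ubuntu LTS AMI published by Canonical.
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-*-amd64-server-*"
      virtualization-type = "hvm"
      root-device-type    = "ebs"
    }
    owners      = ["099720109477"]
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.mongod_shardsvr"]

  # Bake mongod and the pre-configured Consul agent into the image.
  provisioner "shell" {
    scripts = [
      "scripts/install_mongod.sh",
      "scripts/install_consul_agent.sh"
    ]
  }
}
```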
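On the Consul side, here's a sketch of the agent's AWS cloud auto-join plus a consul-template stanza. The tag key/value, file paths and restart command are assumptions, not the actual configuration:

```hcl
# /etc/consul.d/retry_join.hcl -- agent-side cloud auto-join via AWS tags
# (tag key/value are placeholders).
retry_join = ["provider=aws tag_key=consul_cluster tag_value=mongo"]

# /etc/consul-template/config.hcl -- render mongod.conf from cluster state
# discovered through Consul, then restart the service.
template {
  source      = "/etc/consul-template/templates/mongod.conf.ctmpl"
  destination = "/etc/mongod.conf"
  command     = "systemctl restart mongod"
}
```

With systemd-resolved forwarding the `.consul` domain to the local agent, members can then be found by names like `shard0-rs0.node.consul` without any static inventory.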
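For the Terraform layer, here's roughly the shape of a single replica set member: an auto-scale group of exactly one instance, with the DB data volume managed outside the ASG so it survives instance replacement. Resource names, tags, sizes and variables are assumptions about how this could be wired, not the actual configuration:

```hcl
variable "shardsvr_ami_id" {}    # AMI baked by Packer
variable "mongod_sg_id" {}       # security group for mongod traffic
variable "subnet_id" {}
variable "availability_zone" {}

# One replica set member == an auto-scale group of exactly one instance.
resource "aws_launch_configuration" "shard0_rs0" {
  name_prefix     = "shard0-rs0-"
  image_id        = var.shardsvr_ami_id
  instance_type   = "i3.large"
  security_groups = [var.mongod_sg_id]

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "shard0_rs0" {
  name                 = "shard0-rs0"
  launch_configuration = aws_launch_configuration.shard0_rs0.name
  vpc_zone_identifier  = [var.subnet_id]
  min_size             = 1
  max_size             = 1
  desired_capacity     = 1

  # Propagated to the instance; the re-attach cronjob matches this tag
  # against the volume's tag to find its data disk.
  tag {
    key                 = "mongo_volume_pair"
    value               = "shard0-rs0-data"
    propagate_at_launch = true
  }
}

# The DB data volume lives outside the ASG, so it persists when the
# instance is replaced and can simply be re-attached.
resource "aws_ebs_volume" "shard0_rs0_data" {
  availability_zone = var.availability_zone
  size              = 1000 # GiB; real volumes are in the terabyte range
  type              = "gp2"

  tags = {
    mongo_volume_pair = "shard0-rs0-data"
  }
}
```

In this sketch the shared `mongo_volume_pair` tag is what the re-attach cronjob would use to pair a freshly launched instance with its volume.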
My next steps are to automate the replica set bonding and then shard joining. The open source tooling for this portion isn't what I want, with the closest being mongo ansible. It's an established tool, but I want something more declarative, with a simpler model of what it will do when executed. As a result, the answer might be a custom Terraform provider to manage the internal configuration state of MongoDB. Philosophically, Terraform's CRUD resource management and plan/apply phases match what will give me confidence using this on production clusters.
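To illustrate, the sketch below is purely hypothetical (neither the provider nor these resource types exist, and the member addresses assume Consul DNS names), but it's roughly the shape of configuration I'd want to write:

```hcl
# Hypothetical resources only: what a custom MongoDB Terraform provider might expose.
resource "mongodb_replicaset" "shard0" {
  name = "shard0"
  members = [
    "shard0-rs0.node.consul:27018",
    "shard0-rs1.node.consul:27018",
    "shard0-rs2.node.consul:27018"
  ]
}

resource "mongodb_shard" "shard0" {
  mongos_address = "mongos.service.consul:27017"
  replicaset     = mongodb_replicaset.shard0.name
}
```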
I'll open source the work if it gets to a mature spot. Right now the Terraform config successfully spins up everything: the mongod nodes, networking, VPCs, security groups, EC2 instances and EBS volumes, and the nodes auto-join their Consul cluster.
Credit for this approach belongs to several different blog posts, but the original idea of ASG + EBS re-attachment came from reading about how Expedia operates their sharded clusters. Thanks!