28 Nov 2021

Mongo Deployment Experiment Using ASG, Consul and Terraform

Introduction

I’m exploring Terraform to deploy MongoDB sharded clusters. In other words, binge-coding Terraform, Packer and Consul was a delightfully obsessive way to spend the recent holidays, and I’m learning a lot about all three along the way :). Terraform is impressive, and I’m enjoying architecting a robust system from AWS’s building blocks.

Design

The principles this system follows are self-healing and immutability.

Components:

  • Packer - builds server images of the core functionality (mongod shardsvr, mongod configsvr, mongos) on top of an Ubuntu LTS base image with Consul agents pre-configured. (A minimal Packer sketch appears after this list.)
  • Terraform - deploys, updates and deletes AWS infrastructure:
    • SSH keys
    • Security groups
    • VPCs and Subnets
    • Auto-scale groups + Launch configurations
    • EBS Volumes (boot disks and db data storage)
  • Consul servers - the 3-5 servers that form the stable core of the Consul cluster.
  • Consul - each mongo server runs a Consul agent configured to auto-join the cluster (i.e. retry_join = ["provider=aws ..."]) based on AWS tags. (A minimal agent config sketch appears after this list.)
    • Used for dynamic DNS discovery in conjunction with systemd-resolved as a DNS proxy.
    • Will use consul-template to update config files on servers post-launch. (I’m hoping this is an elegant solution for ASG-booted instances that need final configuration after launch, and a way to avoid having to roll the cluster for new configurations.)
  • Auto-scale groups (ASG)
    • Each mongo instance is an auto-scale group of one (see the ASG sketch after this list).
    • The ASG monitors health and replaces instances that become unhealthy.
  • Auto re-attach of the EBS data volume
    • When a mongo instance becomes unhealthy, the ASG replaces the node, but the replacement initially lacks the data-bearing EBS volume.
    • Recreating that volume is prohibitively slow for large volumes once you account for the snapshot restoration time plus needing to dd the full drive to get back to normal performance.
    • Instead of provisioning a new volume, a cronjob on each configsvr and shardsvr instance tries, once a minute, to re-attach the EBS data volume paired with that instance, using metadata from the EC2 instance tags and the EBS volume tags.
    • The cronjob looks up the required metadata and executes aws-volume-attach.
    • If the volume is already attached, aws-volume-attach is a no-op.
  • EBS Volumes
    • DB data volumes are deployed separately (see the volume sketch after this list) and persist after being detached from their instance.
    • These will be in the terabyte size range.
    • To replace a drive (corruption/performance issues):
      • Provision an additional drive from a snapshot using Terraform.
      • Update that shard’s replicaset member metadata to point to the new drive’s name.
      • terraform apply
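
Since Packer, Terraform and Consul configuration are all expressed in HCL, a few sketches of the pieces described above follow. First, the image build. This is a minimal sketch rather than my actual template - the AMI name, region, instance type and the provisioning script names (install-mongod.sh, install-consul-agent.sh) are placeholders:

    packer {
      required_plugins {
        amazon = {
          version = ">= 1.0.0"
          source  = "github.com/hashicorp/amazon"
        }
      }
    }

    # Build a shardsvr image on top of Ubuntu LTS with the Consul agent baked in.
    source "amazon-ebs" "mongod_shardsvr" {
      ami_name      = "mongod-shardsvr-{{timestamp}}"
      instance_type = "t3.small"
      region        = "us-east-1"
      ssh_username  = "ubuntu"

      source_ami_filter {
        filters = {
          name                = "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"
          root-device-type    = "ebs"
          virtualization-type = "hvm"
        }
        owners      = ["099720109477"] # Canonical
        most_recent = true
      }
    }

    build {
      sources = ["source.amazon-ebs.mongod_shardsvr"]

      # Install mongod and pre-configure the Consul agent (placeholder scripts).
      provisioner "shell" {
        scripts = ["install-mongod.sh", "install-consul-agent.sh"]
      }
    }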
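
The Consul auto-join mentioned above comes down to a small piece of agent configuration. Another sketch - the datacenter, data_dir and tag key/value are assumptions:

    # /etc/consul.d/consul.hcl on each mongo node
    datacenter = "us-east-1"
    data_dir   = "/opt/consul"

    # Cloud auto-join: discover the Consul servers via their EC2 tags.
    retry_join = ["provider=aws tag_key=consul-role tag_value=server"]

The consul-template side is then a stanza per managed file, along these lines (paths and the reload command are placeholders):

    # /etc/consul-template.d/mongod.hcl
    template {
      source      = "/etc/mongod.conf.ctmpl"
      destination = "/etc/mongod.conf"
      command     = "systemctl restart mongod"
    }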
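
The ASG-of-one pattern looks roughly like the following in Terraform. Resource names, the AMI variable, the subnet reference and the volume-pair tag are all placeholders, not my actual module:

    # One launch configuration + one auto-scale group per mongo replicaset member.
    resource "aws_launch_configuration" "shardsvr_0" {
      name_prefix   = "mongod-shardsvr-0-"
      image_id      = var.shardsvr_ami_id # AMI baked by Packer
      instance_type = "m5.large"

      lifecycle {
        create_before_destroy = true
      }
    }

    resource "aws_autoscaling_group" "shardsvr_0" {
      name                 = "mongod-shardsvr-0"
      min_size             = 1
      max_size             = 1
      desired_capacity     = 1
      vpc_zone_identifier  = [aws_subnet.mongo_a.id]
      health_check_type    = "EC2"
      launch_configuration = aws_launch_configuration.shardsvr_0.name

      # Propagated to the instance; the re-attach cronjob matches this tag
      # against the paired EBS data volume's tag.
      tag {
        key                 = "volume-pair"
        value               = "shardsvr-0-data"
        propagate_at_launch = true
      }
    }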
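
Finally, the persistent data volume is its own Terraform resource, tagged so the cronjob can find and re-attach it; the attachment happens on the instance rather than via an aws_volume_attachment resource, since the instance ID changes every time the ASG replaces a node. Replacing a drive is then a matter of pointing at a new snapshot and running terraform apply. Names and sizes are again placeholders:

    # The data volume lives independently of the instance and survives replacement.
    resource "aws_ebs_volume" "shardsvr_0_data" {
      availability_zone = aws_subnet.mongo_a.availability_zone
      size              = 2000                    # GiB; terabyte-range in practice
      type              = "gp3"
      snapshot_id       = var.shardsvr_0_snapshot # point at a new snapshot to swap drives

      tags = {
        "volume-pair" = "shardsvr-0-data" # matched against the instance tag by the cronjob
      }
    }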

My next steps are to automate replicaset formation and then shard joining. The open source tooling for this portion isn’t what I want; the closest is mongo ansible. It’s an established tool, but I want something more declarative with a simpler model of what it will do when executed. As a result, the answer might be a custom Terraform provider to manage the internal configuration state of MongoDB. Philosophically, Terraform’s CRUD resource management and plan/apply workflow match what will give me confidence using this on production clusters.

I’ll open source the work if it gets to a mature spot. Right now the Terraform config successfully spins up all the mongod nodes, networking, VPCs, security groups, EC2 instances and EBS volumes, and the nodes auto-join their Consul cluster.

Credit for the concept of this approach belongs to several different blog posts, but the original idea of ASG + EBS re-attaching came from reading about how Expedia operates their sharded clusters. Thanks!