04 May 2024

DBOps: Automating infrastructure operations

(Sketching out requirements for a database infrastructure automation tool. This led to a more generic prototype at https://github.com/zph/capstan, which I tried on a local MongoDB cluster. It carried out a proper upgrade procedure with no human action beyond confirmations at critical steps.)

DBOps

DBOps consists of an event loop that observes a system and compares it against a desired state.
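
A minimal sketch of what such a loop could look like. The names `Watched`, `observe`, `matches`, `onDrift` and `runLoop` are all hypothetical and not taken from the prototype. The loop is bounded so the example terminates; a real one would presumably run on a timer.

```typescript
// Hypothetical shape of one watched system; every name here is illustrative.
interface Watched<S> {
  desired: S;
  observe: () => S; // read the current state of the system
  matches: (current: S, desired: S) => boolean;
  onDrift: (current: S, desired: S) => void; // react when the states differ
}

// Run a bounded number of ticks and return how many of them saw drift.
function runLoop<S>(w: Watched<S>, maxTicks: number): number {
  let drifts = 0;
  for (let tick = 0; tick < maxTicks; tick++) {
    const current = w.observe();
    if (!w.matches(current, w.desired)) {
      drifts++;
      w.onDrift(current, w.desired);
    }
  }
  return drifts;
}

// Usage: a fake "cluster" whose version is corrected on the first drift.
let clusterVersion = "6.0";
const driftCount = runLoop<string>(
  {
    desired: "7.0",
    observe: () => clusterVersion,
    matches: (current, desired) => current === desired,
    onDrift: (_current, desired) => { clusterVersion = desired; },
  },
  3,
);
```

Here the first tick sees drift and fixes it, so the remaining ticks find nothing to do.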

Phase 1

Desired transitions are registered in the system and consist of:

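interface Action {
	name: () => string // short identifier for the action
	procedure: () => string // human-readable description shown to the operator
	preChecks: Array<() => boolean> // must all pass before fn runs
	postChecks: Array<() => boolean> // must all pass after fn runs
	fn: () => void // performs the transition
}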

Transitions can be composed into an Operation:

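interface Operation {
	preChecks: Array<() => boolean> // must all pass before the first action
	postChecks: Array<() => boolean> // must all pass after the last action
	actions: Action[] // run in order
}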
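
A rough sketch of how an Operation might be executed, assuming checks return booleans and that a failed check should abort by throwing. `assertAll`, `runAction` and `runOperation` are hypothetical helper names, and the types are repeated so the block stands alone.

```typescript
interface Action {
  name: () => string;
  preChecks: Array<() => boolean>;
  postChecks: Array<() => boolean>;
  fn: () => void;
}

interface Operation {
  preChecks: Array<() => boolean>;
  postChecks: Array<() => boolean>;
  actions: Action[];
}

// Throw with a descriptive message if any check in the list fails.
function assertAll(checks: Array<() => boolean>, label: string): void {
  checks.forEach((check, i) => {
    if (!check()) throw new Error(`${label} check #${i} failed`);
  });
}

// Run one action between its own pre- and post-checks.
function runAction(action: Action): void {
  assertAll(action.preChecks, `${action.name()} pre`);
  action.fn();
  assertAll(action.postChecks, `${action.name()} post`);
}

// Operation pre-checks, then each action in order, then operation post-checks.
function runOperation(op: Operation): void {
  assertAll(op.preChecks, "operation pre");
  op.actions.forEach((action) => runAction(action));
  assertAll(op.postChecks, "operation post");
}

// Usage: two illustrative steps that record themselves in a log.
const opLog: string[] = [];
const makeStep = (label: string): Action => ({
  name: () => label,
  preChecks: [() => true],
  postChecks: [() => opLog.indexOf(label) !== -1],
  fn: () => { opLog.push(label); },
});

runOperation({
  preChecks: [],
  postChecks: [() => opLog.length === 2],
  actions: [makeStep("stepDown"), makeStep("upgrade")],
});
```

Throwing on the first failed check keeps the runner simple; a real tool would likely want to report which check failed to the operator instead.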

In this phase, a human registers an operation to run. Before each action's fn runs, the operator is notified and must confirm the action.

The confirmation can be a prompt posed to the human in a terminal, Slack, etc., such as:

@bot> Ready to run Action: ${action.procedure()}
@bot>> Confirm by responding: @bot run XXX-YYY-ZZZ
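
A sketch of that confirmation gate. The notes only show the shape XXX-YYY-ZZZ, so generating it as a random per-action token is an assumption, as are the names `makeToken`, `confirmAndRun` and the `ask` callback standing in for the terminal or Slack transport.

```typescript
// Build a token shaped like XXX-YYY-ZZZ from an injectable random source.
// The generation scheme is an assumption; the notes only show the shape.
function makeToken(rand: () => number = Math.random): string {
  const letters = "ABCDEFGHJKLMNPQRSTUVWXYZ";
  const group = (): string => {
    let out = "";
    for (let i = 0; i < 3; i++) {
      out += letters.charAt(Math.floor(rand() * letters.length));
    }
    return out;
  };
  return [group(), group(), group()].join("-");
}

// Prompt the operator and run fn only if the reply echoes the exact token.
// `ask` is a hypothetical transport (terminal, Slack, ...) returning the reply.
function confirmAndRun(
  procedure: string,
  fn: () => void,
  ask: (prompt: string) => string,
  token: string = makeToken(),
): boolean {
  const prompt =
    `@bot> Ready to run Action: ${procedure}\n` +
    `@bot>> Confirm by responding: @bot run ${token}`;
  const reply = ask(prompt).trim();
  if (reply !== `@bot run ${token}`) return false; // not confirmed: do nothing
  fn();
  return true;
}

// Usage: one confirmed run and one rejected run, with a fixed token.
const ranActions: string[] = [];
const confirmed = confirmAndRun(
  "upgrade secondary",
  () => { ranActions.push("upgrade secondary"); },
  () => "@bot run ABC-DEF-GHJ",
  "ABC-DEF-GHJ",
);
const rejected = confirmAndRun(
  "upgrade primary",
  () => { ranActions.push("upgrade primary"); },
  () => "@bot run WRONG",
  "ABC-DEF-GHJ",
);
```

Requiring the operator to echo a token, rather than a plain "yes", makes it harder to confirm the wrong action by accident.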

Phase 2

During this phase, the system works out the necessary changes itself: it knows the desired state and checks whether the world matches it.

interface State<S> {
	fetch: () => S // get the world's state
	check: (current: S) => boolean // see if fetched state matches desired state
	plan: (current: S) => Action[] // recommend the changes needed
	apply: (actions: Action[]) => void // internal function to run the Action[]
	rollback: () => void // undo apply
}
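
One way these five members might fit together in a single reconcile pass. The parameter shapes, the `reconcile` function, its return values and the node/version example data are all assumptions made for illustration.

```typescript
interface Action {
  name: () => string;
  fn: () => void;
}

interface State<S> {
  fetch: () => S; // get the world's state
  check: (current: S) => boolean; // does it match the desired state?
  plan: (current: S) => Action[]; // changes needed to get there
  apply: (actions: Action[]) => void; // run the planned actions
  rollback: () => void; // undo apply
}

// One pass: fetch, check, plan, apply, then verify; roll back on any failure.
function reconcile<S>(state: State<S>): "in-sync" | "applied" | "rolled-back" {
  const before = state.fetch();
  if (state.check(before)) return "in-sync";
  try {
    state.apply(state.plan(before));
    if (!state.check(state.fetch())) {
      throw new Error("state still differs after apply");
    }
    return "applied";
  } catch (err) {
    state.rollback();
    return "rolled-back";
  }
}

// Usage: hypothetical node -> version map with one node behind.
const versions: Record<string, string> = { a: "6.0", b: "7.0" };
const desiredVersion = "7.0";

const nodeState: State<Record<string, string>> = {
  fetch: () => ({ ...versions }),
  check: (cur) => Object.keys(cur).every((n) => cur[n] === desiredVersion),
  plan: (cur) =>
    Object.keys(cur)
      .filter((n) => cur[n] !== desiredVersion)
      .map((n) => ({
        name: () => `upgrade ${n}`,
        fn: () => { versions[n] = desiredVersion; },
      })),
  apply: (actions) => actions.forEach((a) => a.fn()),
  rollback: () => { versions.a = "6.0"; },
};

const firstPass = reconcile(nodeState);
const secondPass = reconcile(nodeState);
```

The second pass is a no-op because the first one already brought the world in line with the desired state.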

TODO: look into the saga pattern for inspiration on actions and rollbacks

https://github.com/SlavaPanevskiy/node-sagas/blob/master/src/step.ts

https://github.com/SlavaPanevskiy/node-sagas?tab=readme-ov-file#example

https://github.com/rilder-almeida/sagas

https://github.com/temporalio/temporal-compensating-transactions/tree/main/typescript
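
For reference, the core idea of the saga pattern as it might apply here: each step pairs a forward action with a compensation, and on failure the completed steps are compensated in reverse order. This is a hand-rolled sketch and does not use the API of any of the libraries linked above; `SagaStep` and `runSaga` are hypothetical names.

```typescript
interface SagaStep {
  name: string;
  invoke: () => void; // forward action
  compensate: () => void; // undoes invoke
}

// Run steps in order. If one throws, compensate the completed steps in
// reverse order and rethrow the original error.
function runSaga(steps: SagaStep[]): void {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.invoke();
      done.push(step);
    } catch (err) {
      done.reverse().forEach((s) => s.compensate());
      throw err;
    }
  }
}

// Usage: the third step fails, so the first two are undone in reverse order.
const trail: string[] = [];
const makeSagaStep = (label: string, fail: boolean = false): SagaStep => ({
  name: label,
  invoke: () => {
    if (fail) throw new Error(`${label} failed`);
    trail.push(`do ${label}`);
  },
  compensate: () => { trail.push(`undo ${label}`); },
});

let sagaError = "";
try {
  runSaga([
    makeSagaStep("stepDown"),
    makeSagaStep("upgrade"),
    makeSagaStep("restart", true),
  ]);
} catch (err) {
  sagaError = (err as Error).message;
}
```

The failing step itself is not compensated, since it never completed; only the steps recorded as done are undone.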