Part 1 of The Perfect Proxy
"Good fences make good neighbors, but MAX_EXECUTION_TIME reduces noisy neighbors" - DBA
At work, I run systems that need >= 99.995% reliability, and I think a lot about safeguards and graceful degradation. I’ve been thinking about the perfect proxy: database agnostic via protocol-specific adapters. I’ll expand on that design in Part 2. This post makes the case that load-based rate limiting in a generic proxy would be a step-function improvement for our industry over current open source offerings.
The most critical capability of a perfect proxy layer is the ability to apply load-based rate limiting. It’s vastly superior to query-count-based rate limiting because it differentiates expensive queries from cheap ones and protects the cluster against overload.
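To make the distinction concrete, here is a minimal sketch of the idea: charge each query its resource-unit (RU) cost against a token bucket, instead of counting queries. The class name and RU costs below are illustrative, not from any real system.

```python
import time

class RUBucket:
    """Token bucket that charges resource units (RUs), not query counts.

    A cheap point read might cost 1 RU while a runaway scan costs thousands,
    so the scan consumes proportionally more of the tenant's budget.
    (Names and RU costs are illustrative placeholders.)
    """

    def __init__(self, capacity_ru: float, refill_ru_per_sec: float):
        self.capacity = capacity_ru
        self.refill_rate = refill_ru_per_sec
        self.tokens = capacity_ru
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def try_consume(self, cost_ru: float) -> bool:
        """Admit the query if the tenant has enough RUs; otherwise throttle."""
        self._refill()
        if self.tokens >= cost_ru:
            self.tokens -= cost_ru
            return True
        return False

bucket = RUBucket(capacity_ru=1000, refill_ru_per_sec=100)
assert bucket.try_consume(1)         # cheap point read: admitted
assert not bucket.try_consume(5000)  # runaway scan: throttled
```

A query-count limiter would treat both requests above identically; the RU bucket lets the cheap read through while blocking the scan.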
Failing Systems
In an ideal world, we could auto scale up to the point where the business is no longer willing to pay for additional throughput or reliability, but in practice, some systems are slow or prohibitively expensive to auto scale.
In the legacy database world, the core constraints tend to be around:
- connection counts
- user partitioning
- maximum query time
- in more advanced systems, control over data separation onto different shards
These are woefully insufficient in a modern ecosystem because they provide little control over resource consumption. In reliability-sensitive environments, you need stronger guarantees when running a variety of workloads from the same service or from multiple services.
Optimal controls are provided by best-in-class internal tools at large tech companies but are out of reach of the typical enterprise. These are often custom systems like X-Stor [1] or Abase [2], designed for hard multi-tenant usage, which control per-tenant load in the cluster with fairness algorithms. Similarly, Airbnb and Uber have published work on their in-house proxy solutions for resource-usage-based rate limiting and quality-of-service enforcement [3][4].
Best-in-class tools shouldn’t be confined to in-house teams; they should be available to ordinary enterprise organizations. Here we put forth a design for bringing that advantage to the many orgs running a variety of datastores.
Business Justification
We propose a unified design, mproxy (i.e., a multi-proxy), that modernizes legacy databases for reliability, cost efficiency, and developer velocity.
By designing the proxy separately from the database engine, we achieve three core business objectives:
1. Reliability
The primary goal is to eliminate cascading failures and the “noisy neighbor” effect inherent in clusters serving a variety of workloads. This is especially valuable in multi-tenant clusters, but still valuable in single-tenant scenarios (e.g., critical vs. less critical traffic).
- Failure Isolation: By utilizing Resource Usage (RU) based throttling, we reduce the “blast radius” of a single tenant’s runaway queries. This mitigates the risk of site-wide outages that typically result in significant revenue loss and SLA penalties.
- Automated Circuit Breaking: The proxy acts as a safety valve, automatically draining request buckets during backend latency spikes. This prevents the “thundering herd” and ensures that transient issues do not escalate into permanent database degradation.
- Performance: resource isolation per tenant and within a tenant’s queries enables pushing the boundary on p999 latency and total QPS.
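The “safety valve” behavior described above can be sketched as a simple circuit breaker that trips after consecutive slow responses and sheds load during the cooldown. The thresholds and class names here are illustrative assumptions, not mproxy’s actual design.

```python
import time

class CircuitBreaker:
    """Trips open when backend latency spikes, shedding load instead of
    letting a thundering herd pile onto a struggling database.
    (Thresholds and timings are illustrative placeholders.)"""

    def __init__(self, latency_budget_ms: float, trip_after: int, cooldown_s: float):
        self.latency_budget_ms = latency_budget_ms
        self.trip_after = trip_after  # consecutive slow responses before opening
        self.cooldown_s = cooldown_s
        self.slow_count = 0
        self.opened_at = None         # None => circuit closed (traffic flows)

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: after the cooldown, let a probe request through.
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, latency_ms: float) -> None:
        if latency_ms > self.latency_budget_ms:
            self.slow_count += 1
            if self.slow_count >= self.trip_after:
                self.opened_at = time.monotonic()
        else:
            self.slow_count = 0
            self.opened_at = None     # backend recovered; close the circuit

cb = CircuitBreaker(latency_budget_ms=50, trip_after=3, cooldown_s=5)
for _ in range(3):
    cb.record(latency_ms=200)  # three slow responses trip the breaker
assert not cb.allow_request()  # requests are now shed, not forwarded
```

Shedding at the proxy means transient backend latency converts into fast failures for clients instead of a growing connection pile-up on the database.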
2. Cost Efficiency
mproxy transforms how we manage and pay for infrastructure by providing granular visibility into every query’s relative cost.
- Granular Cost Attribution: Transitioning from simple RPS to Resource Units (RUs) allows for precise internal accounting. This data enables usage-based billing and allows the business to identify and manage tenant economics.
- Increased Hardware Density: Sophisticated load balancing (P2C) and connection management allow for higher tenant density per cluster. This “bin-packing” approach can reduce overall cloud infrastructure spend without compromising p99 latency for premium tiers.
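For readers unfamiliar with P2C: power-of-two-choices load balancing samples two random backends and routes to the less loaded one, avoiding both the cost of scanning every backend and the herding of plain least-connections. A minimal sketch, assuming load is tracked as in-flight request counts (a real proxy would likely blend in latency signals too):

```python
import random

def pick_backend_p2c(backends: dict[str, int]) -> str:
    """Power-of-two-choices (P2C): sample two random backends and route
    to the one with fewer in-flight requests.
    `backends` maps backend name -> current in-flight request count
    (an illustrative load signal, not a complete cost model)."""
    a, b = random.sample(list(backends), 2)
    return a if backends[a] <= backends[b] else b

inflight = {"db-1": 120, "db-2": 15, "db-3": 400}
choice = pick_backend_p2c(inflight)
assert choice in inflight  # always routes to one of the sampled backends
```

Because each decision only compares two backends, P2C stays cheap at high QPS while still steering traffic away from hot replicas, which is what enables denser bin-packing without sacrificing tail latency.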
3. Developer Velocity
The proxy acts as a programmable shim that abstracts away the “scary” parts of the database, letting engineers ship faster.
- Self-Service Guardrails: We eliminate the “fear of shipping” by implementing RU-based rate limits and Runaway Query Management at the proxy layer. Product teams can iterate on complex queries knowing the proxy will sandbox sub-optimal code before it melts the cluster, reducing the need for exhaustive manual reviews.
- Immediate Debugging Loops: Built-in Fingerprinting and OpenTelemetry provide an instant, per-query view of resource consumption. This shortens the feedback loop from “Why is the site slow?” to “This specific query is burning 10k RUs,” enabling targeted optimization instead of guesswork.
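Fingerprinting groups queries that differ only in their literal values so the proxy can attribute RU consumption per query shape. A rough regex-based illustration follows; production fingerprinters typically normalize via the query parser rather than regexes.

```python
import hashlib
import re

def fingerprint(query: str) -> str:
    """Normalize a query so variants differing only in literal values share
    one fingerprint, letting the proxy aggregate RU cost per query shape.
    (Simplified illustration; real systems normalize via the parser.)"""
    normalized = query.strip().lower()
    normalized = re.sub(r"'[^']*'", "?", normalized)  # string literals -> ?
    normalized = re.sub(r"\b\d+\b", "?", normalized)  # numeric literals -> ?
    normalized = re.sub(r"\s+", " ", normalized)      # collapse whitespace
    return hashlib.sha256(normalized.encode()).hexdigest()[:12]

q1 = "SELECT * FROM users WHERE id = 42"
q2 = "SELECT * FROM users  WHERE id = 7"
assert fingerprint(q1) == fingerprint(q2)  # same shape, same fingerprint
```

With every request tagged by fingerprint, “the site is slow” becomes “this fingerprint’s RU consumption tripled at 14:02,” which is the feedback loop described above.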
With all that said, when does it make sense to adopt a tool that adds complexity and latency? My heuristic is to adopt it when any of the following are true:
- You have multiple workloads inside the same cluster causing noisy neighbor problems. This is most valuable on massive multi-tenant clusters but remains valuable when a single tenant has disparate internal workloads, such as critical and non-critical traffic.
- You need reliability at >= 99.9% SLA (availability or latency)
Conclusion: The multi-proxy hero we need
The core logic of this proxy:
- Resource usage tracking and rate limiting
- Advanced Load Balancing
- Traffic shaping and circuit breaking
is fundamentally independent of the database technology.
By designing an abstract interface, we can extend these protections to legacy SQL and NoSQL stacks, providing a high-availability proxy with dynamic configuration reloads. This approach leans on best-in-class industry experience at scale and enables us to fill critical capability gaps in modern infrastructure while continuing to rely on our trusted datastores.
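One way to picture that abstract interface: the proxy core (rate limiting, load balancing, circuit breaking) talks only to a protocol-specific adapter, so supporting another datastore means writing another adapter. The interface below is a hypothetical sketch; the method names are my assumptions, not mproxy’s actual API.

```python
from abc import ABC, abstractmethod

class BackendAdapter(ABC):
    """Hypothetical protocol-specific adapter interface. The proxy core
    depends only on this abstraction, so SQL and NoSQL backends are each
    just another adapter. (All names here are illustrative.)"""

    @abstractmethod
    def parse_request(self, raw: bytes) -> dict:
        """Decode one wire-protocol message into a protocol-neutral request."""

    @abstractmethod
    def estimate_cost(self, request: dict) -> float:
        """Pre-admission RU estimate for the decoded request."""

    @abstractmethod
    def forward(self, request: dict) -> bytes:
        """Send the request to the backend and return the raw response."""

class EchoAdapter(BackendAdapter):
    """Trivial stand-in adapter used to exercise the interface."""
    def parse_request(self, raw: bytes) -> dict:
        return {"op": "echo", "payload": raw}
    def estimate_cost(self, request: dict) -> float:
        return float(len(request["payload"]))  # toy cost model: bytes = RUs
    def forward(self, request: dict) -> bytes:
        return request["payload"]

adapter = EchoAdapter()
req = adapter.parse_request(b"ping")
assert adapter.estimate_cost(req) == 4.0
```

The key design choice is that RU accounting happens against the adapter’s cost estimate before forwarding, so the core throttling logic never needs to understand any particular wire protocol.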
I’ve christened it: mproxy, a multi-proxy for the datastore layer! Hat tip on the name to prior art in the fintech sector 😂. We build on the shoulders of giants!
I’ve built a prototype of many of these capabilities for MongoDB. Now it’s time to discard that design and rebuild a generic implementation with robust simulation testing, clean backend adapter interfaces, and high performance.
Stay tuned!
Part 2 will dive into practical details of the implementation.