26 May 2026

The Perfect Proxy: Technical Specification

Part 2 of The Perfect Proxy

In complex, high-scale datastore deployments, the database proxy layer represents a critical and underutilized opportunity.

This document proposes the architecture for a next-generation database proxy designed to serve as a foundational component for infrastructure resilience and optimization.

The proposed architecture establishes layered defenses, achieving the following objectives: reliability, cost-efficiency, and developer velocity.

The goal is to build and deploy a database-agnostic interface that enforces load-based rate limiting and traffic fairness while simplifying developer usage.

Business Justification

The implementation of mproxy (short for multi-proxy) modernizes legacy databases along three axes: reliability, cost efficiency, and developer velocity.

My experience running databases at scale, both multi-tenant and single-tenant, with variable workloads has shown me that fair server utilization remains an unsolved problem in open-source databases. It manifests, for example, as low-priority traffic consuming resources needed by high-priority traffic, or as retry storms that demand more connection slots than are available. Existing rate-limiting solutions are inadequate because they do not account for load cost; at best, they count queries using a token bucket algorithm. When availability targets move beyond simple uptime to P99 or P999 latency, these protections become increasingly important for system reliability.

On large enough systems, it becomes impossible to gate at the frontend, and no open-source database sufficiently implements these protections at the datastore layer, so the problem must be addressed in a defensive layer before traffic reaches the datastore. If you run multi-tenant systems or target 99.95%+ availability, you start to need this capability.

By injecting these behaviors at a proxy layer, we see an opportunity to solve this across multiple database backends through a core proxy with an adapter per database wire protocol (MySQL, PostgreSQL, MongoDB, Cassandra, Redis, etc.).

1. Reliability

The primary goal is to eliminate cascading failures and the “noisy neighbor” effect inherent in single- and multi-tenant environments.

  • Failure Isolation: By utilizing resource-usage-based throttling and shuffle sharding, we reduce the “blast radius” of a single tenant’s runaway queries. This mitigates the risk of site-wide outages that typically result in significant revenue loss and SLA penalties.
  • Automated Circuit Breaking: The proxy acts as a safety valve, automatically draining request buckets during backend latency spikes. This prevents the “thundering herd” and ensures that transient issues do not escalate into permanent database degradation.
  • Performance: resource isolation per tenant and within a tenant’s queries enables pushing the boundary on p999 latency and total QPS.

2. Cost Efficiency

mproxy transforms how we manage and pay for infrastructure by providing granular visibility into every query’s true cost.

  • Granular Cost Attribution: Transitioning from simple RPS to Resource Units (RUs) allows for precise internal accounting. This data enables usage-based billing and allows the business to identify and manage tenant economics.
  • Increased Hardware Density: Sophisticated load balancing (P2C) and connection management allow for higher tenant density per cluster. This “bin-packing” approach can reduce overall cloud infrastructure spend without compromising p99 latency for premium tiers.

3. Developer Velocity

The proxy acts as a programmable middleware that abstracts away the “scary” parts of the database, letting engineers ship faster. It also enables independent development between the platform team and the product team, allowing each to operate in their optimal domain as long as the proxy’s contract is upheld.

  • Self-Service Guardrails: We eliminate the “fear of shipping” by implementing RU-based gating and Runaway Query Management at the proxy layer. Product teams can iterate on complex queries knowing the proxy will sandbox sub-optimal code before it melts the cluster, reducing the need for exhaustive manual reviews.
  • “Magic Comment” Control: Developers can drive infrastructure behavior directly via standard query comments. Whether it’s overriding a timeout for a batch job or forcing a read-after-write to a specific secondary, these Hints allow for rapid tuning without waiting on infra-level config changes.
  • Immediate Debugging Loops: Built-in Fingerprinting and OpenTelemetry provide an instant, per-query view of resource consumption. This shortens the feedback loop from “Why is the site slow?” to “This specific query is burning 10k RUs,” enabling targeted optimization instead of guesswork.
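As a rough illustration of the “Magic Comment” idea, here is a minimal parsing sketch. The comment syntax (`/* mproxy: key=value, ... */`) and the hint names are hypothetical placeholders chosen for this example, not a settled format. Unknown keys are ignored so that a typo cannot change proxy behavior.

```python
import re

# Hypothetical hint syntax (an assumption): /* mproxy: key=value, key=value */
_HINT_RE = re.compile(r"/\*\s*mproxy:\s*(.*?)\s*\*/", re.DOTALL)
# Assumed allow-list of hint names; anything else is silently ignored.
_ALLOWED = {"timeout_ms", "priority", "resource_group", "read_preference"}


def parse_hints(query: str) -> dict[str, str]:
    """Return the allow-listed hints from the first mproxy comment, if any."""
    match = _HINT_RE.search(query)
    if match is None:
        return {}
    hints: dict[str, str] = {}
    for pair in match.group(1).split(","):
        key, sep, value = pair.partition("=")
        key, value = key.strip().lower(), value.strip()
        # Keep only well-formed key=value pairs whose key is allow-listed.
        if sep and key in _ALLOWED and value:
            hints[key] = value
    return hints
```

A batch job could then send `/* mproxy: timeout_ms=30000, priority=low */ SELECT ...` and the proxy would see `timeout_ms` and `priority` as string values to validate further before applying them.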

With all that said, when does it make sense to adopt such a tool that adds complexity and latency? My heuristic here is when any of the following are true:

  1. You have multiple workloads inside the same cluster causing noisy neighbor problems.
    1. This could be true multi-tenancy, or a single tenant with disparate internal workloads, i.e., user-present traffic versus async.
  2. You need reliability at a >= 99.95% SLA (availability or latency).

Core Specification

Beyond Static Rate Limiting: The Resource Unit (RU) Model

Most proxies rely on simple request-per-second (RPS) limits, which fail to account for the actual cost of a query. Inspired by prior art from TiDB, DynamoDB, Cosmos DB, and Airbnb’s adaptive traffic management, the “Perfect Proxy” brings Resource Units (RUs) to multiple datastores without invasive changes.

An RU is a calibrated metric accounting for datastore cluster load. Common direct and indirect resources considered are compute, network, and disk I/O.

Airbnb designed their reference algorithm for RUs as follows [1]:

  • RU_read = 1 + (weight_read x bytes_read) + (weight_latency x latency_ms)

  • RU_write = 6 + (weight_writes x bytes_written / 4096) + (weight_latency x latency_ms)

By making the weight factors (weight_read, weight_writes, weight_latency) configurable per backend [2], operators can penalize specific behaviors, such as significant write amplification, based on load-test calibration.
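To make the formulas concrete, here is a minimal sketch of the two RU calculations with per-backend weights. The `RUWeights` type and its default values are illustrative assumptions, not calibrated numbers; real weights would come from load testing each backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RUWeights:
    """Per-backend weights. Defaults are placeholders, not calibrated values."""
    weight_read: float = 1.0 / 4096   # RU per byte read (assumed)
    weight_writes: float = 1.0        # RU per 4 KiB written (assumed)
    weight_latency: float = 0.1       # RU per millisecond of latency (assumed)


def ru_read(w: RUWeights, bytes_read: int, latency_ms: float) -> float:
    # RU_read = 1 + (weight_read x bytes_read) + (weight_latency x latency_ms)
    return 1 + w.weight_read * bytes_read + w.weight_latency * latency_ms


def ru_write(w: RUWeights, bytes_written: int, latency_ms: float) -> float:
    # RU_write = 6 + (weight_writes x bytes_written / 4096) + (weight_latency x latency_ms)
    return 6 + w.weight_writes * bytes_written / 4096 + w.weight_latency * latency_ms
```

For example, with `weight_writes=2.0` and `weight_latency=0.5`, an 8 KiB write that takes 4 ms costs 6 + 4 + 2 = 12 RUs, so raising `weight_writes` on a backend with heavy write amplification directly raises the price of large writes.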

We see the wisdom in their choice to measure compute and disk indirectly through the latency component. While it is not as precise as a direct measurement, it is a practical choice that measures these factors as a function of time and removes the need to customize the database layer to expose those metrics. Furthermore, TiDB’s RU algorithm [3] measures request count, payload bytes, and CPU seconds (RU = α_r + β_r * IOBytes + γ_r * CpuSecs), and we assert that CpuSecs can be replaced with a weighting factor on latency.

Because of the benefits of fairness among tenants or sub-tenants, the ability to distinguish and provide quality of service for high priority vs. low priority traffic, and to provide cost attribution to the business, we consider this the essential P0 capability of the project.

Advanced Traffic Shaping and QoS

To ensure fairness, RUs must be paired with Quality of Service (QoS) mechanisms:

  • [P1] CoDel (Controlled Delay): Monitor cluster degradation and gradually shed traffic starting with LOW priority queues during degradation [4]. This can be further enhanced by incorporating shuffle sharding of internal queues and of backend load balancing to isolate tenant impact. [5]

  • [P1] Runaway Query Management: Queries are fingerprinted and gated if they exceed defined RU counts or millisecond thresholds. These can be dropped, backpressured, or relegated to lower priority levels. [6]

  • [P4] Magic Comments & Hints: Support for “magic comments” allows developers to drive behavior by overriding timeouts, resource groups, or priority levels on a per-query basis. [7]
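The priority-ordered shedding described for CoDel can be sketched as follows. This is a deliberately simplified, CoDel-inspired policy: it keeps CoDel's core signal (the minimum queueing delay over an interval compared against a target) but replaces CoDel's drop-spacing control law with a coarse "escalate one priority level per bad interval" rule. The class name, the three priority levels, and the 5 ms / 100 ms defaults are assumptions for illustration.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class PriorityShedder:
    """Sheds the lowest priorities first while queue delay stays above target."""

    def __init__(self, target_ms: float = 5.0, interval_ms: float = 100.0):
        self.target_ms = target_ms
        self.interval_ms = interval_ms
        self.shed_below = Priority.LOW   # priorities strictly below this are shed
        self._window_start_ms = 0.0
        self._window_min_ms = float("inf")

    def observe(self, now_ms: float, sojourn_ms: float) -> None:
        """Record the queueing delay of one dequeued request."""
        self._window_min_ms = min(self._window_min_ms, sojourn_ms)
        if now_ms - self._window_start_ms < self.interval_ms:
            return
        # Window closed. If even the best-case delay exceeded the target, the
        # queue is standing rather than bursting: escalate one level.
        # Otherwise relax one level. HIGH priority is never shed.
        if self._window_min_ms > self.target_ms:
            self.shed_below = Priority(min(self.shed_below + 1, Priority.HIGH))
        else:
            self.shed_below = Priority(max(self.shed_below - 1, Priority.LOW))
        self._window_start_ms = now_ms
        self._window_min_ms = float("inf")

    def admit(self, priority: Priority) -> bool:
        return priority >= self.shed_below
```

Using the window minimum rather than the average is the part borrowed from CoDel: a short burst leaves at least one low-delay sample in the window, so only a persistently full queue triggers shedding.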

Auxiliary Specification

Distributed Resource Coordination

A major challenge is distributing a single resource group’s RUs across a global proxy cluster without introducing a new central point of failure. We propose the following:

  • [P0] Gossip membership protocol: Proxies use gossip-based peering (SWIM) to avoid relying on a central point of failure (e.g., a centralized Redis cluster). Peer messages are asynchronous to the hot path of traffic through the proxy. [8] During network splits, members fall back to safe per-node behavior.

  • [P0] Fail-safe Propagation: Using a modified SWIM or gossip protocol, the proxy ensures that if the network partitions, it falls back to a stable, sensible default rather than failing closed.

  • [P3] The Bidding/Lending System: Proxies use a cooperative model where each node is guaranteed a proportional amount of the total limits (RUs, connection counts, etc) but can bid for spare capacity from other nodes via distributed leases. As free capacity becomes limited on lenders, the cost to borrow goes up until the system finds a stochastic balance.

    • Note: this needs more research and simulation.
    • Bidding and Auction Based Allocation and Scheduler Fairness Algorithms will inform design.
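One way to picture the fail-safe and lending behavior together is a per-node budget made of a guaranteed share plus time-limited leases. The sketch below is an assumption-heavy illustration: the `NodeBudget` type, the lease TTL, and the renewal-by-gossip flow are hypothetical, and the bidding and pricing logic is omitted entirely since it still needs research.

```python
from dataclasses import dataclass, field


@dataclass
class NodeBudget:
    """One proxy's view of its RU/s budget for a single resource group."""
    guaranteed_ru: float          # static share, e.g. total limit / node count
    lease_ttl_s: float = 5.0      # assumed lease lifetime
    # lender name -> (borrowed RU/s, expiry time in seconds)
    leases: dict[str, tuple[float, float]] = field(default_factory=dict, init=False)

    def grant(self, lender: str, ru: float, now_s: float) -> None:
        """Record or renew capacity borrowed from a peer."""
        self.leases[lender] = (ru, now_s + self.lease_ttl_s)

    def limit(self, now_s: float) -> float:
        """Guaranteed share plus any borrowed capacity whose lease is still live."""
        self.leases = {p: (ru, exp) for p, (ru, exp) in self.leases.items() if exp > now_s}
        return self.guaranteed_ru + sum(ru for ru, _ in self.leases.values())
```

Because leases expire unless renewed, a node cut off from its peers drifts back to its guaranteed share without any coordinator. For the totals to stay bounded, a lender would also need to lower its own limit by the lent amount for the lifetime of the lease.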

Performance

The proxy optimizes backend efficiency through several transparent mechanisms:

  • [P0] Intelligent Load Balancing: Beyond simple Round Robin, we use the Power of Two Choices (P2C) algorithm, selecting the best of two random backends based on least connections, RU usage, or observed latency. [9] [10]

  • [P1] Connection Management: Maintaining “hot” TCP or protocol-layer connection pools to eliminate cold start microdelays. We must also consider the cost of each ongoing connection to prevent connection swarms.

    • Especially valuable as a connection pooler for Postgres, but remains valuable for MySQL/TiDB and Mongo.
  • [P3] Caching & Coalescing: Deploying a small, local opt-in Sieve cache and using hot-key detection (the “Britney Spears problem”) to collapse identical requests into a single backend query. [11]

  • [P4] Adaptive query routing: Awareness of replication lag (e.g., via GTID tracking in MySQL or causal consistency in MongoDB) allows for safe routing of opt-in requests to secondaries. [12] [13]

  • [P4] Query Rewriting: Implementing transparent rewriting to switch resource groups, set maximum RU values, or redirect traffic between primary and secondary backends. The complexity lies in avoiding rewrites during most operations while enabling them for special opt-in cases, and in ensuring that configuration changes can run in shadow mode before they are enabled. [14]
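The Power of Two Choices selection from the load balancing item can be sketched in a few lines. The `Backend` type and the use of in-flight request count as the load signal are assumptions for this example; RU usage or observed latency could be substituted as the comparison key.

```python
import random
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    in_flight: int = 0   # load signal; RU usage or latency could be used instead


def pick_p2c(backends: list[Backend], rng: random.Random) -> Backend:
    """Sample two distinct backends at random and return the less loaded one."""
    if not backends:
        raise ValueError("no backends available")
    if len(backends) == 1:
        return backends[0]
    a, b = rng.sample(backends, 2)
    return a if a.in_flight <= b.in_flight else b
```

Compared with always picking the global minimum, sampling two candidates avoids every proxy stampeding onto the same "least loaded" backend when their load views are slightly stale.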

Resilience Engineering

To prevent the “thundering herd,” the proxy acts as a sophisticated circuit breaker:

  • [P0] Retry Storms: Retries are treated as normal queries and are controlled via the RU allowance. Because latency is a key RU factor, slow or failing responses exhaust the RU allowance more rapidly, which applies backpressure to callers during periods of overload. An open question is how to account for slow, failed queries that never respond; we propose tracking latency incrementally rather than waiting for queries to respond. [15]

    • We can further control this via the coalescing functionality as a later improvement.
  • [P1] Observability: A deep OpenTelemetry exporter provides granular insights into RU consumption, fingerprinting, and shed traffic.
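The incremental latency tracking proposed for retry storms could look roughly like the sketch below: an RU-denominated token bucket that charges in-flight queries as time passes instead of at completion. The class and parameter names are hypothetical, and letting the balance go negative is one possible design choice among several.

```python
class RUBucket:
    """Token bucket denominated in RUs, refilled at a fixed rate."""

    def __init__(self, rate_ru_per_s: float, burst_ru: float, now_s: float = 0.0):
        self.rate = rate_ru_per_s
        self.burst = burst_ru
        self.tokens = burst_ru
        self._last_s = now_s

    def _refill(self, now_s: float) -> None:
        self.tokens = min(self.burst, self.tokens + (now_s - self._last_s) * self.rate)
        self._last_s = now_s

    def try_admit(self, now_s: float, base_ru: float = 1.0) -> bool:
        """Admit a query (first attempt or retry) if at least base_ru is available."""
        self._refill(now_s)
        if self.tokens < base_ru:
            return False
        self.tokens -= base_ru
        return True

    def charge_latency(self, now_s: float, elapsed_ms: float, weight_latency: float) -> None:
        """Charge latency accrued by an in-flight query since its last charge.

        Called periodically per in-flight query, so a query that never responds
        still drains RUs. The balance may go negative, which blocks new
        admissions until the refill catches up.
        """
        self._refill(now_s)
        self.tokens -= weight_latency * elapsed_ms
```

Under this scheme a caller retrying into a slow backend finds its allowance already drained by its own stalled queries, so the retries are rejected at the proxy rather than piling onto the datastore.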

Challenges

I’ve outlined a large feature set and this is a project at a prototype level. The real work happens in the de-risking, hardening, and validation of a new proxy layer.

  • Complexity in the critical path: With such a project, we must minimize the complexity on the critical path and ensure safe-by-design behaviors that work during control plane disruption or networking incidents.

  • Latency: There’s a baseline level of latency that is unavoidable with this design due to an additional network hop. We must invest in making sure that all of the capabilities are opt-in by design, such that operators can choose where additional latency is worth the proxy capability. We acknowledge that extreme low latency circumstances may not tolerate the additional millisecond delays, but our industry experience indicates that a vast majority of workloads would have better tail latency at the cost of additional baseline latency alongside these additional capabilities.

  • Yet another infra component: to deploy, manage, upgrade and derisk for reliability issues. The payoff must be much greater than the pain here.

  • Incremental adoption: The proxy capabilities must start minimal and be opt-in, with safe mechanisms for shadowing new feature usage.

  • Simulation testing: Ideally to the standard of best-in-class systems like FoundationDB.

  • Industry feedback: As an open source project, this will benefit from input from my colleagues, peers, and other organizations on their key challenges and how a multi-proxy could solve them.

Conclusion: The multi-proxy hero we need

The core logic of this proxy:

  • Resource usage rate limiting
  • Advanced Load Balancing
  • Traffic shaping and circuit breaking
  • Query coalescing
  • Bidding/lending system

is fundamentally independent of the database technology.

By designing an abstract interface, we can extend these protections to legacy SQL and NoSQL stacks, providing a high-availability proxy with dynamic configuration reloads. This approach leans on best-in-class industry experience at scale and enables us to fill critical capability gaps in modern infrastructure while continuing to rely on our trusted datastores.

I’ve christened it: mproxy, a multi-proxy for the datastore layer! Hat tip on the name to prior art in the fintech sector 😂. We build on the shoulders of giants!

I’ve built a prototype that supports many of these capabilities for MongoDB. I’m now looking to discard it and rebuild a generic implementation with robust simulation testing, clean interfaces, and multi-backend support.

We can save the tech world from a myriad of outages, pages, and excessive cluster costs!

PS - The core concepts could similarly apply to an RU-based rate limiter for gRPC (interceptor), REST APIs, etc.

Thanks to:

  1. Early draft reviewers
    1. Mokhtar Bacha @ Formal.ai
    2. Daniela Miao @ Momento
    3. Rachel Fenn @ Stealth
    4. Joy Zheng @ Plaid
    5. Mingjian Liu @ Airbnb
    6. Mike Rowland @ Plaid
  2. All the prior authors and operators whose research is represented and cited here, especially
    1. Marc Brooker
    2. Airbnb Engineering
    3. Uber Engineering
    4. Plaid for my own operator experience at scale
  3. AI for being my sounding board, research assistant and editor. Nothing gets a draft written like arguing with a PRNG.