Monday, November 24, 2025

Connection storms in TCP and how to avoid them

All databases, in fact, all TCP servers, are susceptible to a connection storm.

Let's dig deeper to understand what it is and how to handle it.

A TCP connection storm occurs when applications rapidly open and close a large number of database connections, overwhelming the server's connection handling capacity.

The impact can be severe: resource exhaustion (each connection can take up roughly 2-8 MB of server memory, depending on the database and its configuration), CPU thrashing during connection handshakes, internal lock contention (in the case of multi-threaded handling), high memory fragmentation, or even downtime.

If your database is distributed and the master faces a connection storm, it may put your database in an inconsistent state, and it can be really tricky to bring it back to consistency.

At the OS level, one way to handle this is iptables rate limiting, which caps how many new connections each IP can open in a given time window.
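To make the idea of a per-IP limit concrete, here is a minimal sketch of a token bucket keyed by client IP. It only illustrates the concept in application code and is not how iptables is configured; the class name `PerIPRateLimiter` and its parameters are hypothetical.

```python
import time
from typing import Dict, Optional, Tuple


class PerIPRateLimiter:
    """Token bucket per client IP: allows `burst` new connections at once,
    then refills at `rate` connections per second."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        # ip -> (tokens left, time of the last check)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # An IP seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(ip, (float(self.burst), now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[ip] = (tokens - 1, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

A server would call `allow(client_ip)` before accepting work from a new connection and drop or delay the connection when it returns `False`. Doing this in the kernel is cheaper, since rejected connections never reach the application.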

You can also tune the TCP stack by editing the `/etc/sysctl.conf` file and configuring the SYN parameters and the connection queue sizes.
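Before changing anything, it helps to see the current values. The sketch below reads a few Linux settings related to SYN handling and the connection queue from `/proc/sys`. The selection of settings is my assumption about what is worth reviewing, and the helper names are hypothetical.

```python
from pathlib import Path

# Linux TCP settings commonly reviewed when hardening against connection bursts.
TCP_SETTINGS = [
    "net.core.somaxconn",            # upper limit on a listening socket's accept queue
    "net.ipv4.tcp_max_syn_backlog",  # upper limit on half-open (SYN received) connections
    "net.ipv4.tcp_syncookies",       # 1 enables SYN cookies when the SYN queue overflows
]


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_settings(names=TCP_SETTINGS, root: str = "/proc/sys") -> dict:
    """Return the current values; settings missing on this system map to None."""
    values = {}
    for name in names:
        path = sysctl_path(name, root)
        values[name] = path.read_text().strip() if path.exists() else None
    return values
```

Calling `read_settings()` on a Linux host returns a dictionary of the three settings and their current values, which you can compare against what you plan to set.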

All major databases also expose TCP-related parameters like backlog size, pool size, timeouts, etc., so review the configurations and tune your parameters accordingly.
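As a rough illustration of what a backlog setting controls, here is a minimal sketch of a TCP listener in Python. A server's backlog parameter typically ends up as the argument to `listen()`; the function name `make_listener` and the default of 128 are assumptions for the example.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0,
                  backlog: int = 128) -> socket.socket:
    """Create a TCP listener. `backlog` is the requested length of the queue of
    connections that have completed the handshake but are not yet accepted."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))  # port 0 asks the OS for any free port
    server.listen(backlog)     # the kernel may cap this (net.core.somaxconn on Linux)
    return server
```

When the queue is full, further connection attempts are dropped or refused until the application accepts some of the pending ones, so a backlog that is too small turns a burst into client-side errors.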

You can also add a second line of defense with a database proxy (e.g., ProxySQL, PgBouncer, etc.).
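The core idea behind pooling is that many clients share a fixed number of server connections instead of each opening their own. A minimal sketch of that idea follows, assuming a hypothetical `connect` factory that opens one database connection. Real proxies do far more than this.

```python
import queue


class BoundedPool:
    """Fixed-size pool: at most `size` connections ever exist, and callers
    wait (up to `timeout` seconds) for a free one instead of opening more."""

    def __init__(self, connect, size: int, timeout: float = 5.0):
        self._timeout = timeout
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # open all connections once, up front

    def acquire(self):
        """Take an idle connection, or raise TimeoutError if none frees up."""
        try:
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("no free connection in the pool") from None

    def release(self, conn) -> None:
        """Return a connection so another caller can reuse it."""
        self._idle.put(conn)
```

Because the pool never opens more than `size` connections, a burst of requests shows up as waiting callers rather than as a flood of new connections on the database.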

It is always good practice to monitor the connection count on your database instance.

Depending on the database you use, there are ways to gather these metrics, so proactively monitor them and set up alerts.

Some examples are:

- the `pg_stat_activity` view in Postgres
- the `Threads_connected` status variable in MySQL
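A minimal sketch of an alert check built on these metrics could look like the following. The two queries are one way to read the current connection count; the function name and the 70% and 90% thresholds are assumptions to adjust for your setup.

```python
# Queries that report the current connection count.
POSTGRES_QUERY = "SELECT count(*) FROM pg_stat_activity;"
MYSQL_QUERY = "SHOW GLOBAL STATUS LIKE 'Threads_connected';"


def connection_alert(current: int, max_connections: int,
                     warn_ratio: float = 0.7, crit_ratio: float = 0.9) -> str:
    """Classify connection usage as 'ok', 'warning', or 'critical'."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    usage = current / max_connections
    if usage >= crit_ratio:
        return "critical"
    if usage >= warn_ratio:
        return "warning"
    return "ok"
```

You would run the relevant query on a schedule with your database driver, pass the result and the configured connection limit to `connection_alert`, and notify someone when the result is anything other than `ok`.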

This will help you prevent outages due to a connection storm and respond in time.

 
