I was having trouble understanding the iptables hashlimit module and couldn’t dig up anything that really helped. The man page lacks a clear explanation, and /proc/net/ipt_hashlimit/ leaves out some information that would clarify things immensely. After some testing I managed to work it all out, so let’s go through it and see if I can help make sense of it for you too.
I’ll try not to assume too much prior knowledge about the module. We’ll be coming at this with the goal of blocking traffic that exceeds a certain number of packets per second. From the man page:
hashlimit uses hash buckets to express a rate limiting match (like the limit match) for a group of connections using a single iptables rule. Grouping can be done per-hostgroup (source and/or destination address) and/or per-port. It gives you the ability to express “N packets per time quantum per group” or “N bytes per seconds”
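To make “N packets per time quantum per group” a bit more concrete, here is a small Python model of the idea. It is a simplified sketch and not the kernel’s code. It assumes the common description of hashlimit as one token bucket per hash entry. The group key stands in for whatever --hashlimit-mode selects (srcip here). rate plays the role of --hashlimit-upto/--hashlimit-above, and burst plays the role of --hashlimit-burst. The class and method names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    tokens: float  # packets this group may still send right now
    last: float    # time (in seconds) of the last refill


@dataclass
class HashLimitSketch:
    """Hypothetical, simplified model of hashlimit: one token bucket per group.

    rate  -> packets per second allowed per group (like --hashlimit-upto)
    burst -> bucket capacity (like --hashlimit-burst)
    The key passed to allow() stands in for the --hashlimit-mode fields.
    """
    rate: float
    burst: int
    buckets: dict = field(default_factory=dict)  # key -> Bucket, i.e. the "hash"

    def allow(self, key: str, now: float) -> bool:
        b = self.buckets.get(key)
        if b is None:
            # First packet from a new group: create a new entry with a full bucket.
            b = self.buckets[key] = Bucket(tokens=float(self.burst), last=now)
        # Refill at `rate` tokens per second, never exceeding the burst size.
        b.tokens = min(float(self.burst), b.tokens + (now - b.last) * self.rate)
        b.last = now
        if b.tokens >= 1.0:
            b.tokens -= 1.0
            return True   # under the limit: roughly where --hashlimit-upto matches
        return False      # over the limit: roughly where --hashlimit-above matches


limiter = HashLimitSketch(rate=2, burst=3)
# Five packets from one source at the same instant: only `burst` of them fit.
print([limiter.allow("10.0.0.1", 0.0) for _ in range(5)])
# → [True, True, True, False, False]
# A different source gets its own bucket, so it is not affected.
print(limiter.allow("10.0.0.2", 0.0))  # → True
# Half a second later, 10.0.0.1 has earned one token back at 2 packets/s.
print(limiter.allow("10.0.0.1", 0.5))  # → True
```

The key point is that each group is rate-limited independently. One noisy source exhausts only its own bucket. The real module also expires idle entries (see --hashlimit-htable-expire), which this sketch leaves out.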
Working at a service provider for hundreds of VoIP customers with thousands of phones means you’re going to encounter problems that make you scratch your head. Every now and then one of those problems gets escalated to me and I’m left figuring out why something is happening. Sometimes that means running packet captures on all sorts of equipment, installing in-line network taps to grab traffic off the wire, or trudging through debug logs. Occasionally I get lucky, figure out the problem before we get started and look like a wizard. More often, though, I’m left just as confused as everyone else.
This problem has come to me several times, but it’s always the same: intermittent, unexplained phone reboots with no apparent rhyme or reason. Mid-call or sitting untouched, whatever the usage scenario, the customer’s phones kept rebooting, and the problem affected multiple Aastra models. The last time this happened with noticeable frequency was several months back. We thought we had nabbed the problem when a provisioning error in an access control list (ACL) on a switch was fixed. Phones stopped rebooting, the ticket was closed and the customer was happy.
Fast forward to the other day, when the problem reared its ugly head again: multiple phones in their office rebooted unexpectedly. The ACL was still in place, so this was something new. What could it be this time?