I have spent the last week trying to find a very annoying bug that caused Snort_inline to go to 100% CPU on certain traffic. It kept working, but my P3 500MHz home gateway slowed down to between 2kb/s and 25kb/s, while normally it handles the full 325kb/s of my DSL line at around 25% CPU.
Snort comes with a number of performance measurement options. In 2.6, --enable-perfprofiling was introduced. Also, --enable-profile builds Snort for use with gprof. Next to those, you can use strace and ltrace with the -c option to see the amount of time spent in the various system calls and library functions.
I already knew the problem was related to my new Stream4 code, since running Snort_inline without the ‘stream4inline’ option made the problem go away. So my performance debugging and code reviews were focused on that code. However, the performance statistics showed no functions in Stream4 that took large amounts of time.
The Stream4 modifications I have written for Snort_inline are aimed at being able to drop on alerts that were fired on the reassembled stream. In Snort_inline 2.4 this was done by simply calling the reassembly functions for every (data) packet received and scanning a reassembled stream every time. This worked quite well, except with regard to performance, since Snort_inline needed to create the reassembled stream again for every packet. With many small packets in a stream this killed performance.
With the new code I have taken a different approach. For every stream there is a cache that keeps the reassembled stream stored and new packets are just added to it. This way the actions that have to be taken for each packet are significantly reduced. This comes at a cost of memory of course, but my performance testing shows that the approach works very well.
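To illustrate the idea, here is a minimal sketch in C of such a per-stream cache. It is not the actual Snort_inline code: `StreamCache`, `cache_append` and `cache_free` are names made up for this example, and the doubling growth strategy is an assumption. The point is only that each new packet costs one copy of its own payload, instead of a rebuild of the whole reassembled stream.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-stream cache; not Snort_inline's real data structure. */
typedef struct {
    unsigned char *data; /* reassembled stream bytes seen so far */
    size_t len;          /* number of valid bytes in data */
    size_t cap;          /* allocated size of data */
} StreamCache;

/* Append one packet's payload to the cached stream.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int cache_append(StreamCache *c, const unsigned char *payload, size_t n)
{
    assert(c != NULL);

    if (n == 0)
        return 0; /* nothing to add, cache stays unchanged */

    if (c->len + n > c->cap) {
        /* Grow by doubling so repeated appends stay cheap on average. */
        size_t new_cap = c->cap ? c->cap : 64;
        while (new_cap < c->len + n)
            new_cap *= 2;

        unsigned char *p = realloc(c->data, new_cap);
        if (p == NULL)
            return -1;
        c->data = p;
        c->cap = new_cap;
    }

    memcpy(c->data + c->len, payload, n);
    c->len += n;
    return 0;
}

/* Release the cache when the stream is closed. */
static void cache_free(StreamCache *c)
{
    assert(c != NULL);
    free(c->data);
    c->data = NULL;
    c->len = 0;
    c->cap = 0;
}
```

A cache like this starts out zeroed (`StreamCache c = {0};`) and is freed when the stream ends. The memory cost mentioned above is visible here: the buffer lives for as long as the stream does.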
Back to the bug. I noticed from the performance reports that in stream4inline mode much more time was spent in the detection functions. This was not really a surprise. Since we scan data in a sliding window of packets, we end up scanning the same data more than once. So a (small) performance penalty is expected.
However, the performance hit was big, and that’s how I finally found the bug. It turned out that one little check was missing from my Stream4 code. If a packet without a payload was received, for example just an ACK packet, my code still set up the reassembled stream in such a way that it would be scanned. This makes no sense, since the reassembled stream hasn’t changed since the last scan: the packet contained no data.
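The missing check can be sketched roughly like this in C. Again, this is an illustration and not the real Stream4 code: `StreamState`, `on_packet` and `take_scan_request` are hypothetical names, and the real code tracks far more state than a length and a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-stream state; names are illustrative only. */
typedef struct {
    size_t cached_len; /* bytes currently held in the reassembled-stream cache */
    bool needs_scan;   /* true when the cache holds data that was not scanned yet */
} StreamState;

/* Called for every packet that belongs to the stream. */
static void on_packet(StreamState *s, size_t payload_len)
{
    assert(s != NULL);

    /* The check that was missing: a packet without payload (a bare ACK,
     * for example) does not change the cache, so there is nothing new
     * to scan. */
    if (payload_len == 0)
        return;

    s->cached_len += payload_len;
    s->needs_scan = true;
}

/* Returns true once for each change to the cache, then resets the flag. */
static bool take_scan_request(StreamState *s)
{
    assert(s != NULL);

    bool scan = s->needs_scan;
    s->needs_scan = false;
    return scan;
}
```

Without the early return, every empty packet would mark the stream for scanning again, even though the cache is byte-for-byte identical to what was scanned last time.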
So in the end, Snort_inline was scanning the same data over and over again, without any changes to it. In a large file transfer, data flows one way, empty ACK packets the other. Due to this bug, the empty ACK packets were a great performance hit.
More good news: it turns out that this bug was also responsible for the http_inspect issue I was having, where some unified alerts were missing a payload. If there is a mismatch between what was scanned (the cache) and the actual payload of the packet (or lack thereof), then it makes sense that the alert code can’t get the right payload to go with the alert.
So, with this out of the way, I think we can finally release a public Beta next week or so!