Snort_inline and TCP Segmentation Offloading

I have had a gigabit setup at home for a short while now. My laptop has an Intel e1000 NIC, my desktop a Broadcom NIC. While playing with Snort_inline and netpipe-tcp, I noticed something odd. I got TCP packets that had the ‘Don’t Fragment’ flag set, but were still bigger than the MTU of the link. Snort_inline read packets of up to 26kb from the queue, and wireshark and tcpdump saw these packets as well. This only happened for outgoing packets on the e1000 NIC. The receiving PC saw them split up into multiple packets that honored the MTU. This got me thinking that some form of offloading must be taking place, and indeed this was the case:

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on

It can be disabled by using the following command:

# ethtool -K eth0 tso off

The large packets caused problems in my stream4inline modifications of Stream4. The code can’t handle the case where the packet is bigger than the sliding window size. So I have some work to do 😉
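
To check for this from a script before starting Snort_inline, the ethtool output shown above can be parsed. The following is only a minimal sketch: the function name is made up for illustration, and it assumes the output format printed above (other ethtool versions may label the settings differently).

```python
def parse_offload_settings(ethtool_output):
    """Parse the 'name: on/off' lines of `ethtool -k` output into a dict."""
    settings = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        # Skip the header line, which has nothing after the colon.
        if value in ("on", "off"):
            settings[name.strip()] = (value == "on")
    return settings


# Sample taken from the output above; a real script would capture
# the output of `ethtool -k eth0` instead.
SAMPLE = """Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
"""

settings = parse_offload_settings(SAMPLE)
tso_enabled = settings.get("tcp segmentation offload", False)
print("TSO enabled:", tso_enabled)
```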

Snort_inline patch updated to

With the recent Snort vulnerabilities we had to choose between backporting the fixes to our Snort_inline patch and upgrading. Upgrading makes the most sense since Sourcefire improves Snort with every release, but since the upgrade process has been very painful for the last couple of releases, we weren’t really looking forward to it.

Earlier I wrote about my testing with Subversion for Snort_inline, and I found out that using Subversion made the upgrade procedure much easier and much less time-consuming. So upgrading it was. Generally, few changes to the Snort_inline patch were required.

One thing, however, messed up the way the new stream4inline code works. A new option in Snort’s Stream4, which is enabled by default, is session dropping. When a packet is dropped, the session it belongs to is instructed to drop every packet of that session from then on.

This makes sense in many cases, but not in all. In stream4inline, we have created options to drop out-of-order packets, out-of-window packets, bad reset packets, and more. Generally, in these cases we just want to drop those individual packets, not kill off the session.

Killing the session on bad reset packets in particular would make it easier for third parties to kill sessions. One might argue that sessions writing outside of the window can be killed, but for the out-of-order limits this is not an option.

The out-of-order limits are enforced not because out-of-order packets are bad traffic, but to prevent resource starvation attacks against Snort_inline’s stream reassembler. Out-of-order packets have to be put in the right order before processing, which takes CPU time. They also have to be queued for reordering, which takes memory.

By setting out-of-order limits, the burden of getting the stream in order is on the sender of the packets. He will have to retransmit the right packets first, before sending more out-of-order packets. In this case, we don’t want InlineDrop() to kill the entire session. To deal with this, we introduced InlineDropPacketOnly(), that just drops the packet.
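
The difference between the two drop actions can be sketched as a small model. This is not the Snort_inline code: the Python names below only mirror InlineDrop() and InlineDropPacketOnly(), and the Session class, the verdict() function and its parameters are assumptions made for illustration.

```python
class Session:
    def __init__(self):
        # Set once the whole session has been condemned.
        self.drop_all = False


def inline_drop(session):
    """Drop this packet and every later packet of the session."""
    session.drop_all = True
    return "drop"


def inline_drop_packet_only(session):
    """Drop just this packet; the session itself stays alive."""
    return "drop"


def verdict(session, out_of_order_count, ooo_limit, matched_attack):
    """Decide what to do with one packet (hypothetical decision logic)."""
    if session.drop_all:
        return "drop"
    if matched_attack:
        return inline_drop(session)
    if out_of_order_count > ooo_limit:
        # Over the limit: make the sender retransmit, but keep the session.
        return inline_drop_packet_only(session)
    return "accept"


session = Session()
print(verdict(session, out_of_order_count=10, ooo_limit=5, matched_attack=False))
print(verdict(session, out_of_order_count=0, ooo_limit=5, matched_attack=False))
```

In this model the first packet is dropped because the limit is exceeded, while the second one is accepted again because the session was never marked for dropping.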

An official beta should be out RealSoonNow(tm) 😉

Snort_inline 2.6 development update

Development of Snort_inline 2.6 experienced a bit of a setback when William and I discovered that the new Stream4inline had some issues with detecting certain attacks. Since we are scanning the reassembled stream, certain detection plugins didn’t work as expected. Basically every detection plugin that uses absolute offsets from the packet start is messed up when we scan the reassembled stream only.

This is because the start of the reassembled stream doesn’t match the start of the last packet added to the stream. Most TCP sigs use offsets that match against the start of the stream, or use relative matches. For example, a rule like:

alert tcp any any -> $HTTP_SERVERS 80 (msg:"GET request"; content:"GET"; offset:0; depth:3; sid:12345678; rev:1;)

matches against the start of the stream since ‘GET’ will be the first data on the stream. In this case the reassembled stream only scanning would have worked fine because the start of the reassembled stream would match the start of the stream. So offset:0 in the reassembled stream points to the stream start, which is what we want in this case.

Things are different, however, when we try to match against midstream packets where the rule matches against the actual packet start. One might argue that this is a bad idea in most cases, and I agree. Since TCP moves data as a stream and not as packets, hardly any assumptions can be made about packet sizes, etc. Most TCP rules don’t do this, so the problem is fairly limited. Examples of rules that do are the eDonkey detection sigs in the Bleeding ruleset.

As a solution we came up with the following. We scan every packet both individually and in its reassembled stream. This is certainly more expensive, but it is the only way to avoid the evasion problems. I think we can probably make this behaviour optional, so the admin can choose to be extra safe at the cost of some performance.
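
The effect of scanning both can be shown with a toy model of an offset/depth content match. This is a rough sketch, not Snort’s detection engine: content_match() and scan() are hypothetical helpers, and the 0xe3 marker byte is simply a made-up stand-in for a signature that anchors on the packet start.

```python
def content_match(payload, content, offset=0, depth=None):
    """Search for `content` within `depth` bytes starting at `offset`."""
    end = len(payload) if depth is None else offset + depth
    return content in payload[offset:end]


def scan(packet_payload, reassembled_stream, content, offset=0, depth=None):
    """Match against the individual packet and against the reassembled stream."""
    in_packet = content_match(packet_payload, content, offset, depth)
    in_stream = content_match(reassembled_stream, content, offset, depth)
    return in_packet or in_stream


earlier_data = b"GET /index.html HTTP/1.0\r\n"
midstream_packet = b"\xe3\x10\x00\x00\x00payload"
stream = earlier_data + midstream_packet

# Stream-only scanning misses a match anchored at the packet start...
stream_only = content_match(stream, b"\xe3", offset=0, depth=1)
# ...while scanning the packet and the stream catches it.
both = scan(midstream_packet, stream, b"\xe3", offset=0, depth=1)
print(stream_only, both)
```

A rule anchored at the stream start, like the GET example earlier, still matches through the stream half of scan().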

Update on Snort_inline development

I have spent the last week trying to find a very annoying bug that caused Snort_inline to go to 100% CPU on certain traffic. It kept working, but my P3 500MHz home gateway slowed down to between 2kb/s and 25kb/s, while normally it handles the full 325kb/s of my DSL line at around 25% CPU.

Snort comes with a number of performance measurement options. In 2.6, --enable-perfprofiling was introduced. Also, --enable-profile builds Snort for use with gprof. Next to those, you can use strace and ltrace with the -c option to see the amount of time spent in system calls and library functions, respectively.

I already knew the problem was related to my new Stream4 code, since running Snort_inline without the ‘stream4inline’ option made the problem go away. So my performance debugging and code reviews were focussed on that code. However, the performance statistics showed no functions in Stream4 that took large amounts of time.


Snort_inline: running Snort_inline

No, it’s not released. But it will be soon… really!

William has done most of the hard work of porting our Snort_inline patch from 2.4.5 to 2.6. I have mostly been working on improving the stream4inline modification. I have written about this before. Like the stream4inline modification in Snort_inline 2.4.5 it scans the stream in a sliding window, making it possible to drop an attack detected in the reassembled stream. The new code does the same but is much faster, at the cost of higher memory usage.

Another interesting feature is that it keeps track of the number of sequence holes in a stream, and it can force a stream to get back in order. Limits can be enforced on the number of out-of-order packets and bytes, and also on the number of simultaneous sequence number holes. This was inspired by the paper by Sarang Dharmapurikar and Vern Paxson.
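
To illustrate what counting sequence holes means, here is a simplified sketch. It is not the stream4inline code: the function names and limit parameters are invented for the example, segments are plain (sequence number, length) pairs, and sequence number wraparound is ignored.

```python
def count_holes(next_expected, buffered):
    """Count the gaps in sequence space among buffered (seq, length) segments."""
    holes = 0
    position = next_expected
    for seq, length in sorted(buffered):
        if seq > position:
            # Data between `position` and `seq` has not arrived yet.
            holes += 1
        position = max(position, seq + length)
    return holes


def within_limits(next_expected, buffered, max_packets, max_bytes, max_holes):
    """Check the buffered out-of-order segments against all three limits."""
    total_bytes = sum(length for _, length in buffered)
    return (len(buffered) <= max_packets
            and total_bytes <= max_bytes
            and count_holes(next_expected, buffered) <= max_holes)


# The stream is in order up to 1000; two later segments arrived early.
buffered = [(1100, 100), (1300, 100)]
holes = count_holes(1000, buffered)
print(holes, within_limits(1000, buffered, max_packets=8, max_bytes=4096, max_holes=1))
```

When within_limits() returns False, the newest out-of-order packet would be the one to drop, which leaves it to the sender to fill the holes first.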

Last but not least, it adds support for window scaling to stream4. Since window scaling makes window sizes of up to a gigabyte possible, I’ve added a normalizing function as well, which can force all streams to use a configurable maximum wscale setting.
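
The gigabyte figure follows from the arithmetic of the window scale option: the 16-bit window field is shifted left by the negotiated shift count, which the TCP specification caps at 14. A minimal sketch of that calculation and of clamping the shift count follows. The function names are made up, and real normalization would also have to rewrite the option in the SYN packets and fix the TCP checksum, which is left out here.

```python
MAX_WSCALE = 14  # largest shift count the TCP window scale option allows


def effective_window(window_field, wscale):
    """Window size in bytes for a 16-bit window field and a shift count."""
    return window_field << wscale


def normalize_wscale(wscale, max_allowed):
    """Clamp a negotiated shift count to a configured maximum."""
    return min(wscale, MAX_WSCALE, max_allowed)


largest = effective_window(0xFFFF, MAX_WSCALE)
clamped = effective_window(0xFFFF, normalize_wscale(14, max_allowed=4))
print(largest, clamped)
```

With the maximum shift count the window is just under one gigabyte (1,073,725,440 bytes). Clamped to a shift count of 4, the largest possible window is 1,048,560 bytes.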

But it is running on my gateway now, which is also the gateway leading to this blog, so if it is unavailable to you, you’ve hit a bug 😉

Snort_inline: idea for an improved bait-and-switch

William Metcalf recently wrote a bait-and-switch plugin for Snort_inline. The idea is that when a rule matches on certain traffic, this plugin loads an iptables rule into the system that redirects the offending host to another server. This server can present the user with an error message such as “Access Denied”, but it can also run all kinds of sniffing tools, or even be a honeypot.

As the plugin currently creates an iptables rule, it only works on Linux. Also, it has some difficulty with existing iptables rulesets that might be maintained by other programs, such as my own Vuurmuur. My idea is to investigate whether it is possible to simply do the redirection in Snort_inline itself. Rewriting the IP address in the IP header might work as well. Naturally, this would need to be done for every packet, but with a connection to either the flow engine or the stream engine, this should be able to work… just a thought…
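
To give a feel for what rewriting the address involves, here is a small sketch that replaces the destination address in a 20-byte IPv4 header and recomputes the header checksum. It is only an illustration of the idea, not plugin code: the helper names and addresses are made up, IP options are not handled, and for TCP the checksum in the TCP header would need updating as well, since it covers the IP addresses through the pseudo header.

```python
import socket
import struct


def ipv4_checksum(header):
    """Ones' complement sum over the 16-bit words of the header."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_destination(header, new_dst):
    """Return a copy of a 20-byte IPv4 header with a new destination address."""
    hdr = bytearray(header)
    hdr[16:20] = socket.inet_aton(new_dst)  # destination address field
    hdr[10:12] = b"\x00\x00"                # zero the checksum before summing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


def build_header(src, dst):
    """Build a minimal IPv4 header (TCP, DF set) for the demonstration."""
    hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0x4000, 64, 6, 0,
                                socket.inet_aton(src), socket.inet_aton(dst)))
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


original = build_header("198.51.100.7", "192.0.2.1")
redirected = rewrite_destination(original, "192.0.2.10")
# Summing a header that includes a correct checksum yields zero.
print(ipv4_checksum(redirected) == 0)
```

Replies from the other server would need the reverse rewrite on the source address, which is where the connection to the flow or stream engine comes in.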

Snort_inline: Adapting the TCP stream reassembler

Currently I am rewriting a modification of the TCP reassembler in Snort_inline. Snort’s TCP reassembler is called Stream4 and it works fairly well in IDS mode, however it has some serious issues in inline mode. The biggest and most important issue is that Snort_inline cannot block an attack if it is detected in the reassembled stream. In Snort_inline 2.4 we made our first attempt to fix this with the stream4inline modification.

Stream4 was never designed to be used inline. It was designed to help the detection capabilities of Snort in IDS mode. This had a large consequence for the design. In Stream4, incoming packets are stored in a per-stream list of packets, before the packet is handled normally. After a number of packets have piled up this way, the stream is flushed. This flush builds a pseudo packet containing the payloads of all the ack’d packets in the list. This pseudo packet is then run through the detection engine, to see if it contains an attack. This works fine for IDS mode, since this way attacks can be detected in the reassembled stream.

When running inline, this concept has a big flaw. In inline mode we can use the action drop. This action makes sure that the underlying subsystem (netfilter or ipfilter) discards the packet, so that it will not reach the destination. The problem with Stream4 is that by the time the pseudo packet is inspected, its content has already been accepted and ack’d by the end-host. This situation is of course very unsatisfying.

To deal with this problem in Snort_inline 2.4 we created the stream4inline option. This option is a modification to Stream4 that simply reassembles the stream for every packet received. Then instead of the normal packet, the reassembled stream is scanned every time. There are two performance problems with this approach. The first is that we call the functions to build the pseudo packet for every packet we receive. Since this function walks the packet list every time, this is expensive.

The second problem is harder to solve. Because we scan a sliding window, we end up scanning the same data multiple times. I have not really thought of an (easy) solution for this. The rewrite of the stream4inline option is mostly focused on two improvements. First, the performance of the reassembly itself will be improved by keeping a cached version of the reassembled stream. Second, the implementation will be much cleaner and more robust. I will discuss it on a more technical level later.
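
The caching idea can be pictured roughly as follows. This is only a sketch of the concept under simplifying assumptions (in-order segments only, a fixed window size, invented class and method names), not the actual rewrite: each new payload is appended to a cached buffer, instead of rebuilding the pseudo packet by walking the packet list every time.

```python
class CachedStream:
    """Keep the reassembled sliding window cached between packets."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.buffer = bytearray()

    def add_in_order(self, payload):
        """Append an in-order payload and return the window to scan."""
        self.buffer += payload
        overflow = len(self.buffer) - self.window_size
        if overflow > 0:
            # Slide the window forward by discarding the oldest bytes.
            del self.buffer[:overflow]
        return bytes(self.buffer)


stream = CachedStream(window_size=16)
first = stream.add_in_order(b"xxEV")
second = stream.add_in_order(b"ILxx")
# The pattern spans two packets: it only shows up once the second one
# arrives, and that packet can still be dropped.
print(b"EVIL" in first, b"EVIL" in second)
```

Note that the overlapping part of the window is still scanned again for every packet, so this only addresses the first of the two performance problems.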