Snort_inline and out of order packets

In Snort_inline’s stream4 modifications, one of the changes is that out-of-order TCP packets are treated differently than in unmodified stream4. This can cause some new alerts to appear, as well as some unexpected behaviour, so I’ll try to explain what happens here.

First of all, let me quickly explain what out-of-order packets are. To put it simply, TCP packets are sent out by the source host in a specific order, but can arrive in a different order at the destination. Packet loss, link saturation and routing issues are among the many things that can cause this. A Snort_inline-specific issue is that when Snort_inline can’t keep up with the packets it needs to process, it will drop packets, which causes packet loss. These packets will then have to be resent by the sending host.

Out-of-order packets become a problem when dealing with stream reassembly. Stream reassembly is basically putting all data from the packets in the right order, to recover the original data as it was sent. We can’t do stream reassembly if we don’t have all the packets. Unmodified stream4 basically ignores gaps in the stream: designed for passively listening to traffic, it has to deal with packet loss differently than Snort_inline does.

Next, some definitions of this functionality in Snort_inline.

Out-of-order packets: the number of packets in queue for a stream that are out of order, meaning they have a higher sequence number than the next in-sequence packet we are expecting.

Out-of-order bytes: the number of bytes of the combined data of the out-of-order packets in the stream.

Sequence number hole: a gap between two packets that can be closed by one or more missing packets.
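To make these three counters concrete, here is a toy Python sketch (illustrative only, not the actual stream4 implementation), given the next expected sequence number and the segments queued for a stream:

```python
def ooo_stats(next_seq, queued):
    """queued: list of (seq, length) segments held for a stream.

    Returns (out-of-order packets, out-of-order bytes, sequence number holes).
    """
    ooo = sorted((s, l) for s, l in queued if s > next_seq)
    ooo_pkts = len(ooo)
    ooo_bytes = sum(l for _, l in ooo)
    holes = 0
    expected = next_seq
    for seq, length in ooo:
        if seq > expected:        # a gap that missing packets would close
            holes += 1
        expected = max(expected, seq + length)
    return ooo_pkts, ooo_bytes, holes

# expecting seq 1000; two segments queued further ahead, with a gap between them
print(ooo_stats(1000, [(1500, 100), (1700, 50)]))  # (2, 150, 2)
```

Here both queued segments count as out of order, and there are two holes: one before seq 1500 and one between 1600 and 1700.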

To prevent Snort_inline from using too much memory on bad connections, or when an attacker sends lots of out-of-order packets, Snort_inline can enforce limits to protect itself. It can even force a stream to be completely in order by dropping all packets that are out of order. Sadly, this hurts the performance of the connections, so you can set limits that balance performance against protection.

When Snort_inline hits these limits, it will (optionally) fire alerts that look like this:

(spp_stream4) TCP out-of-order packets limit reached for stream
(spp_stream4) TCP out-of-order bytes limit reached for stream
(spp_stream4) TCP sequence number holes limit reached for stream

You can disable the alerts by adding the following option to the preprocessor stream4 line: disable_ooo_alerts. The limits themselves can be adjusted using the following options: max_seq_holes 2, max_ooo_pkts 25, max_ooo_bytes 7000. These are the values I currently use on my home gateway. I got the idea of implementing these limits from this paper by Vern Paxson. However, his suggestion that at most one sequence number hole per stream (or even per host) should be allowed seems a bit optimistic to me. Maybe DSL has more packet loss than the university links he studied.
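For reference, a stream4 preprocessor line using these limits could look like the sketch below (any other stream4 options you already use would go on the same line; this just shows where the new options fit):

```
preprocessor stream4: max_seq_holes 2, max_ooo_pkts 25, max_ooo_bytes 7000, disable_ooo_alerts
```

Leave out disable_ooo_alerts if you want to keep seeing the out-of-order alerts.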

By default Snort_inline uses settings that were chosen somewhat arbitrarily, so they may not fit your usage. As with the window scaling, please let me know in a comment what values you use!

TCP Window scaling in Snort_inline

The TCP window field in the TCP header is only 16 bits, so the maximum window size it can handle is only 64KB. A long time ago this was enough, but nowadays it isn’t, by far. Luckily, this is something the window scaling option fixes. Window scaling is very common these days; your PC or laptop probably uses it by default. Snort’s stream4, however, does not support it. This means that when tracking and reassembling streams, Snort has no idea, for most connections, which data is in window and which is out of window. To make matters worse, the packets that are in window when using wscaling, but appear out of window when the wscaling is not accounted for, are never used in the reassembly process. This makes Snort evadable.

One of the goals when creating the stream4inline modifications was to be able to drop on all TCP anomalies stream4 detects. For this, support for window scaling was added to stream4, so Snort_inline would be able to drop out-of-window packets. There is, however, a big problem with window scaling: the TCP window can grow to a maximum of 1GB (with the maximum wscale value of 14). Stream4 would thus theoretically have to queue up to 1GB of packet data, per stream. While this is unlikely to happen during normal connections, it is possible, and it could be used by an attacker to attack Snort_inline itself.
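To see where the 1GB figure comes from: with window scaling, the advertised window is the 16-bit header field shifted left by the wscale value, so a field of at most 65535 can be multiplied by up to 2^14. A small sketch (not Snort_inline code):

```python
def effective_window(window_field: int, wscale: int) -> int:
    """The real receive window: the 16-bit window field shifted by the wscale option."""
    return window_field << wscale

# maximum possible window: 65535 << 14, just under 1GB
print(effective_window(65535, 14))  # 1073725440 bytes
```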

To prevent this, I added an option to stream4inline that allows the administrator to set a maximum allowed wscale value. Any higher setting will be normalized away: the packet is modified and the wscale lowered to the maximum that is allowed. The hosts talking to each other then believe the other side only accepts the lower wscale, and use that setting. This can, however, have some unexpected consequences. If the link Snort_inline deals with is high speed, high latency or both, setting the wscale value too low can result in serious performance degradation. This issue shows up as connections that are (way) slower than usual. In these cases the wscale value needs to be increased.
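The reason is the bandwidth-delay product: a sender can have at most one window of unacknowledged data in flight, so the window must be at least bandwidth × round-trip time for the link to run at full speed. A rough sketch of picking a minimum wscale (illustrative only; this is not something Snort_inline computes for you):

```python
def min_wscale(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Smallest wscale whose maximum window covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps / 8 * rtt_seconds   # bytes in flight at full speed
    wscale = 0
    while (65535 << wscale) < bdp_bytes and wscale < 14:
        wscale += 1
    return wscale

# a 100 Mbit/s link with 50 ms RTT needs ~625KB in flight -> wscale 4
print(min_wscale(100_000_000, 0.05))  # 4
```

A slow DSL line with modest latency fits in the unscaled 64KB window, which is why a low norm_wscale_max works fine on my connection.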

The default value of Snort_inline is a wscale of 2, which is quite low but works fine on my home DSL connection. To change the setting add ‘norm_wscale_max 5’ to your stream4 configuration line. This will allow for a wscale of up to 5. The maximum value is 14. I’d be interested in what values people use on what types and speeds of lines, so please let me know! We can use it to suggest values in the docs or to set a less insane default value 🙂

Memory leak fixed in stream4inline

A few days ago William told me that if he enabled stream4inline on a busy gateway, Snort_inline would consume all memory within hours. The problem went away when disabling stream4inline, so it made sense that the problem would be in there somewhere.

The first suspect was the reassembly cache. The reassembly cache is used to keep a per stream copy of the reassembled packet in memory. While being memory expensive, it greatly speeds up the sliding window stream reassembly process, especially with small packets. The reason for this being the first and primary suspect is that this is the only place where stream4inline code allocates memory. Reviewing the code however, showed no leaks and adding a debug counter to monitor the memory usage also showed that the leak was not in that code.

Next, my investigation focused on the parts where stream4 behaves differently in stream4inline mode. I initially focused on what happens when stream4 hits its memory limit: the memcap. When the configurable memcap is reached, stream4 nukes 5 random sessions. In stream4inline, an option was added to instead truncate 15 sessions: an attempt is made to free memory by removing stored packets no longer needed from a stream. If that fails, 5 random sessions are nuked anyway.
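In pseudo-Python, the behaviour described above looks roughly like this (function and variable names are illustrative, not the actual Snort_inline code):

```python
import random

def handle_memcap(sessions, inline_mode, truncate_session, nuke_session):
    """Sketch of stream4's memcap handling, with the stream4inline truncate option.

    truncate_session(s) returns the number of bytes it managed to free."""
    if inline_mode:
        # first try to free memory by dropping stored packets from 15 sessions
        freed = sum(truncate_session(s) for s in sessions[:15])
        if freed > 0:
            return freed
    # truncating failed (or we are not inline): nuke 5 random sessions
    for s in random.sample(sessions, min(5, len(sessions))):
        nuke_session(s)
    return 0
```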

Reviewing the truncating of the sessions didn’t show anything obvious to me so I went on to the killing of the sessions. Descending down the code I finally reached the DropSession function, where the memory cleanup for a session is handled. Here it turned out that the DeleteSpd function, used to clear the stored packets in a stream, was not called in stream4inline mode. The reason for this mistake is that with Snort 2.6.1 support for UDP was added to stream4. The merge with the Snort_inline code went wrong because of extra checks added to the DropSession function.

The stupid thing is that when I did the merge, I was already in doubt about it as a comment showed:

/* XXX did I merge this right??? VJ */

Guess I know the answer now: No 😉

Snort_inline and TCP Segmentation Offloading

For a short while now I have had a gigabit setup at home. My laptop has an e1000 Intel NIC, my desktop a Broadcom NIC. While playing with Snort_inline and netpipe-tcp, I noticed something odd: I got TCP packets that had the ‘Don’t Fragment’ bit set, but were still bigger than the MTU of the link. Snort_inline read packets of up to 26kb from the queue, and wireshark and tcpdump were seeing the packets as well. This happened only for outgoing packets on the e1000 NIC. The receiving PC saw the packets split into multiple packets that honored the MTU. This got me thinking that some form of offloading must be taking place, and indeed this was the case:

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on

It can be disabled by using the following command:

# ethtool -K eth0 tso off

The large packets caused problems in my stream4inline modifications of Stream4. The code can’t handle the case where the packet is bigger than the sliding window size. So I have some work to do 😉

Snort_inline patch updated to

With the recent Snort vulnerabilities, we had to choose whether to backport the fixes to our Snort_inline patch or to upgrade. Upgrading makes the most sense, since SourceFire improves Snort with every release, but since the upgrade process has been very painful for the last couple of releases, we weren’t really looking forward to it.

Earlier I wrote about my testing with Subversion for Snort_inline, and I found that using Subversion made the upgrade procedure much easier and much less time consuming. So upgrading it was. Overall, few changes to the Snort_inline patch were required.

One thing, however, messed up the way the new stream4inline code works. A new option in Snort’s Stream4, enabled by default, is session dropping. The way it works is that when a packet is dropped, the session it belongs to is instructed to drop every packet of that session from then on.

This makes sense in many cases, but not in all. In stream4inline, we have created options to drop out-of-order packets, out-of-window packets, bad reset packets, and more. Generally, in these cases we want to drop just those individual packets, not kill off the session.

Killing the session on bad reset packets, especially, would make it easier for third parties to kill sessions. One might argue that sessions writing outside of the window can be killed, but when looking at the out-of-order limits, this can’t be done.

The out-of-order limits are enforced not because the traffic is bad, but to prevent resource starvation attacks against Snort_inline’s stream reassembler. Out-of-order packets have to be put in the right order before processing, which takes CPU time. They also have to be queued for reordering, which takes memory.

By setting out-of-order limits, the burden of getting the stream in order is placed on the sender of the packets. He will have to retransmit the right packets first, before sending more out-of-order packets. In this case, we don’t want InlineDrop() to kill the entire session. To deal with this, we introduced InlineDropPacketOnly(), which drops just the packet.

A official beta should be out RealSoonNow(tm) 😉

Snort_inline 2.6 development update

Development of Snort_inline 2.6 experienced a bit of a setback when William and I discovered that the new Stream4inline had some issues with detecting certain attacks. Since we are scanning the reassembled stream, certain detection plugins didn’t work as expected. Basically, every detection plugin that uses absolute offsets from the packet start is messed up when we scan only the reassembled stream.

This is because the start of the reassembled stream doesn’t match the start of the last packet added to it. Most TCP sigs use offsets that match against the start of the stream, or relative matches. For example, a rule like:

alert tcp any any -> $HTTP_SERVERS 80 (msg:"GET request"; content:"GET"; offset:0; depth:3; sid:12345678; rev:1;)

matches against the start of the stream, since ‘GET’ will be the first data on the stream. In this case, scanning only the reassembled stream would have worked fine, because the start of the reassembled stream matches the start of the stream. So offset:0 in the reassembled stream points to the stream start, which is what we want here.

Things get different, however, when we try to match against midstream packets, where the rule matches against the actual packet start. One might argue that this is a bad idea in most cases, and I agree: since TCP moves data stream-based and not packet-based, hardly any assumptions can be made about packet sizes, etc. Most TCP rules don’t use this, so the problem is fairly limited. An example of a rule that does this is the eDonkey detection sigs in the Bleeding ruleset.
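The problem can be shown in a few lines of Python (a toy model, not how stream4 is implemented): offset:0 means byte 0 of whatever buffer is scanned, and the same bytes sit at different offsets in a midstream packet than in the reassembled stream.

```python
# a reassembled stream built from two TCP segments
seg1 = b"GET /inde"
seg2 = b"x.html HTTP/1.0\r\n"
stream = seg1 + seg2

# content:"GET"; offset:0; depth:3 against the stream start matches
print(stream[0:3] == b"GET")  # True

# against the second (midstream) packet alone, the same check fails;
# conversely, a rule written against packet starts breaks on the stream
print(seg2[0:3] == b"GET")  # False
```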

As a solution we came up with the following: we scan every packet individually and in its reassembled stream. This is certainly more expensive, but it is the only way to avoid the evasion problems. I think we can probably add an option to make this behaviour optional, so the admin can choose to be extra safe at the cost of some performance.

Update on Snort_inline development

I have spent the last week trying to find a very annoying bug that caused Snort_inline to go to 100% CPU on certain traffic. It kept working, only my P3 500MHz home gateway slowed down to between 2kb/s and 25kb/s, while normally it handles the full 325kb/s of my DSL line at around 25% CPU.

Snort comes with a number of performance measurement options. In 2.6, --enable-perfprofiling was introduced. Also, --enable-profile builds Snort for use with gprof. Next to those, you can use strace and ltrace with the -c option to see the amount of time spent in the various functions.

I already knew the problem was related to my new Stream4 code, since running Snort_inline without the ‘stream4inline’ option made the problem go away. So my performance debugging and code reviews were focused on that code. However, the performance statistics showed no functions taking large amounts of time in Stream4.
