Important Suricata update

We just released Suricata 1.3.3, which contains some important accuracy fixes. It should also be much more robust against out-of-memory conditions.

For those of you running Suricata in IPS mode, this is important as well. We found that rules with the drop or reject action were not playing well with thresholding.

So upgrading is highly recommended!

The code changes are not too big; the largest ones are due to some extra unittests:

 ChangeLog                           |   11 +
 libhtp/htp/dslib.c                  |    4 +-
 libhtp/htp/hooks.c                  |   31 +-
 libhtp/htp/htp_connection.c         |   34 ++-
 libhtp/htp/htp_connection_parser.c  |   25 +-
 libhtp/htp/htp_parsers.c            |    2 +-
 libhtp/htp/htp_request.c            |    4 +-
 libhtp/htp/htp_request_apache_2_2.c |   24 +-
 libhtp/htp/htp_transaction.c        |   68 +++--
 libhtp/htp/htp_util.c               |   35 ++-
 src/alert-debuglog.c                |    4 +-
 src/app-layer.c                     |    9 +-
 src/decode.h                        |    3 +-
 src/detect-detection-filter.c       |   96 ++++++
 src/detect-engine-alert.c           |   37 ++-
 src/detect-engine-hcbd.c            |    5 +
 src/detect-engine-hhd.c             |  121 +++++++-
 src/detect-engine-hsbd.c            |    5 +
 src/detect-engine-iponly.c          |    5 +-
 src/detect-engine-payload.c         |   26 ++
 src/detect-engine-threshold.c       |   15 +-
 src/detect-filemd5.c                |   24 +-
 src/detect-filestore.c              |   11 +-
 src/detect-filestore.h              |    2 +-
 src/detect-pcre.c                   |  485 +----------------------------
 src/detect-threshold.c              |  569 ++++++++++++++++++++++++++++++++++-
 src/detect.c                        |   11 +-
 src/detect.h                        |    2 +-
 src/flow-hash.c                     |   10 +-
 src/flow-timeout.c                  |   10 +-
 src/flow.c                          |    1 -
 src/flow.h                          |   14 +
 src/log-httplog.c                   |    2 +-
 src/runmodes.c                      |    2 +-
 src/source-ipfw.c                   |    1 +
 src/source-pfring.c                 |   20 +-
 src/stream-tcp-reassemble.c         |    4 +-
 src/stream-tcp.c                    |   12 +-
 src/stream.c                        |    3 +-
 src/threads.h                       |    1 +
 src/tmqh-packetpool.c               |    5 +-
 src/util-buffer.h                   |    6 +-
 src/util-debug.c                    |    2 +-
 src/util-host-os-info.c             |   32 +-
 src/util-threshold-config.c         |  210 +++++++++++++
 suricata.yaml.in                    |    6 +-
 46 files changed, 1340 insertions(+), 669 deletions(-)

Suricata 1.4 development update

Today, a day after 1.3.2, we’ve released 1.4beta2. While 1.3.2 is an important update for those running 1.3.1 or lower, today’s release is where things get exciting. A lot of things were improved and added. Let me show some numbers first.

The 1.4beta2 release is a pretty big update over 1.4beta1 as it touches over 5k lines of code:

234 files changed, 5033 insertions(+), 3759 deletions(-)

Comparing 1.4beta2 to yesterday’s 1.3.2, over 11k lines of code are touched:

262 files changed, 11406 insertions(+), 5794 deletions(-)

Personally, I’ve been working on two main areas, the defrag engine and the luajit integration, plus a couple of other things.

Defrag

The defrag engine was the last major subsystem that still used a Big Lock. Defrag uses so-called “trackers” to track fragments belonging to a single IP packet. These trackers are stored in a hash table. 1.3 and prior used a hash that had no locking of its own, so it relied on a Big Lock to protect its operations. Suricata has had fine-grained hashes for the flow and host tables for some time already, so it made sense to port defrag over as well.
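To illustrate the difference between a Big Lock and a fine-grained hash, here is a minimal Python sketch of a table with one lock per bucket. It is not Suricata’s C implementation; the class name `TrackerTable`, the bucket count and the key layout are all hypothetical and only serve the illustration.

```python
import threading


class TrackerTable:
    """Hash table with one lock per bucket instead of one global lock."""

    def __init__(self, buckets=64):
        self._buckets = [{} for _ in range(buckets)]
        self._locks = [threading.Lock() for _ in range(buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def add_fragment(self, key, fragment):
        """Append a fragment to the tracker for key, creating the tracker if needed."""
        i = self._index(key)
        with self._locks[i]:  # only this bucket is locked, others stay available
            tracker = self._buckets[i].setdefault(key, [])
            tracker.append(fragment)
            return len(tracker)

    def fragment_count(self, key):
        i = self._index(key)
        with self._locks[i]:
            return len(self._buckets[i].get(key, []))


table = TrackerTable()
key = ("10.0.0.1", "10.0.0.2", 17, 4242)  # src, dst, proto, IP ID (illustrative)


def worker():
    for n in range(1000):
        table.add_fragment(key, n)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

count = table.fragment_count(key)
```

Threads that work on trackers in different buckets never wait on each other, which is the point of dropping the Big Lock.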

Luajit

I’ve written about luajit a couple of times already. While the basic functionality debuted in beta1, the code has been completely overhauled. The most important user-visible change is the integration with the various HTTP inspection engines. This did result in a limitation though: for now you can inspect just one HTTP buffer per script.

A weird challenge with luajit is that its “state” needs to be in the 32 bit part of memory. The reason isn’t clear to me, but this gave us some trouble. Some users use many rules and aggressive pattern matcher settings. When the luajit states had to be allocated after all that memory was already in use, the allocation failed. I’ve worked around this by allocating a bunch of states in advance, hoping they’ll end up in the proper memory. We’ll see how that will work.

Misc

I’ve also largely rewritten the optional rule profiling to perform better. Here too, a Big Lock was removed. The accounting is now first done on a per thread basis, and only merged at detection engine shut down. Another nice feature is that it will now print the profiling stats during a live rule reload as well.
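The per-thread accounting idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the profiling code itself; the function name `inspect_packets` and the sample workload are made up.

```python
import threading
from collections import Counter


def inspect_packets(rule_ids, local_stats):
    # Each thread updates only its own Counter, so no lock is needed here.
    for sid in rule_ids:
        local_stats[sid] += 1


# Hypothetical workload: the rule ids each thread ends up checking.
workload = [[1, 2, 2], [2, 3], [1, 1, 3]]
per_thread = [Counter() for _ in workload]

threads = [
    threading.Thread(target=inspect_packets, args=(rules, stats))
    for rules, stats in zip(workload, per_thread)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The merge happens once, at "shutdown", instead of on every update.
totals = Counter()
for stats in per_thread:
    totals.update(stats)
```

The hot path touches only thread-local data; the cost of combining the numbers is paid once at the end.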

Next, I’ve improved performance of the decode, stream and app layer event keywords. They were quite expensive as they were checked quite often. I’ve now added a prefilter check to the detection engine’s prefilter stage. Helps quite a bit!

Finally, I’ve been working on getting global and rule thresholds to play well together. This work isn’t done yet, but some real progress has been made. Work is tracked here and documentation lives here.

So all in all quite a bit of changes. Please help us test this so we can move to a stable and high performing 1.4! :)

Suricata 1.3.2 is out

Today we released Suricata 1.3.2. Not a big update, but there are some important fixes in the stream engine, fast_pattern:chop handling, HTTP multipart parsing and the flow keyword with “nostream”.

As the diff stat output shows, it’s a rather light maintenance update over 1.3.1:

 ChangeLog                              |   12 ++
 libhtp/configure.ac                    |    2 +-
 libhtp/htp.pc.in                       |    2 +-
 libhtp/htp/htp.h                       |    2 +-
 src/app-layer-htp-file.c               |  145 ++++++++++++++++++++++++
 src/app-layer-htp.c                    |  192 ++++++++++++++++++++++++++------
 src/decode.c                           |    3 +
 src/decode.h                           |    1 +
 src/defrag.c                           |    4 +-
 src/detect-engine-content-inspection.c |    9 --
 src/detect-flow.c                      |   68 ++++++++++-
 src/source-af-packet.c                 |    9 ++
 src/source-ipfw.c                      |   13 ++-
 src/source-pfring.c                    |   28 ++---
 src/stream-tcp-reassemble.c            |    1 +
 src/util-cpu.c                         |   10 +-
 16 files changed, 435 insertions(+), 66 deletions(-)

Only the HTTP changes look big, but that is due to adding some unittests. Same for flow keyword.

Because of the fixes, updating is still highly recommended. Most fixes improve detection accuracy.

Full notes at our new website: http://suricata-ids.org/2012/10/03/suricata-1-3-2-available/

Suricata luajit update

After an exciting week of meeting and working with the team around the RAID conference, time for another lua update.

The keyword supports an interesting set of buffers now:

packet
payload

http.uri
http.uri.raw
http.request_line
http.request_headers
http.request_headers.raw
http.request_cookie
http.request_user_agent
http.request_body

http.response_headers
http.response_headers.raw
http.response_body
http.response_cookie

The http keywords are now integrated into their respective inspection engines. This led to one important limitation for now: you can only inspect one such buffer per script.

We pass the inspection offset to the script as well for these. In the lua script you can access it as follows:

function match(args)
    a = tostring(args["http.request_headers.raw"])
    o = args["offset"]

    s = a:sub(o)
    print (s)

    return 0
end

In a buffer “Mozilla/5.0” and a signature “content:Mozilla;”, “s” in the script will contain “/5.0”. At this moment there is no way yet to pass back an offset from the script to the inspection engine.
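The same idea, modelled outside the engine in Python: a sketch that assumes the offset is the number of bytes consumed by the content match. The helper `remaining_after_match` is hypothetical. Note that Python slices are 0-based while Lua’s string.sub is 1-based, so this is not a line-by-line translation of the script above.

```python
def remaining_after_match(buffer: bytes, pattern: bytes) -> bytes:
    """Return the part of buffer that follows the first match of pattern."""
    start = buffer.find(pattern)
    if start < 0:
        return b""  # no match, nothing left to hand to a script
    offset = start + len(pattern)  # assumed meaning of the inspection offset
    return buffer[offset:]


s = remaining_after_match(b"Mozilla/5.0", b"Mozilla")
```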

On the performance side things are looking good as well. At RAID Will Metcalf converted a set of 6 ETpro sigs to a single lua script. It resulted in better detection accuracy and better performance. That work is still private, but we’ll get some real world scripts public soon! :)

Update 10/4: this code is now available for testing in the new Suricata 1.4beta2 release!

First beta for Suricata 1.4

The first test release for the new Suricata 1.4 branch has just been released. Some really exciting stuff was added. Let me highlight some of it:

AF_PACKET IPS mode: Eric Leblond has been working on extending the passive AF_PACKET support to support IPS as well. Eric has documented the new feature on his blog.

TLS logging and certificate storage: created by contributor Jean-Paul Roliers under guidance of Eric Leblond. As a bonus, there is a rule keyword to match on certificate fingerprints.

Custom HTTP logging: contributor Ignacio Sanchez created a new output mode for our HTTP log module, allowing the admin to customize the log message format. He has made it compatible to Apache’s mod_log_config. For more information, see our wiki page.

Tunnel decoding: Michel Saborde opened a bunch of tickets for Teredo, IPv4-in-IPv6 and IPv6-in-IPv6 tunneling. This saved Eric a lot of time in his implementation.

There is more, like the luajit keyword I wrote about yesterday here.

So there are a lot of changes. Git gives us the following numbers: “106 files changed, 6966 insertions(+), 2259 deletions(-)” in just 3 weeks. This means the release is definitely beta quality, so use with care.

Grab it here: http://www.openinfosecfoundation.org/download/suricata-1.4beta1.tar.gz

Next week the team will be in Amsterdam for the RAID 2012 conference. After that we’ll continue to work towards 1.4beta2. For an idea of what is coming, check the milestone.

Until then, have fun with this new beta. Many thanks to our generous contributors!

Suricata http_user_agent vs http_header

One of the new features in Suricata 1.3 is a new content modifier called http_user_agent. This allows rule writers to match on the User-Agent header in HTTP requests more efficiently. The new keyword is documented in the OISF wiki. In this post, I’ll show its efficiency with two examples.

Example 1: rarely matching UA

Consider a signature where the match is on a part of the UA that is very rare, so not part of regular User-Agents. In my example, “abc”.

The signature looks like this:
alert http any any -> any any (msg:"User-Agent abc http_header"; content:"User-Agent: "; http_header; nocase; content:"abc"; http_header; distance:0; pcre:"/User-Agent:[^\n]*abc/iH"; sid:1; rev:1;)

The http_user_agent variant looks much simpler:
alert http any any -> any any (msg:"User-Agent abc http_user_agent"; content:"abc"; http_user_agent; sid:2; rev:1;)

Now when running this against a pcap with over 12.500 HTTP requests, neither signature matched. However, signature 1 was inspected 209752 times! This high number is because the request headers are inspected one-by-one. Signature 2 wasn’t inspected at all, as it never made it past the multi pattern matching stage (mpm).

When looking at pcap runtime, running with only the http_user_agent version is about 10% faster.
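Why the inspection counts differ so much can be shown with a toy model in Python. This is a deliberately simplified sketch: the real mpm is a multi-pattern matcher shared by all signatures, and the function names and sample requests here are hypothetical.

```python
def inspect_all_headers(requests, pattern):
    """No useful prefilter: the signature is evaluated for every header line."""
    inspections = matches = 0
    for headers in requests:
        for name, value in headers.items():
            inspections += 1
            if name.lower() == "user-agent" and pattern in value:
                matches += 1
    return inspections, matches


def inspect_with_prefilter(requests, pattern):
    """Prefilter on the User-Agent buffer: evaluate only when the pattern is present."""
    inspections = matches = 0
    for headers in requests:
        if pattern not in headers.get("User-Agent", ""):
            continue  # rejected by the prefilter, the signature is never evaluated
        inspections += 1
        matches += 1  # for a single content match, every inspection is a hit
    return inspections, matches


requests = [{"Host": "example.com", "User-Agent": "Mozilla/5.0", "Accept": "*/*"}] * 3

rare_all = inspect_all_headers(requests, "abc")
rare_pre = inspect_with_prefilter(requests, "abc")
common_all = inspect_all_headers(requests, "Mozilla")
common_pre = inspect_with_prefilter(requests, "Mozilla")
```

For the rare pattern the prefiltered variant does no inspection work at all, while the header variant still walks every header of every request.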

Example 2: commonly matching UA

So, what if we want to match on something that is quite common? In other words, the signature will have frequent matches?

First, the http_header signature:
alert http any any -> any any (msg:"User-Agent MSIE 6 http_header"; content:"User-Agent: "; http_header; nocase; content:"MSIE 6"; http_header; distance:0; pcre:"/User-Agent:[^\n]*MSIE 6/iH"; sid:3; rev:1;)
The http_user_agent variant:
alert http any any -> any any (msg:"User-Agent MSIE 6 http_user_agent"; content:"MSIE 6"; http_user_agent; sid:4; rev:1;)

In this case both signatures do match, just over 10.000 times even. The stats look like this:

Each of the inspections of signature 4, the http_user_agent variant, is actually a match. This makes sense as we look for a simple string and the mpm will only invoke the signature if that string is found. It’s clear that the http_header variant takes way more resources. Here too, when looking at pcap runtime, running with only the http_user_agent version is approximately 10% faster.

Final remarks

It’s quite clear that the http_user_agent keyword is much more efficient than inspecting all the HTTP headers. But other than efficiency, http_user_agent also allows for much easier to read rules.

The Emerging Threats project will likely fork their Suricata ruleset for 1.3 (see this blog post). Even though this will be a significant effort on their side, it’s pretty clear to me the performance effect will be noticeable!

HTTP parsing events in Suricata

With the 1.2rc1 release you will notice no more HTTP errors on the screen. Or SMTP errors. This output has finally been disabled; it was a long-time annoyance.

As you may still be interested in the errors, they are now available through the rule language. Rules for all possible events/errors can be found in rules/http-events.rules and rules/smtp-events.rules.

Example:
app-layer-event:http.missing_host_header;

This will match on HTTP/1.1 requests without a Host header.

Some of these rules might be noisy (they are not in my local network), but rather than disabling them I’d suggest suppressing them. The reason is that each time they hit, a flowint is incremented:

flowint:http.anomaly.count,+,1;

This will allow you to get alerts on streams with high anomaly counts:

alert http any any -> any any (msg:"LOCAL really poor HTTP session"; flowint:http.anomaly.count,>,5; sid:123; rev:1;)

This will give you an alert if there have been more than 5 anomalies detected.
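The counting logic behind this can be sketched in Python as a per-flow counter with a threshold. This only models the idea; the flow tuple, `ANOMALY_LIMIT` and `on_http_event` are hypothetical names, and in Suricata the counting and the alerting are done by separate rules.

```python
from collections import defaultdict

ANOMALY_LIMIT = 5  # mirrors flowint:http.anomaly.count,>,5
anomaly_count = defaultdict(int)


def on_http_event(flow_id):
    """Count one anomaly for the flow; return True once the count exceeds the limit."""
    anomaly_count[flow_id] += 1
    return anomaly_count[flow_id] > ANOMALY_LIMIT


flow = ("10.0.0.5", 51000, "192.0.2.10", 80)  # illustrative flow key
results = [on_http_event(flow) for _ in range(7)]
```

The first five anomalies on a flow stay quiet; the sixth and later ones cross the limit.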

Blog spammers, malware and other unwanted HTTP users often use HTTP with all kinds of issues, so this may be a helpful tool in detecting those.

File extraction in Suricata

Today I pushed out a new feature in Suricata I’m very excited about. It has been long in the making and with over 6000 new lines of code it’s a significant effort. It’s available in the current git master. I’d consider it alpha quality, so handle with care.

So what is this all about? Simply put, we can now extract files from HTTP streams in Suricata. Both uploads and downloads. Fully controlled by the rule language. But that’s not all. I’ve added a touch of magic. By utilizing libmagic (this powers the “file” command), we know the file type of files as well. Lots of interesting stuff can be done there.

Rule keywords

Four new rule keywords were added: filename, fileext, filemagic and filestore.

Filename and fileext are pretty trivial: match on the full name or file extension of a file.

alert http any any -> any any (filename:"secret.xls";)
alert http any any -> any any (fileext:"pdf";)

More interesting is the filemagic keyword. It runs on the magic output of inspecting the (start of) a file. This value is for example:

GIF image data, version 89a, 1 x 1
PE32 executable for MS Windows (GUI) Intel 80386 32-bit
HTML document text
Macromedia Flash data (compressed), version 9
MS Windows icon resource - 2 icons, 16x16, 256-colors
PNG image data, 70 x 53, 8-bit/color RGBA, non-interlaced
JPEG image data, JFIF standard 1.01
PDF document, version 1.6

Matching on this with the filemagic keyword is pretty simple:

alert http any any -> any any (filemagic:"PDF document";)
alert http any any -> any any (filemagic:"PDF document, version 1.6";)

Pretty cool, eh? You can match both very specifically and loosely. For example:

alert http any any -> any any (filemagic:"executable for MS Windows";)

Will match on (among others) these types:

PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit
PE32 executable for MS Windows (GUI) Intel 80386 32-bit
PE32+ executable for MS Windows (GUI) Mono/.Net assembly
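The loose versus specific matching can be tried out with a small Python sketch that treats filemagic as a plain substring match on the magic string. That is an assumption made for illustration (case handling is not modelled), and `filemagic_match` is a hypothetical helper; the sample strings are taken from the output shown above.

```python
MAGIC_SAMPLES = [
    "GIF image data, version 89a, 1 x 1",
    "PE32 executable for MS Windows (GUI) Intel 80386 32-bit",
    "PE32+ executable for MS Windows (GUI) Mono/.Net assembly",
    "PDF document, version 1.6",
]


def filemagic_match(magic: str, needle: str) -> bool:
    """Assumed semantics: the rule matches when needle occurs anywhere in magic."""
    return needle in magic


windows_hits = [m for m in MAGIC_SAMPLES if filemagic_match(m, "executable for MS Windows")]
loose_pdf = [m for m in MAGIC_SAMPLES if filemagic_match(m, "PDF document")]
strict_pdf = [m for m in MAGIC_SAMPLES if filemagic_match(m, "PDF document, version 1.4")]
```

A short needle catches every variant of a file type, while a longer one narrows the match down to a specific version.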

Finally there is the filestore keyword. It is the simplest of all: if the rule matches, the files will be written to disk.

Naturally you can combine the file keywords with the regular HTTP keywords, limiting to POSTs for example:

alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"pdf upload claimed, but not pdf"; flow:established,to_server; content:"POST"; http_method; fileext:"pdf"; filemagic:!"PDF document"; filestore; sid:1; rev:1;)

This will alert on and store all files that are uploaded using a POST request that have a filename extension of pdf, but the actual file is not pdf.
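The condition this rule expresses can be written out as a Python predicate. This is a sketch of the logic only, under the assumption that fileext compares the filename extension and filemagic is a substring test; `pdf_upload_mismatch` is a hypothetical name.

```python
def pdf_upload_mismatch(method: str, filename: str, magic: str) -> bool:
    """True for a POSTed file that claims to be a pdf but does not look like one."""
    claims_pdf = filename.lower().endswith(".pdf")
    looks_like_pdf = "PDF document" in magic
    return method == "POST" and claims_pdf and not looks_like_pdf


suspicious = pdf_upload_mismatch(
    "POST", "report.pdf", "PE32 executable for MS Windows (GUI) Intel 80386 32-bit"
)
genuine = pdf_upload_mismatch("POST", "report.pdf", "PDF document, version 1.6")
download = pdf_upload_mismatch("GET", "report.pdf", "HTML document text")
```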

Storage

The storage to disk is handled by a new output module called “file”. Its config looks like this:

enabled: yes # set to yes to enable
log-dir: files # directory to store the files
force-magic: no # force logging magic on all stored files

It needs to be enabled for file storing to work.

The files are stored to disk as “file.1”, “file.2”, etc. For each of the files a meta file is created containing the flow information, file name, size, etc. Example:

TIME: 01/27/2010-17:41:11.579196
PCAP PKT NUM: 2847035
SRC IP: 68.142.93.214
DST IP: 10.7.185.57
PROTO: 6
SRC PORT: 80
DST PORT: 56207
FILENAME: /msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
STATE: CLOSED
SIZE: 5204
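Since the meta file is plain “KEY: value” text, it is easy to post-process. Below is a small Python sketch that parses it into a dict, assuming one field per line as in the example above; `parse_meta` is a hypothetical helper, not something shipped with Suricata.

```python
def parse_meta(text: str) -> dict:
    """Parse 'KEY: value' lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon at all
            fields[key.strip()] = value.strip()
    return fields


sample = """TIME: 01/27/2010-17:41:11.579196
PCAP PKT NUM: 2847035
SRC IP: 68.142.93.214
DST IP: 10.7.185.57
PROTO: 6
SRC PORT: 80
DST PORT: 56207
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
STATE: CLOSED
SIZE: 5204
"""

meta = parse_meta(sample)
size = int(meta["SIZE"])
```

Splitting on the first colon only keeps values such as the timestamp, which contain colons themselves, in one piece.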

Configuration

The file extraction is for HTTP only currently, and works on top of our HTTP parser. As the HTTP parser runs on top of the stream reassembly engine, configuration parameters of both these parts of Suricata affect handling of files.

The stream engine option “stream.reassembly.depth” (default 1 Mb) controls the depth into a stream in which we look. Set to 0 for no limit.
The libhtp options request-body-limit and response-body-limit control how far into a HTTP request or response body we look. Again set to 0 for no limit. This can be controlled per HTTP server.
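How these limits might combine can be sketched in Python. The model below is an assumption, not taken from the code: it treats both settings as simple byte caps, with 0 meaning no limit, and takes 1 Mb to be 1048576 bytes. The function names and the `body_start` parameter are hypothetical.

```python
def apply_limit(length: int, limit: int) -> int:
    """Bytes visible under a limit, where 0 means unlimited."""
    return length if limit == 0 else min(length, limit)


def file_bytes_seen(body_len: int, body_limit: int, stream_depth: int, body_start: int = 0) -> int:
    """Estimate how much of a file body is inspected.

    body_start is the number of stream bytes that precede the body.
    """
    by_body = apply_limit(body_len, body_limit)
    if stream_depth == 0:
        return by_body
    return max(0, min(by_body, stream_depth - body_start))


default_depth = file_bytes_seen(5_000_000, 0, 1_048_576, body_start=500)
unlimited = file_bytes_seen(5_000_000, 0, 0)
small_body_limit = file_bytes_seen(5_000_000, 4096, 0)
```

Under this model a 5 MB download is cut off by the default stream depth unless both limits are raised or set to 0.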

Performance

The file handling is fully streaming, so it’s very efficient. Nonetheless there will be an overhead for the extra parsing, bookkeeping, writing to disk, etc. Memory requirements appear to be limited as well. Suricata shouldn’t keep more than a few kb per flow in memory.

Limitations

Lack of limits is a limitation. For file storage no limits have been implemented yet, so it’s easy to clutter your disk up with files. Example: on a 118Gb enterprise pcap, storing just JPGs extracted 400.000 files. Better use a separate partition if you’re on a live link.

Future work

Apart from stabilizing this code and performance optimizing it, the next step will be SMTP file extraction. Possibly other protocols, although nothing is set in stone there yet.

Using Modsec2sguil for HTTP transaction logging revisited

Recently I wrote about the idea to log all HTTP transactions into Sguil using my Modsec2sguil agent. I’ve implemented this in the current 0.8-dev5 release and it works very well. All events go into Sguil smoothly and I’ve not experienced slowdowns on the webserver. I’ve been running it for almost a week now and would like to share my first experiences here.

I find it to be quite useful. When receiving an alert, it is perhaps more interesting to see what else was done from that IP address than to see what was blocked (unless you are suspecting a false positive of course). One area where I find it useful is when I’m creating rules against comment spam on this blog. By seeing all properties of a spam message I can create better rules. For example on broken user-agents or weird codes inserted into the comment field of WordPress.

It’s easy to search and filter on HTTP response codes because the code is a part of the RT message. For example, when searching for all HTTP 500 error codes, add the following ‘WHERE’ clause to a query:

WHERE event.signature like '%MSc 500%'

This works quite fast, although you’d best limit the query on properties like date and port as well. To get all the HTTP code 500 alerts from the last few days, do something like:

WHERE event.timestamp > '2007-08-18' AND (event.dst_port = 80 OR event.dst_port = 443) AND event.signature like '%MSc 500%'
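To try such a clause without a Sguil installation, the sketch below runs it against an in-memory SQLite table from Python. SQLite only stands in for Sguil’s MySQL database here; the table is reduced to the three columns used in the clause, and the sample rows and signature texts are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (timestamp TEXT, dst_port INTEGER, signature TEXT)")

# Hypothetical sample events; real signature texts will look different.
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [
        ("2007-08-17 10:00:00", 80, "MSc 500 Internal Server Error"),
        ("2007-08-19 09:30:00", 80, "MSc 500 Internal Server Error"),
        ("2007-08-19 09:31:00", 443, "MSc 200 OK"),
        ("2007-08-20 12:00:00", 8080, "MSc 500 Internal Server Error"),
    ],
)

query = """
    SELECT timestamp, dst_port, signature
    FROM event
    WHERE event.timestamp > ?
      AND (event.dst_port = 80 OR event.dst_port = 443)
      AND event.signature LIKE ?
"""
hits = conn.execute(query, ("2007-08-18", "%MSc 500%")).fetchall()
conn.close()
```

Passing the date and the pattern as parameters keeps the quoting out of the query text.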

One thing that is disappointing is the inability to search in the event payloads stored in the database. Technically it’s possible to create MySQL queries that search for certain strings, but this process is so slow that it’s hardly usable in practice. The problem here is that the database field containing the payload is not indexed. I’ll show the query I used here (ripped from David Bianco’s blog):

WHERE event.timestamp >= '2007-08-18' AND (event.dst_port = 80 OR event.dst_port = 443) AND data.data_payload like CONCAT('%', HEX('Mozilla/5.0'), '%')
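What the CONCAT/HEX expression evaluates to can be reproduced in Python, which helps when building such patterns by hand. The helper name `hex_like_pattern` is hypothetical.

```python
def hex_like_pattern(text: str) -> str:
    """Build the LIKE pattern that CONCAT('%', HEX(text), '%') produces in MySQL."""
    return "%" + text.encode("ascii").hex().upper() + "%"


pattern = hex_like_pattern("Mozilla/5.0")
```

Because the pattern starts with a wildcard, the database has to scan the payload of every candidate row, which fits the slowness described above.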

If you know a more efficient query, please let me know!

Using Modsec2sguil for HTTP transaction logging

Modsec2sguil is currently configured to send alerts to Sguil. ModSecurity can be configured to log any event or transaction, including 200 OK, 302 Redirect, etc. Modsec2sguil distinguishes between alerts and other events by only processing HTTP codes of 400 and higher. Since 0.8-dev2 there is a configuration directive to prevent certain codes, such as 404, from being treated as an alert.
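The alert decision described here boils down to a small predicate. The Python sketch below shows the logic only; Modsec2sguil itself is not written this way, and `IGNORED_CODES` is a hypothetical stand-in for the configuration directive.

```python
IGNORED_CODES = {404}  # hypothetical: codes configured not to count as alerts


def is_alert(status: int) -> bool:
    """Treat HTTP codes of 400 and higher as alerts, unless explicitly ignored."""
    return status >= 400 and status not in IGNORED_CODES


decisions = {code: is_alert(code) for code in (200, 302, 403, 404, 500)}
```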

Now I have the following idea. Since ModSecurity can log all events with details of request headers, response headers and POST message body, it may be interesting to just send all these events to Sguil. They should not appear as alerts, but having them in the database can perhaps be interesting. I know the same data can be accessed using flow data and full packet captures, but having it in the database makes querying it a lot easier and keeps it available longer.

Possible problems are mostly the performance hit the webserver may take for sending all these events to Sguil and the storage requirements in Sguil’s database. I estimate the events are about 1kb in size on average, so on a busy site this may cause the database to grow very rapidly. Of course this behavior would be optional so it can be disabled.
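A quick back-of-the-envelope calculation in Python makes the storage concern concrete. The 1kb average comes from the estimate above; the figure of 500,000 requests per day is purely hypothetical.

```python
def daily_growth_bytes(events_per_day: int, avg_event_bytes: int = 1024) -> int:
    """Estimated database growth per day for a given event rate."""
    return events_per_day * avg_event_bytes


per_day = daily_growth_bytes(500_000)   # hypothetical busy site
per_month_gib = per_day * 30 / 2**30    # 30 days, expressed in GiB
```

At that rate the database would grow by roughly half a gigabyte per day, so a retention policy would be needed fairly soon.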

Any thoughts on this idea?