IP Reputation in Suricata

Disclaimer: this work was sponsored by Emerging Threats Pro.

One thing we’ve been talking about for many years at OISF is IP Reputation. The basic idea is that many organizations have information about specific IP addresses. This information may be that a host is infected, acts as a spam relay, or many other things. We’ve always thought it might be useful to apply this info to the IDS directly.

In the last weeks I’ve developed code to load IP reputation information into Suricata. This code is now part of the Suricata git master, so it’s available to all.

The work consisted of 3 main parts: data load, internal data structures and a rule keyword.

Data loading

The data I worked with was provided by Emerging Threats Pro. The data format is very simple: two types of CSV files, one defining a mapping between category names and IDs, and the other defining the scores for hosts in those categories.

The data formats are documented here: IP Reputation Format.
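To make the two file types a bit more concrete, here is a minimal Python sketch of loading them. The column layouts (id, short name, description for the category file; IP, category id, score for the reputation file) are my assumption about the format, and the function names and sample addresses are made up for illustration. This is not how Suricata itself loads the data.

```python
import csv
import io

def load_categories(text):
    """Map category id -> short name. Assumed columns: id,short name,description."""
    categories = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        categories[int(row[0])] = row[1].strip()
    return categories

def load_reputation(text):
    """Map IP -> {category id: score}. Assumed columns: ip,category id,score."""
    reputation = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue
        ip, category, score = row[0].strip(), int(row[1]), int(row[2])
        reputation.setdefault(ip, {})[category] = score
    return reputation

# Hypothetical sample data using documentation address ranges.
categories = load_categories("1,Bot,Known bot\n2,Spam,Spam relay\n")
reputation = load_reputation("192.0.2.10,1,40\n192.0.2.10,2,5\n198.51.100.7,2,90\n")
```

A host can carry a score in several categories at once, which is why the sketch keeps a small per-host dictionary rather than a single number.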

Internal Data Structures

To store the data in memory I hooked into our “Hosts” API. The Hosts API is a hash table like the Flow table that can be used to store data per host. It’s in use for Tagging and Thresholding. I added storage for IP Reputation to it.

Rule keyword

A new rule keyword to match on the reputation data was introduced: “iprep”. The keyword allows a rule to match on a specific category. Example:

alert ... (flow:to_server; iprep:src,Bot,>,10;)

This will generate an alert if the SRC IP of the host talking to a server is known to have a score of >10 in the “Bot” category.
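As a rough illustration of what the keyword checks, here is a Python sketch that evaluates an option string like the one above against an in-memory table. The table layout, the function name and the set of operators are assumptions for the example; the real matching happens in Suricata’s C code against the host table.

```python
import operator

# Operators assumed for this sketch; the post itself only shows ">".
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def iprep_match(option, src_ip, dst_ip, reputation, category_ids):
    """Evaluate e.g. 'src,Bot,>,10' for one packet's address pair."""
    side, category, op, threshold = [part.strip() for part in option.split(",")]
    ip = src_ip if side == "src" else dst_ip
    score = reputation.get(ip, {}).get(category_ids[category])
    if score is None:
        return False  # no reputation data for this host and category
    return OPS[op](score, int(threshold))

# Hypothetical lookup tables.
category_ids = {"Bot": 1}
reputation = {"192.0.2.10": {1: 40}}

hit = iprep_match("src,Bot,>,10", "192.0.2.10", "203.0.113.1", reputation, category_ids)
miss = iprep_match("src,Bot,>,10", "203.0.113.1", "192.0.2.10", reputation, category_ids)
```

Note that a host without any reputation entry simply does not match, so a rule like this stays silent for unknown addresses.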

The keyword is compatible with Suricata’s concept of “IP-only” rules. These are rules that do not inspect packet content or flow state and can thus be inspected once per flow direction instead of for each packet.


I’ve been playing with data sets of up to a million entries. Loading it takes hardly any time and I’m confident larger numbers will work just fine. The host table just needs bigger memcaps and hash sizes.

At runtime, the speed depends mostly on the rules. A pure “iprep” rule is quite expensive when not IP-only, although this is mostly due to the frequency of the checks. Such rules will be checked against large numbers of packets.

When created as an IP-only rule, things change. Such rules are checked only once per flow direction, so overhead appears to be minimal in this case.


The data I used from Emerging Threats Pro is not available for free, so for now anyone who wants to test will have to create their own data. Matt Jonkman from Emerging Threats Pro will make a free feed available within a few weeks, though. Of course you could also get the paid data from Emerging Threats Pro. :)

Update 29/11/2012

This feature is part of the just released 1.4rc1 version, please help us test it!

Important Suricata update

We just released Suricata 1.3.3 which contains some important accuracy fixes. Also, it should be much more robust against out of memory conditions.

For those of you running Suricata in IPS mode, this is important as well. We found that rules that have the drop or reject actions were not playing well with thresholding.

So upgrading is highly recommended!

The code changes are not too big; the largest changes are due to some extra unittests:

 ChangeLog                           |   11 +
 libhtp/htp/dslib.c                  |    4 +-
 libhtp/htp/hooks.c                  |   31 +-
 libhtp/htp/htp_connection.c         |   34 ++-
 libhtp/htp/htp_connection_parser.c  |   25 +-
 libhtp/htp/htp_parsers.c            |    2 +-
 libhtp/htp/htp_request.c            |    4 +-
 libhtp/htp/htp_request_apache_2_2.c |   24 +-
 libhtp/htp/htp_transaction.c        |   68 +++--
 libhtp/htp/htp_util.c               |   35 ++-
 src/alert-debuglog.c                |    4 +-
 src/app-layer.c                     |    9 +-
 src/decode.h                        |    3 +-
 src/detect-detection-filter.c       |   96 ++++++
 src/detect-engine-alert.c           |   37 ++-
 src/detect-engine-hcbd.c            |    5 +
 src/detect-engine-hhd.c             |  121 +++++++-
 src/detect-engine-hsbd.c            |    5 +
 src/detect-engine-iponly.c          |    5 +-
 src/detect-engine-payload.c         |   26 ++
 src/detect-engine-threshold.c       |   15 +-
 src/detect-filemd5.c                |   24 +-
 src/detect-filestore.c              |   11 +-
 src/detect-filestore.h              |    2 +-
 src/detect-pcre.c                   |  485 +----------------------------
 src/detect-threshold.c              |  569 ++++++++++++++++++++++++++++++++++-
 src/detect.c                        |   11 +-
 src/detect.h                        |    2 +-
 src/flow-hash.c                     |   10 +-
 src/flow-timeout.c                  |   10 +-
 src/flow.c                          |    1 -
 src/flow.h                          |   14 +
 src/log-httplog.c                   |    2 +-
 src/runmodes.c                      |    2 +-
 src/source-ipfw.c                   |    1 +
 src/source-pfring.c                 |   20 +-
 src/stream-tcp-reassemble.c         |    4 +-
 src/stream-tcp.c                    |   12 +-
 src/stream.c                        |    3 +-
 src/threads.h                       |    1 +
 src/tmqh-packetpool.c               |    5 +-
 src/util-buffer.h                   |    6 +-
 src/util-debug.c                    |    2 +-
 src/util-host-os-info.c             |   32 +-
 src/util-threshold-config.c         |  210 +++++++++++++
 suricata.yaml.in                    |    6 +-
 46 files changed, 1340 insertions(+), 669 deletions(-)

Setting up an IPS with Fedora 17, Suricata and Vuurmuur

I recently found out that Fedora includes Vuurmuur in its repositories. Since Suricata is also included, I figured I would do a quick write-up on how to set up a Fedora IPS. While writing, it turned more into a real “howto”, so I decided to submit it to HowtoForge.

It can be found here on HowtoForge.

Vuurmuur on Fedora is at the 0.7 version, which is still the current stable. It’s rather old though, and it reminds me again I need to make sure the 0.8 branch gets to a stable release soon. The Suricata included in Fedora 17 is 1.2.1, with 1.3.2 expected to land any day now.

The guide sets the user up from base Fedora install to a working IPS, but doesn’t cover any advanced topics such as rule management, event management etc. Still, I hope it’s useful to some, especially those that are intimidated by Vuurmuur’s and Suricata’s initial learning curves.

Looking forward to feedback! :)

Suricata MD5 blacklisting

For a few months Suricata has been able to calculate the MD5 checksum of files it sees in HTTP streams. Regardless of extraction to disk, the MD5 could be calculated and logged. Martin Holste created a set of very cool scripts to use the logged MD5 to look it up at VirusTotal and some other similar services. This is done outside of Suricata. One thing I have been wanting to try is matching against these MD5’s in Suricata itself.

In the recent 1.3beta2 release, I’ve added a first attempt at this. The current support is crude but works. I’ve added a rule keyword, called “filemd5”.


The keyword opens the file “filename” from your rule directory and loads its content. It expects one hexadecimal MD5 per line:


Any extra info on a line is ignored, so the output of md5sum can be used safely:

91849eac70248b01e3723d12988c69ac suricata-1.3beta2.tar.gz
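To illustrate the “extra info is ignored” behaviour, here is a small Python sketch that reads such a list. It only mimics the idea (take the leading 32 hex characters, drop the rest); the function name is made up and Suricata’s actual parser may differ in details such as how it treats malformed lines.

```python
import re

# A line counts if it starts with exactly 32 hex characters.
MD5_RE = re.compile(r"^\s*([0-9a-fA-F]{32})(?![0-9a-fA-F])")

def parse_md5_lines(lines):
    """Return raw 16-byte digests; anything after the hash on a line is ignored."""
    digests = []
    for line in lines:
        match = MD5_RE.match(line)
        if match:
            digests.append(bytes.fromhex(match.group(1)))
    return digests

sample = [
    "91849eac70248b01e3723d12988c69ac suricata-1.3beta2.tar.gz\n",
    "not a hash\n",
    "\n",
]
digests = parse_md5_lines(sample)
```

Storing the digests as 16 raw bytes instead of 32 hex characters is what keeps a table of this kind compact.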

At start up, Suricata will tell you how much memory the hash table uses. The hash table is quite compact. It uses hash_rows * 4 + md5’s * 16 bytes. For 20155064 MD5’s it uses a bit more than 300 MB:

[3748] 9/6/2012 -- 08:40:44 - (detect-filemd5.c:264) (DetectFileMd5Parse) -- MD5 hash size 324578208 bytes

Performance so far seems to be great. I’ve been testing with 20 million MD5’s and so far I’m not seeing any significant performance impact. The dedicated data structures I created for it seem to hold up quite nicely. Right now the only slow down I see is at start up, where it adds a few seconds. The data structure is currently limited to 32 bit, so a 4GB table. This should allow ~250 million MD5’s, although I haven’t tested that.
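The numbers above can be checked with some quick arithmetic. The sketch below applies the formula to the logged size and the 32-bit limit. The number of hash rows is not stated in this post, so the value here is only what the logged size implies, not something read from the code.

```python
MD5_BYTES = 16   # one raw MD5 digest
ROW_BYTES = 4    # per hash row, from the formula above

LOGGED_BYTES = 324578208   # from the start-up log line
MD5_COUNT = 20155064       # number of MD5's loaded

digest_bytes = MD5_COUNT * MD5_BYTES          # space taken by the digests
row_bytes = LOGGED_BYTES - digest_bytes       # what is left for the hash rows
implied_rows = row_bytes // ROW_BYTES         # inferred, not documented

def table_bytes(hash_rows, md5_count):
    """Memory use according to: hash_rows * 4 + md5's * 16."""
    return hash_rows * ROW_BYTES + md5_count * MD5_BYTES

# Upper bound on digests in a 32-bit (4 GB) table, ignoring the hash rows.
max_md5s_32bit = 2**32 // MD5_BYTES

print(digest_bytes, row_bytes, implied_rows, max_md5s_32bit)
```

The digests account for almost all of the memory, and 2^32 / 16 comes out at roughly 268 million, which fits the “~250 million” estimate once some room is left for the rows.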

As this is a regular rule keyword, it can be combined with other rule keywords, such as filemagic or filename. A sig like “filemagic:pdf; filemd5:bad_pdfs;” would match the list “bad_pdfs” only against pdf files.

I think there are several possible use cases for this new functionality. First, I could imagine a project like Emerging Threats shipping a list of the most recent malware MD5’s. It should be possible to distribute the most recent 100k or so MD5’s.

Second, this could be used as a poor man’s DLP. Hash the files you don’t want to see on your network outbound or unencrypted and have Suricata look for them.
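For the DLP idea, the list could be generated with md5sum, or with a few lines of Python like the sketch below. The function names and the two-space “hash  path” output layout are just choices for this example; since extra info on a line is ignored, the path is only there for humans.

```python
import hashlib
from pathlib import Path

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_md5_list(root, out_path):
    """Write one 'md5  relative/path' line per regular file under root."""
    root = Path(root)
    count = 0
    with open(out_path, "w") as out:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                out.write(f"{md5_of_file(path)}  {path.relative_to(root)}\n")
                count += 1
    return count

# Example (hypothetical paths):
# write_md5_list("/srv/confidential", "/etc/suricata/rules/confidential_md5s")
```

Write the list somewhere outside the directory being hashed, otherwise the list file itself ends up in the list.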

The most interesting use case probably is not implemented yet, but will be. When negated matching is implemented, the filemd5 keyword could be used for white listing.

As Martin Holste tweeted: “Awesome, more than enough to handle all Windows OS files. So can we just do: filemagic:exe; filemd5:!whitelist.txt;?”

I think an alert for all executable downloads that are not “pre approved” is definitely something that can be useful.

Again, the work continues! :)

F-Secure AV updates and Suricata IPS

My ISP recently started providing 3 F-Secure AV copies to each of their customers. I installed it but noticed that updates timed out.

It turned out that Suricata, which runs in IPS mode, blocked the update. There were 3 Emerging Threats rules that alerted:

[1:2003614:4] ET VIRUS WinUpack Modified PE Header Inbound
[1:2009557:2] ET TROJAN Yoda’s Protector Packed Binary
[1:2012086:2] ET SHELLCODE Possible Call with No Offset TCP Shellcode

It seems that F-Secure uses some form of packed binaries for their updates that is often used by malware.

To allow the updates to go through without disabling the rules altogether, we can use suppressions. All the alerts happened in streams talking to IP addresses in the range. Whois lookup suggested that F-Secure has available, so I decided to suppress the rules for that entire block.

To add the suppressions, I added the following lines to my threshold.conf:

# f-secure update matching
suppress gen_id 1, sig_id 2009557, track by_src, ip
suppress gen_id 1, sig_id 2012086, track by_src, ip
suppress gen_id 1, sig_id 2003614, track by_src, ip

After a Suricata restart, the updates now work fine. If you run Suricata in IDS mode you may still want to add the suppressions to reduce the number of alerts.

File extraction in Suricata

Today I pushed out a new feature in Suricata I’m very excited about. It has been long in the making and with over 6000 new lines of code it’s a significant effort. It’s available in the current git master. I’d consider it alpha quality, so handle with care.

So what is this all about? Simply put, we can now extract files from HTTP streams in Suricata. Both uploads and downloads. Fully controlled by the rule language. But that’s not all. I’ve added a touch of magic. By utilizing libmagic (this powers the “file” command), we know the file type of files as well. Lots of interesting stuff that can be done there.

Rule keywords

Four new rule keywords were added: filename, fileext, filemagic and filestore.

Filename and fileext are pretty trivial: match on the full name or file extension of a file.

alert http any any -> any any (filename:"secret.xls";)
alert http any any -> any any (fileext:"pdf";)

More interesting is the filemagic keyword. It runs on the magic output of inspecting (the start of) a file. This value is for example:

GIF image data, version 89a, 1 x 1
PE32 executable for MS Windows (GUI) Intel 80386 32-bit
HTML document text
Macromedia Flash data (compressed), version 9
MS Windows icon resource - 2 icons, 16x16, 256-colors
PNG image data, 70 x 53, 8-bit/color RGBA, non-interlaced
JPEG image data, JFIF standard 1.01
PDF document, version 1.6

Matching on this with the filemagic keyword is pretty simple:

alert http any any -> any any (filemagic:"PDF document";)
alert http any any -> any any (filemagic:"PDF document, version 1.6";)

Pretty cool, eh? You can match both very specifically and loosely. For example:

alert http any any -> any any (filemagic:"executable for MS Windows";)

Will match on (among others) these types:

PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit
PE32 executable for MS Windows (GUI) Intel 80386 32-bit
PE32+ executable for MS Windows (GUI) Mono/.Net assembly
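The loose versus specific matching can be pictured as a plain substring test on the magic string. The Python sketch below assumes exactly that, plus an optional negation; it does not model whatever case handling or search algorithm Suricata uses internally, and the function name is made up.

```python
def filemagic_match(magic, pattern, negated=False):
    """True if pattern occurs in the magic string (inverted when negated)."""
    found = pattern in magic
    return found != negated

MAGICS = [
    "GIF image data, version 89a, 1 x 1",
    "PE32 executable for MS Windows (GUI) Intel 80386 32-bit",
    "PE32+ executable for MS Windows (GUI) Mono/.Net assembly",
    "PDF document, version 1.6",
]

windows = [m for m in MAGICS if filemagic_match(m, "executable for MS Windows")]
any_pdf = [m for m in MAGICS if filemagic_match(m, "PDF document")]
not_pdf = [m for m in MAGICS if filemagic_match(m, "PDF document", negated=True)]
```

A short pattern such as “PDF document” catches every PDF version, while adding “, version 1.6” narrows it down to one.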

Finally there is the filestore keyword. It is the simplest of all: if the rule matches, the files will be written to disk.

Naturally you can combine the file keywords with the regular HTTP keywords, limiting to POST’s for example:

alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"pdf upload claimed, but not pdf"; flow:established,to_server; content:"POST"; http_method; fileext:"pdf"; filemagic:!"PDF document"; filestore; sid:1; rev:1;)

This will alert on and store all files that are uploaded using a POST request that have a filename extension of pdf, but the actual file is not pdf.


The storage to disk is handled by a new output module called “file”. Its config looks like this:

enabled: yes # set to yes to enable
log-dir: files # directory to store the files
force-magic: no # force logging magic on all stored files

It needs to be enabled for file storing to work.

The files are stored to disk as “file.1”, “file.2”, etc. For each of the files a meta file is created containing the flow information, file name, size, etc. Example:

TIME: 01/27/2010-17:41:11.579196
PCAP PKT NUM: 2847035
DST PORT: 56207
FILENAME: /msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
SIZE: 5204
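Because the meta file is a simple list of “KEY: value” lines, it is easy to post-process. Here is a small Python sketch that reads one into a dictionary. It assumes every line follows the layout of the example above; the function name is made up.

```python
def parse_meta(text):
    """Parse 'KEY: value' lines from a meta file into a dictionary."""
    meta = {}
    for line in text.splitlines():
        # Split on the first colon only, since values such as TIME contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

sample = """TIME: 01/27/2010-17:41:11.579196
PCAP PKT NUM: 2847035
DST PORT: 56207
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
SIZE: 5204
"""
meta = parse_meta(sample)
size = int(meta["SIZE"])
```

From here it is a small step to, for example, sum up the sizes of everything stored or to group stored files by their magic.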


The file extraction is for HTTP only currently, and works on top of our HTTP parser. As the HTTP parser runs on top of the stream reassembly engine, configuration parameters of both these parts of Suricata affect handling of files.

The stream engine option “stream.reassembly.depth” (default 1 MB) controls the depth into a stream in which we look. Set to 0 for no limit.
The libhtp options request-body-limit and response-body-limit control how far into an HTTP request or response body we look. Again, set to 0 for no limit. This can be controlled per HTTP server.


The file handling is fully streaming, so it’s very efficient. Nonetheless there will be an overhead for the extra parsing, bookkeeping, writing to disk, etc. Memory requirements appear to be limited as well. Suricata shouldn’t keep more than a few kB per flow in memory.


Lack of limits is a limitation. For file storage no limits have been implemented yet, so it’s easy to clutter up your disk with files. Example: storing just the JPGs from a 118 GB enterprise pcap extracted 400,000 files. Better use a separate partition if you’re on a live link.

Future work

Apart from stabilizing this code and performance optimizing it, the next step will be SMTP file extraction. Possibly other protocols, although nothing is set in stone there yet.