Disclaimer: this work was sponsored by Emerging Threats Pro.
One thing we’ve been talking about for many years at OISF is IP Reputation. The basic idea is that many organizations have information about specific IP-addresses. This information may be that a host is infected, acts as a spam relay or many other things. We’ve always thought it might be useful to apply this info to the IDS directly.
In the last weeks I’ve developed code to load IP reputation information into Suricata. This code is now part of the Suricata git master, so it’s available to all.
The work consisted of 3 main parts: data load, internal data structures and a rule keyword.
The data I worked with was provided by Emerging Threats Pro. The data format is very simple. Two types of CSV files, one to define a mapping between category names and id’s and the other to define the scores for hosts in the categories.
The data formats are documented here: IP Reputation Format.
Internal Data Structures
To store the data in memory I hooked into our “Hosts” API. The Hosts API is a hash table like the Flow table that can be used to store data per host. It’s in use for Tagging and Thresholding. I added storage for IP Reputation to it.
A new rule keyword to match on the reputation data was introduced: “iprep”. The keyword allows a rule to match on a specific category. Example:
alert ... (flow:to_server; iprep:src,Bot,>,10;)
This will generate an alert if the SRC IP of the host talking to a server is known to have a score of >10 in the “Bot” category.
The keyword is compatible to Suricata’s concept of “IP-only” rules. These are rules that do not inspect packet content or flow state and can thus be inspected once per flow direction instead of for each packet.
I’ve been playing with data sets of up to a million entries. Loading it takes hardly any time and I’m confident larger numbers will work just fine. The host table just needs bigger memcaps and hash sizes.
At runtime, the speed depends mostly on the rules. A pure “iprep” rule is quite expensive when not IP-only, although this is mostly due to the frequency of the checks. Such rules will be checked against large numbers of packets.
When created as a IP-only rule, things change. Such rules are checked only once per flow direction, so overhead appears to be minimal in this case.
The data I used from Emerging Threats Pro is not available for free, so for those who want to test creating your own data is required right now. Matt Jonkman from Emerging Threats Pro will make a free feed available within a few weeks though. Of course you could also get the paid data from Emerging Threats Pro. :)
This feature is part of the just released 1.4rc1 version, please help us test it!