Suricata profiling per keyword

Last week I added some more profiling options to Suricata. They’re part of the current git master. They’re only available when Suricata is built with --enable-profiling, and are then enabled through suricata.yaml:

profiling:
  # per keyword profiling
  keywords:
    enabled: yes
    filename: keyword_perf.log
    append: yes

This will output a table similar to the one below:

--------------------------------------------------------------------------
Date: 11/7/2013 -- 15:13:11
--------------------------------------------------------------------------
Stats for: total
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
threshold        355324491   190574   409      72276       1864.00     3625.00     1860.00    
content          1274592063  534328   196738   312321      2385.00     2424.00     2362.00    
pcre             56626031    11149    824      254562      5079.00     12234.00    4507.00    
byte_test        153287955   128254   32109    67989       1195.00     1658.00     1040.00    
byte_jump        3676404     2041     2041     15939       1801.00     1801.00     0.00       
flow             38276182    22842    22842    63987       1675.00     1675.00     0.00       
isdataat         580764      558      556      2427        1040.00     1040.00     1017.00    
dsize            2212029     2062     2061     3711        1072.00     1072.00     789.00     
flowbits         1677209     874      870      9873        1919.00     1923.00     884.00     
itype            1653        2        1        1386        826.00      267.00      1386.00    
icode            27383781    93827    2        25545       291.00      1021.00     291.00     
flags            192751968   245519   189709   255639      785.00      753.00      892.00     
urilen           6149297     6142     1099     28299       1001.00     1395.00     915.00     
byte_extract     143091      78       78       7743        1834.00     1834.00     0.00       
--------------------------------------------------------------------------
Stats for: packet
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
flow             38276182    22842    22842    63987       1675.00     1675.00     0.00       
dsize            2212029     2062     2061     3711        1072.00     1072.00     789.00     
flowbits         351171      294      290      5526        1194.00     1198.00     884.00     
itype            1653        2        1        1386        826.00      267.00      1386.00    
icode            27383781    93827    2        25545       291.00      1021.00     291.00     
flags            192751968   245519   189709   255639      785.00      753.00      892.00     
--------------------------------------------------------------------------
Stats for: packet/stream payload
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
content          1203990910  512902   183628   312321      2347.00     2365.00     2337.00    
pcre             28087301    6598     54       254562      4256.00     12279.00    4190.00    
byte_test        153287955   128254   32109    67989       1195.00     1658.00     1040.00    
byte_jump        3676404     2041     2041     15939       1801.00     1801.00     0.00       
isdataat         578172      556      554      2427        1039.00     1039.00     1017.00    
byte_extract     143091      78       78       7743        1834.00     1834.00     0.00       
--------------------------------------------------------------------------
Stats for: http uri
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
content          44775802    13102    8351     60993       3417.00     3257.00     3698.00    
pcre             18284421    3646     97       61338       5014.00     8916.00     4908.00    
isdataat         2592        2        2        1725        1296.00     1296.00     0.00       
urilen           6149297     6142     1099     28299       1001.00     1395.00     915.00     
--------------------------------------------------------------------------
Stats for: http raw uri
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
pcre             9534        2        0        4953        4767.00     0.00        4767.00    
--------------------------------------------------------------------------
Stats for: http client body
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
content          1556904     441      181      58476       3530.00     2874.00     3986.00    
pcre             63924       6        6        17358       10654.00    10654.00    0.00       
--------------------------------------------------------------------------
Stats for: http headers
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
content          23688244    7631     4348     31098       3104.00     3311.00     2829.00    
pcre             9998970     859      667      71904       11640.00    12727.00    7862.00    
--------------------------------------------------------------------------
Stats for: http stat code
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
content          80052       39       20       3699        2052.00     2199.00     1898.00    
--------------------------------------------------------------------------
Stats for: http method
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
content          476334      203      201      27240       2346.00     2351.00     1846.00    
--------------------------------------------------------------------------
Stats for: http cookie
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
content          23817       10       9        2763        2381.00     2384.00     2358.00    
pcre             181881      38       0        13095       4786.00     0.00        4786.00    
--------------------------------------------------------------------------
Stats for: post-match
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
flowbits         1326038     580      580      9873        2286.00     2286.00     0.00       
--------------------------------------------------------------------------
Stats for: threshold
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- ----------- 
threshold        355324491   190574   409      72276       1864.00     3625.00     1860.00

The first part has the totals for all keywords. After this the stats are broken down per buffer type.
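Since the log is a plain whitespace-separated table, it's easy to post-process. Below is a minimal Python sketch that reads the layout shown above and ranks the keywords in a section by total ticks. The function names and the shortened sample are mine, not part of Suricata, and with "append: yes" a real log will hold several runs, which this sketch simply overwrites section by section.

```python
# Shortened copy of the output above, used as sample input.
SAMPLE = """\
--------------------------------------------------------------------------
Stats for: total
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- -----------
threshold        355324491   190574   409      72276       1864.00     3625.00     1860.00
content          1274592063  534328   196738   312321      2385.00     2424.00     2362.00
pcre             56626031    11149    824      254562      5079.00     12234.00    4507.00
--------------------------------------------------------------------------
Stats for: http uri
--------------------------------------------------------------------------
Keyword          Ticks       Checks   Matches  Max Ticks   Avg         Avg Match   Avg No Match
---------------- ----------- -------- -------- ----------- ----------- ----------- -----------
content          44775802    13102    8351     60993       3417.00     3257.00     3698.00
pcre             18284421    3646     97       61338       5014.00     8916.00     4908.00
"""


def parse_keyword_perf(text):
    """Return {section: {keyword: {"ticks", "checks", "matches", "max_ticks"}}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if raw.startswith("Stats for:"):
            current = raw.split(":", 1)[1].strip()
            sections[current] = {}
        elif current is not None and len(fields) == 8 and fields[1].isdigit():
            # Data rows have 8 columns; header and separator rows are skipped
            # because their second column is not a number.
            keyword, ticks, checks, matches, max_ticks = fields[:5]
            sections[current][keyword] = {
                "ticks": int(ticks),
                "checks": int(checks),
                "matches": int(matches),
                "max_ticks": int(max_ticks),
            }
    return sections


def rank_by_ticks(section):
    """Sort one section's keywords from most to least total ticks."""
    return sorted(section.items(), key=lambda kv: kv[1]["ticks"], reverse=True)


stats = parse_keyword_perf(SAMPLE)
for keyword, row in rank_by_ticks(stats["total"]):
    print(keyword, row["ticks"], row["ticks"] // row["checks"])
```

In the sample data the Avg column appears to be Ticks divided by Checks with the fraction dropped, which is why the sketch uses integer division for the last printed value.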

Part of this work was sponsored by Emerging Threats.

More on Suricata lua flowints

This morning I added flowint lua functions for incrementing and decrementing flowints. From the commit:

Add flowint lua functions for incrementing and decrementing flowints.

First use creates the var and inits to 0. So a call:

    a = ScFlowintIncr(0)

Results in a == 1.

If the var reaches UINT_MAX (2^32 - 1), it’s not incremented further. If the
var reaches 0 it’s not decremented further.

Calling ScFlowintDecr on an uninitialized var will init it to 0.

Example script:

    function init (args)
        local needs = {}
        needs["http.request_headers"] = tostring(true)
        needs["flowint"] = {"cnt_incr"}
        return needs
    end

    function match(args)
        a = ScFlowintIncr(0);
        if a == 23 then
            return 1
        end

        return 0
    end
    return 0

This script matches the 23rd time it’s invoked on a flow.
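To make the semantics from the commit message concrete, here is a small Python model of the counter behaviour. It's only an illustration: FlowintModel is a made-up stand-in for the per-flow storage, not Suricata code, and it assumes the cap is the usual 32-bit UINT_MAX value of 2^32 - 1.

```python
UINT32_MAX = 2**32 - 1  # assumed cap for a flowint


class FlowintModel:
    """Toy stand-in for one flow's flowints, indexed by registration id."""

    def __init__(self):
        self.vars = {}

    def incr(self, idx):
        value = self.vars.get(idx, 0)  # first use creates the var at 0
        if value < UINT32_MAX:         # saturate instead of wrapping
            value += 1
        self.vars[idx] = value
        return value

    def decr(self, idx):
        value = self.vars.get(idx, 0)  # uninitialized var is created at 0
        if value > 0:                  # never goes below 0
            value -= 1
        self.vars[idx] = value
        return value


flow = FlowintModel()
# Mirrors the Lua match() above: only the 23rd call on a flow returns 1.
verdicts = [1 if flow.incr(0) == 23 else 0 for _ in range(30)]
```

In this model verdicts holds a single 1, at the 23rd position, which is the behaviour the example script relies on.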

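Compared to yesterday’s flowint script and the earlier flowvar based counting script, this performs better. Rule 3 below averages roughly 7,600 ticks per check, versus about 12,300 for the flowint get/set script and 29,600 for the flowvar script: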

   Num      Rule         Gid      Rev      Ticks        %      Checks   Matches  Max Ticks   Avg Ticks   Avg Match   Avg No Match
  -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 
  1        1            1        0        2434188332   59.71  82249    795      711777      29595.35    7683.20     29809.22   
  2        2            1        0        1015328580   24.91  82249    795      154398      12344.57    3768.66     12428.27   
  3        3            1        0        626858067    15.38  82249    795      160731      7621.47     3439.91     7662.28    

The rules:

alert http any any -> any any (msg:"LUAJIT HTTP flowvar match"; luajit:lua_flowvar_cnt.lua; flow:to_server; sid:1;)
alert http any any -> any any (msg:"LUAJIT HTTP flowint match"; luajit:lua_flowint_cnt.lua; flow:to_server; sid:2;)
alert http any any -> any any (msg:"LUAJIT HTTP flowint incr match"; luajit:lua_flowint_incr_cnt.lua; flow:to_server; sid:3;)

Please comment, discuss, review etc on the oisf-devel list.

Suricata Lua scripting flowint access

A few days ago I wrote about my Emerging Threats sponsored work to support flowvars from Lua scripts in Suricata.

Today, I updated that support. Flowvar ‘sets’ are now real time. This was needed to fix some issues where a script was invoked multiple times in a single rule, which can happen with some buffers, like HTTP headers.

Also, I implemented flowint support. Flowints in Suricata are integers stored in the flow context.

Example script:

function init (args)
    local needs = {}
    needs["http.request_headers"] = tostring(true)
    needs["flowint"] = {"cnt"}
    return needs
end

function match(args)
    a = ScFlowintGet(0);
    if a then
        ScFlowintSet(0, a + 1)
    else
        ScFlowintSet(0, 1)
    end 
        
    a = ScFlowintGet(0);
    if a == 23 then
        return 1
    end 
    
    return 0
end 

return 0

It does the same thing as this flowvar script:

function init (args)
    local needs = {}
    needs["http.request_headers"] = tostring(true)
    needs["flowvar"] = {"cnt"}
    return needs
end

function match(args)
    a = ScFlowvarGet(0);
    if a then
        a = tostring(tonumber(a)+1)
        ScFlowvarSet(0, a, #a)
    else
        a = tostring(1)
        ScFlowvarSet(0, a, #a)
    end 
    
    if tonumber(a) == 23 then
        return 1
    end
    
    return 0
end

return 0

Only at less than half the cost:

   Num      Rule         Gid      Rev      Ticks        %      Checks   Matches  Max Ticks   Avg Ticks   Avg Match   Avg No Match
  -------- ------------ -------- -------- ------------ ------ -------- -------- ----------- ----------- ----------- -------------- 
  1        1            1        0        2392221879   70.56  82249    795      834993      29085.12    6964.14     29301.02   
  2        2            1        0        998297994    29.44  82249    795      483810      12137.51    4019.44     12216.74   

Suricata Lua scripting flowvar access

Funded by Emerging Threats, I’ve been working on giving the lua scripts access to flowvars.

Currently only “flowvars” are done; “flowints” will be next. Please review the code at:
https://github.com/inliniac/suricata/tree/dev-lua-flowvar

Pcre based flowvar capturing is done in a post-match fashion. If the rule containing the “capture” matches, the var is stored in the flow.

For lua scripting, this wasn’t what the rule writers wanted. In this case, the flowvars are stored in the flow regardless of a rule match.

The way a script can start using flowvars is by first registering which ones it needs access to:

function init (args)
    local needs = {}
    needs["http.request_headers.raw"] = tostring(true)
    needs["flowvar"] = {"cnt"}
    return needs
end

More than one can be registered, e.g.:

    needs["flowvar"] = {"cnt", "somevar", "anothervar" }

The maximum is 15 per script. The order of the vars matters. As Suricata uses ids internally, to use the vars you have to use ids as well. The first registered var has id 0, the second has id 1, and so on:

function match(args)
    a = ScFlowvarGet(0);
    if a then
        print ("We have an A: " .. (a))
        a = tostring(tonumber(a)+1)
        print ("A incremented to: " .. (a))
        ScFlowvarSet(0, a, #a)
    else
        print "Init A to 1"
        a = tostring(1)
        ScFlowvarSet(0, a, #a)
    end

    print ("A is " .. (a))
    if tonumber(a) == 23 then
        print "Match!"
        return 1
    end

    return 0
end

You can also keep the id in a Lua variable:

function init (args)
    local needs = {}
    needs["http.request_headers.raw"] = tostring(true)
    needs["flowvar"] = {"blah", "cnt"}
    return needs
end

local var_cnt = 1

function match(args)
    a = ScFlowvarGet(var_cnt);
    if a then
        print ("We have an A: " .. (a))
        a = tostring(tonumber(a)+1)
        print ("A incremented to: " .. (a))
        ScFlowvarSet(var_cnt, a, #a)
    else
        print "Init A to 1"
        a = tostring(1)
        ScFlowvarSet(var_cnt, a, #a)
    end

    print ("A is " .. (a))
    if tonumber(a) == 23 then
        print "Match!"
        return 1
    end

    return 0
end
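The mapping from registration order to id, as described above, can be sketched in a few lines of Python. This is just a model to show why var_cnt is 1 in the script above; flowvar_ids is a hypothetical helper, not something Suricata provides.

```python
MAX_VARS = 15  # per-script limit mentioned above


def flowvar_ids(names):
    """Map each registered name to its id: position in the 'needs' list."""
    if len(names) > MAX_VARS:
        raise ValueError("at most %d vars can be registered per script" % MAX_VARS)
    return {name: idx for idx, name in enumerate(names)}


ids = flowvar_ids(["blah", "cnt"])
var_cnt = ids["cnt"]  # "cnt" is registered second, so its id is 1
```

Reordering the names in the registration changes the ids, so the script and its needs["flowvar"] list have to be kept in sync.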

Flowvars are set at the end of the rule’s inspection, so after the script has run.

When multiple stores are done from the script and/or pcre, the last match will win. So if order matters, rule priority can be used to control inspection order.

Thoughts, comments, and code review highly welcomed at the oisf-devel list.

Closing in on Suricata 1.4

I just made Suricata 1.4rc1 available with some pretty exciting features: unix socket mode and IP reputation.

Unix socket

First of all, Eric Leblond’s work on the Unix socket was merged. The unix socket work consists of two parts: the unix socket protocol implementation and a new runmode.

The protocol implementation is based on JSON messages over a unix socket. Eric will be fully documenting it soon. Currently the commands are limited to shutting down and getting some basic stats. This part isn’t very exciting yet, but the groundwork for many future extensions has been laid.

The part that is exciting right now is the unix socket runmode. What this does is start Suricata with all the rules and such, and then wait for commands on the unix socket. Each command is a pcap filename and log directory pair. The pcap is then inspected against the rules and the logs go into the log directory supplied. As this can be easily scripted (a python script is provided), it’s a very fast way to test your pcap collections, as the overhead of starting and stopping is skipped.
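To give an idea of what scripting this could look like, here is a rough Python sketch that turns a list of pcap and log directory pairs into JSON requests. Note the assumptions: the protocol isn't documented in this post, so the "command"/"arguments" layout and the "pcap-file", "filename" and "output-dir" names are placeholders to be checked against the protocol documentation and the provided python script. The sketch only builds the messages; it doesn't talk to a socket.

```python
import json


def build_pcap_command(pcap_path, log_dir):
    """Build one JSON request for a pcap filename / log directory pair.

    ASSUMPTION: the message layout and key names are illustrative only.
    """
    return json.dumps({
        "command": "pcap-file",
        "arguments": {"filename": pcap_path, "output-dir": log_dir},
    })


def queue_pcaps(pairs):
    """Turn (pcap, log directory) pairs into a list of JSON requests."""
    return [build_pcap_command(pcap, log_dir) for pcap, log_dir in pairs]


requests = queue_pcaps([
    ("/pcaps/sample-0001.pcap", "/logs/sample-0001"),
    ("/pcaps/sample-0002.pcap", "/logs/sample-0002"),
])
```

Each request would then be written to the unix socket in turn, with Suricata staying up between pcaps.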

While this may initially appeal mostly to those of you doing sandnetting and malware analysis, where tens of thousands of pcaps are automatically processed every hour or day, I think this could grow into a feature for a wider audience as well. For example, I could see use in Sguil or Snorby, or pretty much every event manager with full packet capture support, adding an option to scan a pcap associated with an event again. Maybe against _all_ rules, instead of the tuned set running on the live sensors. Maybe you can re-inspect old sessions against the current rules this way to find hits on attacks that were 0-days at the time, etc.

I think there could be many possibilities.

IP Reputation

A slightly more polished version of the code I discussed here is now available in this release. It’s one of those things where it will be very interesting to see how people will put it to use.

Matt Jonkman just wrote some of his ideas to the Emerging Threats mailing list: one of the ideas Matt wrote about is to amend weak rules with reputation data. So if you have a signature that is prone to false positives, you probably disable it currently. But what if you combine it with reputation data? If the weak rule fires on a sketchy IP, it may be a more reliable alert.

We’ll see how this plays out.

1.4 final

We’re hoping that if nothing big happens, we can do a mid-December 1.4 final release. So please consider running this new release. It’s running very stably in quite a number of places: ISP networks, lab networks, home networks, sandnetting networks, etc. But we need much more testing to find issues and/or gain confidence that we have found the most important issues. Thanks for helping out!

IP Reputation in Suricata

Disclaimer: this work was sponsored by Emerging Threats Pro.

One thing we’ve been talking about for many years at OISF is IP Reputation. The basic idea is that many organizations have information about specific IP-addresses. This information may be that a host is infected, acts as a spam relay or many other things. We’ve always thought it might be useful to apply this info to the IDS directly.

In the last few weeks I’ve developed code to load IP reputation information into Suricata. This code is now part of the Suricata git master, so it’s available to all.

The work consisted of 3 main parts: data load, internal data structures and a rule keyword.

Data loading

The data I worked with was provided by Emerging Threats Pro. The data format is very simple. There are two types of CSV files: one defines a mapping between category names and ids, and the other defines the scores for hosts in the categories.

The data formats are documented here: IP Reputation Format.

Internal Data Structures

To store the data in memory I hooked into our “Hosts” API. The Hosts API is a hash table like the Flow table that can be used to store data per host. It’s in use for Tagging and Thresholding. I added storage for IP Reputation to it.

Rule keyword

A new rule keyword to match on the reputation data was introduced: “iprep”. The keyword allows a rule to match on a specific category. Example:

alert ... (flow:to_server; iprep:src,Bot,>,10;)

This will generate an alert if the SRC IP of the host talking to a server is known to have a score of >10 in the “Bot” category.
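As a rough model of how the two CSV files and the keyword fit together, here is a Python sketch that loads a category file and a score file and then evaluates the equivalent of iprep:src,Bot,>,10. The column order (id, short name, description for categories; IP, category id, score for hosts), the sample data and the set of operators are assumptions made for this illustration, so check the format documentation before relying on them. A plain dict also stands in for the host table here.

```python
import csv
import io
import operator

# Hypothetical sample data using documentation-range IP addresses.
CATEGORIES_CSV = "1,Bot,Known bot or C&C host\n2,Spam,Known spam relay\n"
REPUTATION_CSV = "192.0.2.10,1,40\n192.0.2.11,1,5\n192.0.2.11,2,90\n"

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def load_categories(text):
    """Map category short name -> numeric id (assumed columns: id, name, description)."""
    return {row[1]: int(row[0]) for row in csv.reader(io.StringIO(text)) if row}


def load_reputation(text):
    """Map (ip, category id) -> score (assumed columns: ip, category id, score)."""
    scores = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        ip, category_id, score = row
        scores[(ip, int(category_id))] = int(score)
    return scores


def iprep_match(ip, category, op, value, categories, scores):
    """True if the host has a score in the category and it satisfies the comparison."""
    score = scores.get((ip, categories[category]))
    return score is not None and OPS[op](score, value)


categories = load_categories(CATEGORIES_CSV)
scores = load_reputation(REPUTATION_CSV)
hit = iprep_match("192.0.2.10", "Bot", ">", 10, categories, scores)
```

Hosts without an entry for the category never match in this model, whatever the operator.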

The keyword is compatible with Suricata’s concept of “IP-only” rules. These are rules that do not inspect packet content or flow state and can thus be inspected once per flow direction instead of for each packet.

Speed

I’ve been playing with data sets of up to a million entries. Loading it takes hardly any time and I’m confident larger numbers will work just fine. The host table just needs bigger memcaps and hash sizes.

At runtime, the speed depends mostly on the rules. A pure “iprep” rule is quite expensive when not IP-only, although this is mostly due to the frequency of the checks. Such rules will be checked against large numbers of packets.

When created as an IP-only rule, things change. Such rules are checked only once per flow direction, so overhead appears to be minimal in this case.

Data

The data I used from Emerging Threats Pro is not available for free, so those who want to test will have to create their own data for now. Matt Jonkman from Emerging Threats Pro will make a free feed available within a few weeks though. Of course you could also get the paid data from Emerging Threats Pro. 🙂

Update 29/11/2012

This feature is part of the just released 1.4rc1 version, please help us test it!

Suricata http_user_agent vs http_header

One of the new features in Suricata 1.3 is a new content modifier called http_user_agent. This allows rule writers to match on the User-Agent header in HTTP requests more efficiently. The new keyword is documented in the OISF wiki. In this post, I’ll show its efficiency with two examples.

Example 1: rarely matching UA

Consider a signature where the match is on a part of the UA that is very rare, so not part of regular User Agents. In my example, “abc”.

The signature looks like this:
alert http any any -> any any (msg:"User-Agent abc http_header"; content:"User-Agent: "; http_header; nocase; content:"abc"; http_header; distance:0; pcre:"/User-Agent:[^\n]*abc/iH"; sid:1; rev:1;)

The http_user_agent variant looks much simpler:
alert http any any -> any any (msg:"User-Agent abc http_user_agent"; content:"abc"; http_user_agent; sid:2; rev:1;)

Now when running this against a pcap with over 12,500 HTTP requests, neither signature matched. However, signature 1 was inspected 209752 times! This high number is because the request headers are inspected one-by-one. Signature 2 wasn’t inspected at all, as it never made it past the multi pattern matching stage (mpm).

When looking at pcap runtime, running with only the http_user_agent version is about 10% faster.

Example 2: commonly matching UA

So, what if we want to match on something that is quite common? In other words, the signature will have frequent matches?

First, the http_header signature:
alert http any any -> any any (msg:"User-Agent MSIE 6 http_header"; content:"User-Agent: "; http_header; nocase; content:"MSIE 6"; http_header; distance:0; pcre:"/User-Agent:[^\n]*MSIE 6/iH"; sid:3; rev:1;)
The http_user_agent variant:
alert http any any -> any any (msg:"User-Agent MSIE 6 http_user_agent"; content:"MSIE 6"; http_user_agent; sid:4; rev:1;)

In this case both signatures do match, just over 10,000 times even. The stats look like this:

Each of the inspections of signature 4, the http_user_agent variant, is actually a match. This makes sense as we look for a simple string and the mpm will only invoke the signature if that string is found. It’s clear that the http_header variant takes way more resources. Here too, when looking at pcap runtime, running with only the http_user_agent version is approximately 10% faster.

Final remarks

It’s quite clear that the http_user_agent keyword is much more efficient than inspecting all the HTTP headers. But other than efficiency, http_user_agent also makes rules much easier to read.

The Emerging Threats project will likely fork their Suricata ruleset for 1.3 (see this blog post). Even though this will be a significant effort on their side, it’s pretty clear to me the performance effect will be noticeable!

Suricata 1.3 released

Today, almost half a year after the last “stable” release, we released Suricata 1.3. I think this release is a big step forward with regard to maturity of Suricata. Performance and scalability have been much improved, just like accuracy and stability.

The official announcement can be found on the OISF site.

In the last 6 months a lot of code has been changed:

384 files changed, 44332 insertions(+), 18478 deletions(-)

These changes have been made by 11 committers, only four of whom were paid by OISF. The others were either developers from supporting vendors or great community members. I’d like to thank everyone for their contribution!

With the 1.3 release, for some people work only just started. I think this would be an ideal time for the Emerging Threats project to fork their Suricata ruleset. The new set for 1.3 could then start taking advantage of features like http_user_agent, file_data, file keywords, tls/ssl keywords, etc. One of the new features in 1.3, the rule analyzer, should be really helpful for the rule writer folks.

Looking towards the future, we’re planning some nice new features and improvements. First, the TLS/SSL handling will be further improved. The guys are working on certificate fingerprint matching, storing certs to disk and more. We’ll also continue to improve our IPv6 support. Of course, performance work is always on our agenda, and that will remain so in the time to come. See our roadmap here.

Finally, if you’re interested in discussing the roadmap with us in person, the RAID 2012 conference in Amsterdam next fall is a good opportunity. Most of the team will be present.

Suricata 1.1 beta 1 released

Today we’ve released Suricata 1.1 beta 1, the first beta of the upcoming Suricata 1.1 release. The official release announcement is here on the OISF website.

The main focus of the new release has been to improve performance and to add support for the features the new ET/ETpro ruleset needs. ET and ETpro have rulesets specially tuned and geared for Suricata. We’re still missing some new rule keywords that are used by VRT, so in the 1.1 beta 2 release we’ll address that.

Other than that, I’ve got quite a few patches waiting. We’ll be improving stream reassembly, inline mode, prelude output, and numerous other things.

Like always, please give this a try and let us know how it works for you!

Setting up Suricata 0.9.0 for initial use on Ubuntu Lucid 10.04

The last few days I blogged about compiling Suricata in IDS and IPS mode. Today I’ll write about how to set it up for first use.

Starting with Suricata 0.9.0 the engine can run as an unprivileged user. For this, create a new user called “suricata”.

useradd --no-create-home --shell /bin/false --user-group --comment "Suricata IDP account" suricata

This command will create a user and group called “suricata”. The user will be unable to log in, as the shell is set to /bin/false.

The next thing to do is to create a configuration directory. Create /etc/suricata/ and copy the suricata.yaml example config and classification.config into it. Both can be found in the source archive you used to build Suricata:

mkdir /etc/suricata
cp /path/to/suricata-0.9.0/suricata.yaml /etc/suricata/
cp /path/to/suricata-0.9.0/classification.config /etc/suricata/

Next, create the log directory.

mkdir /var/log/suricata

The log directory needs to be writable for the user and group “suricata”, so change the ownership:

chown suricata:suricata /var/log/suricata

The last step I’ll be describing here is retrieving an initial ruleset. The 2 main rulesets you can use are Emerging Threats (ET) and Sourcefire’s VRT ruleset. Since putting VRT to use is a little bit more complicated I’ll be focussing on ET here.

First, download the emerging rules:

wget http://www.emergingthreats.net/rules/emerging.rules.tar.gz

Go to /etc/suricata/ and extract the rules archive:

cd /etc/suricata/
tar xzvf /path/to/emerging.rules.tar.gz

There is a lot more to rules, such as tuning and staying updated, but that’s beyond the scope of this post.

Suricata is now ready to be started:

suricata -c /etc/suricata/suricata.yaml -i eth0 --user suricata --group suricata

If all is set up properly, Suricata will tell you it is now running:

[2087] 9/5/2010 -- 18:17:47 - (tm-threads.c:1362) (TmThreadWaitOnThreadInit) -- all 8 packet processing threads, 3 management threads initialized, engine started.

There are 3 log files in /var/log/suricata that will be interesting to monitor:

- stats.log: displays statistics on packets, TCP sessions, etc.
- fast.log: an alerts log similar to Snort’s fast log.
- http.log: displays HTTP requests in an Apache-style format.

This should get you going. There is a lot more to deploying Suricata that I plan to blog on later.