Suricata bits, ints and vars

Since the beginning of the project we’ve spoken about variables on multiple levels. Of course flowbits defined by the Snort language came first, but other flow based variables quickly followed: flowints for basic counting, and vars for extracting data using pcre expressions.

I’ve always thought of the pcre data extraction using substring capture as a potentially powerful feature. However the implementation was lacking. The extracted data couldn’t really be used for much.

Internals

Recently I’ve started work to address this. The first thing that needed to be done was to move the mapping between variable names, such as flowbit names, and the internal id’s out of the detection engine. The detection engine is not available in the logging engines and logging of the variables was one of my goals.

This is a bit tricky as we want a lock less data structure to avoid runtime slow downs. However rule reloads need to be able to update it. The solution I’ve created has a read only look up structure after initialization that is ‘hot swapped’ with the new data at reload.

PCRE

The second part of the work is to allow for more flexible substring capture. There are 2 limitations in the current code: first, only single substring can be captured per rule. Second, the names of the variables were limited by libpcre. 32 chars with hardly any special chars like dots. The way to express these names has been a bit of a struggle.

The old way looks like this:

pcre:"/(?P.*)<somename>/";

This create a flow based variable named ‘somename’ that is filled by this pcre expression. The ‘flow_’ prefix can be replaced by ‘pkt_’ to create a packet based variable.

In the new method the names are no longer inside the regular expression, but they come after the options:

pcre:"/([a-z]+)\/[a-z]+\/(.+)\/(.+)\/changelog$/GUR, \
    flow:ua/ubuntu/repo,flow:ua/ubuntu/pkg/base,     \
    flow:ua/ubuntu/pkg/version";

After the regular pcre regex and options, a comma separated lists of variable names. The prefix here is ‘flow:’ or ‘pkt:’ and the names can contain special characters now. The names map to the capturing substring expressions in order.

Key/Value

While developing this a logical next step became extraction of key/value pairs. One capture would be the key, the second the value. The notation is similar to the last:

pcre:"^/([A-Z]+) (.*)\r\n/G, pkt:key,pkt:value";

‘key’ and ‘value’ are simply hardcoded names to trigger the key/value extraction.

Logging

Things start to get interesting when logging is added. First, by logging flowbits existing rulesets can benefit.

{
  "timestamp": "2009-11-24T06:53:35.727353+0100",
  "flow_id": 1192299914258951,
  "event_type": "alert",
  "src_ip": "69.49.226.198",
  "src_port": 80,
  "dest_ip": "192.168.1.48",
  "dest_port": 1077,
  "proto": "TCP",
  "tx_id": 0,
  "alert": {
    "action": "allowed",
    "gid": 1,
    "signature_id": 2018959,
    "rev": 2,
    "signature": "ET POLICY PE EXE or DLL Windows file download HTTP",
    "category": "Potential Corporate Privacy Violation",
    "severity": 1
  },
  "http": {
    "hostname": "69.49.226.198",
    "url": "/x.exe",
    "http_user_agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    "http_content_type": "application/octet-stream",
    "http_method": "GET",
    "protocol": "HTTP/1.1",
    "status": 200,
    "length": 23040
  },
  "vars": {
    "flowbits": {
      "exe.no.referer": true,
      "http.dottedquadhost": true,
      "ET.http.binary": true
    }
  }
}

When rules are created to extract info and set specific ‘information’ flowbits, logging can create value:

"vars": {
  "flowbits": {
    "port/http": true,
    "ua/os/windows": true,
    "ua/tool/msie": true
  },
  "flowvars": {
    "ua/tool/msie/version": "6.0",
    "ua/os/windows/version": "5.1"
  }
}
"http": {
  "hostname": "db.local.clamav.net",
  "url": "/daily-15405.cdiff",
  "http_user_agent": "ClamAV/0.97.5 (OS: linux-gnu, ARCH: x86_64, CPU: x86_64)",
  "http_content_type": "application/octet-stream",
  "http_method": "GET",
  "protocol": "HTTP/1.0",
  "status": 200,
  "length": 1688
},
"vars": {
  "flowbits": {
    "port/http": true,
    "ua/os/linux": true,
    "ua/arch/x64": true,
    "ua/tool/clamav": true
  },
  "flowvars": {
     "ua/tool/clamav/version": "0.97.5"
  }
}

In the current code the alert and http logs are showing the ‘vars’.

Next to this, a ‘eve.vars’ log is added, which is a specific output of vars independent of other logs.

Use cases

Some of the use cases could be to add more information to logs without having to add code. For example, I have a set of rules that set of rules that extracts the packages are installed by apt-get or for which Ubuntu’s updater gets change logs:

"vars": {
  "flowbits": {
    "port/http": true,
    "ua/tech/python/urllib": true
  },
  "flowvars": {
    "ua/tech/python/urllib/version": "2.7",
    "ua/ubuntu/repo": "main",
    "ua/ubuntu/pkg/base": "libxml2",
    "ua/ubuntu/pkg/version": "libxml2_2.7.8.dfsg-5.1ubuntu4.2"
  }
}

It could even be used as a simple way to ‘parse’ protocols and create logging for them.

Performance

Using rules to extract data from traffic is not going to be cheap for 2 reasons. First, Suricata’s performance mostly comes from avoiding inspecting rules. It has a lot of tricks to make sure as little rules as possible are evaluated. Likewise, the rule writers work hard to make sure their rules are only evaluated if they have a good chance of matching.

The rules that extract data from user agents or URI’s are going to be matching very often. So even if the rules are written to be efficient they will still be evaluated a lot.

Secondly, extraction currently can be done through PCRE and through Lua scripts. Neither of which are very fast.

Testing the code

Check out this branch https://github.com/inliniac/suricata/pull/2468 or it’s replacements.

Bonus: unix socket hostbits

Now that variable names can exist outside of the detection engine, it’s also possible to add unix socket commands that modify them. I created this for ‘hostbits’. The idea here is to simply use hostbits to implement white/blacklists. A set of unix socket commands will be added to manage add/remove them. The existing hostbits implementation handles expiration and matching.

To block on the blacklist:

drop ip any any -> any any (hostbits:isset,blacklist; sid:1;)

To pass all traffic on the whitelist:

pass ip any any -> any any (hostbits:isset,whitelist; sid:2;)

Both rules are ‘ip-only’ compatible, so will be efficient.

A major advantage of this approach is that the black/whitelists can be
modified from ruleset themselves, just like any hostbit.

E.g.:

alert tcp any any -> any any (content:"EVIL"; \
    hostbits:set,blacklist; sid:3;)

A new ‘list’ can be created this way by simply creating a rule that
references a hostbit name.

Unix Commands

Unix socket commands to add and remove hostbits need to be added.

Add:

suricatasc -c "add-hostbit <ip> <hostbit> <expire>"
suricatasc -c "add-hostbit 1.2.3.4 blacklist 3600"

If an hostbit is added for an existing hostbit, it’s expiry timer is updated.

Hostbits expire after the expiration timer passes. They can also be manually removed.

Remove:

suricatasc -c "remove-hostbit <ip> <hostbit>"
suricatasc -c "remove-hostbit 1.2.3.4 blacklist"

Feedback & Future work

I’m looking forward to getting some feedback on a couple of things:

  • log output structure and logic. The output needs to be parseable by things like ELK, Splunk and jq.
  • pcre options notation
  • general feedback about how it runs

Some things I’ll probably add:

  • storing extracted data into hosts, ippairs
  • more logging

Some other ideas:

  • extraction using a dedicated keyword, so outside of pcre
  • ‘int’ extraction

Let me know what you think!

Suricata 1.4 is out

About 5 months after 1.3 came out we’ve released 1.4, and we’ve been quite busy. Eric Leblond’s post here has all the stats and graphs. There are three big new features: unix socket, ip reputation and luajit. For each of these the same is true: it’s usesable now, but it’s the potential that we’re most excited about. Over the next months we’ll be extending each of those to be even more useful. We’re very much interested in ideas and feedback.

Performance obviously matters to many in the IDS world, and here too we have improved Suricata quite a bit again. We now have Suricata 1.4 running on a ISP 10gbit/s network on commodity hardware with a large ET ruleset. Of course, YMMV, but we’re definitely making a lot of progress here.

Sometimes the little things matter a lot as well. A minor new feature is that live “drop” stats are the the stats.log now:

capture.kernel_packets    | AFPacketem21              | 13640581
capture.kernel_drops      | AFPacketem21              | 442864
capture.kernel_packets    | AFPacketem22              | 7073228
capture.kernel_drops      | AFPacketem22              | 9449
capture.kernel_packets    | AFPacketem23              | 10528970
capture.kernel_drops      | AFPacketem23              | 148281
capture.kernel_packets    | AFPacketem24              | 7212584
capture.kernel_drops      | AFPacketem24              | 12643
capture.kernel_packets    | AFPacketem25              | 9763439
capture.kernel_drops      | AFPacketem25              | 17874
capture.kernel_packets    | AFPacketem26              | 10464106
capture.kernel_drops      | AFPacketem26              | 20378
capture.kernel_packets    | AFPacketem27              | 8869182
capture.kernel_drops      | AFPacketem27              | 18336
capture.kernel_packets    | AFPacketem28              | 7925045
capture.kernel_drops      | AFPacketem28              | 258168

This is supported for AF_PACKET, PF_RING and libpcap.

Last August we’ve added Suricata to github to make it easier to participate. Also, the code review tools associated with the pull requests are very useful. Github has been an unexpected success for us. At the time of writing there are 24 forks of Suricata on it, I’ve processed about 250 pull requests. The patches that have been submitted range from small fixes to full blown features, and more are on the way. I’m very grateful for these contributions and everyone’s patience with me.

Now that 1.4 is out, we’ll be taking it slow over the holidays. The team has been working like crazy, and everyone deserves a break. So the next weeks we’ll focus on further consolidation, fixing bugs that no doubt will pop up. Other than that, things will be slow. After the holidays we’ll start planning for the next milestone. Again, your ideas and contributions are very welcome! 🙂

Closing in on Suricata 1.4

I just made Suricata 1.4rc1 available with some pretty exciting features: unix socket mode and IP reputation.

Unix socket

First of all, Eric Leblond’s work on the Unix socket was merged. The unix socket work consists of two parts. The unix socket protocol implementation and a new runmode.

The protocol implementation is based on JSON messages over unix socket. Eric will be fully documenting it soon. Currently the commands are limited to shutting down and getting some basic stats. This part isn’t very exciting yet, but the groundwork for many future extensions has been laid.

The part that is exciting right now, is the unix socket runmode. That this does is start Suricata with all the rules and such, and then it waits for commands on the unix socket. Then the commands will be a pcap filename – log directory pair. This pcap will then be inspected against the rules and the logs go into the log directory supplied. As this can be easily scripted (a python script is provided), it’s a very fast way to test your pcap collections, as the overhead of starting and stopping is skipped.

This may initialy appeal mostly for those of you doing sandnetting and malware analysis, where tens of thousands of pcaps and automatically processed every hour or day, I think this could grow into a feature for a wider audience as well. For example, I could see use in Sguil or Snorby, or pretty much every event manager with full packet capture support, adding an option to scan a pcap associated with an event again. Maybe against _all_ rules, instead of the tuned set running on the live sensors. Maybe you can re-inspect old sessions against the current rules this way to find hits on attacks that were 0-days at the time, etc.

I think there could be many possibilities.

IP Reputation

A slightly more polished version of the code I discussed here is now available in this release. It’s one of those things where it will be very interesting to see how people will put it to use.

Matt Jonkman just wrote some of his ideas to the Emerging Threats mailing list: one of the ideas Matt wrote about is to amend weak rules with reputation data. So if you have a signature that is phrone to false positives, you probably disable it currently. But what if you combine it with reputation data? If the weak rule fires on a sketchy ip, it may be a more reliable alert.

We’ll see how this plays out.

1.4 final

We’re hoping that if nothing big happens, we can do a mid-December 1.4 final release. So please consider running this new release. It’s running very stable on quite a number of places, ISP networks, Lab networks, home networks, sandnetting networks, etc. But we need much more testing to find issues and/or gain confidence that we have found the most important issues. Thanks for helping out!