One of the new features in Suricata 1.3 is a new content modifier called http_user_agent. This allows rule writers to match on the User-Agent header in HTTP requests more efficiently. The new keyword is documented in the OISF wiki. In this post, I’ll show it’s efficiency with two examples.
Example 1: rarely matching UA
Consider a signature where the match if on a part of the UA that is very rare, so not part of regular User Agents. In my example “abc”.
The signature looks like this:
alert http any any -> any any (msg:"User-Agent abc http_header"; content:"User-Agent: "; http_header; nocase; content:"abc"; http_header; distance:0; pcre:"/User-Agent:[^\n]*abc/iH"; sid:1; rev:1;)
The http_user_agent variant looks much simpler:
alert http any any -> any any (msg:"User-Agent abc http_user_agent"; content:"abc"; http_user_agent; sid:2; rev:1;)
Now when running this against a pcap with over 12.500 HTTP requests, neither signature matched. However, signature 1 was inspected 209752 times! This high number is because the request headers are inspected one-by-one. Signature 2 wasn’t inspected at all, as it never made it past the multi pattern matching stage (mpm).
When looking at pcap runtime, running with only the http_user_agent version is about 10% faster.
Example 2: commonly matching UA
So, what if we want to match on something that is quite common? In other words, the signature will have frequent matches?
First, the http_header signature:
alert http any any -> any any (msg:"User-Agent MSIE 6 http_header"; content:"User-Agent: "; http_header; nocase; content:"MSIE 6"; http_header; distance:0; pcre:"/User-Agent:[^\n]*MSIE 6/iH"; sid:3; rev:1;)
The http_user_agent variant:
alert http any any -> any any (msg:"User-Agent MSIE 6 http_user_agent"; content:"MSIE 6"; http_user_agent; sid:4; rev:1;)
In this case both signatures do match, just over 10.000 times even. The stats look like this:
Each of the inspections of signature 4, the http_user_agent variant, is actually a match. This makes sense as we look for a simple string and the mpm will only invoke the signature if that string is found. It’s clear that the http_header variant takes way more resources. Here too, when looking at pcap runtime, running with only the http_user_agent version is approximately 10% faster.
It’s quite clear that the http_user_agent keyword is much more efficient that inspecting all the HTTP headers. But other than efficiency, the http_user_agent also allows for much easier to read rules.
The Emerging Threats project will likely fork their Suricata ruleset for 1.3 (see this blog post). Even though this will be a significant effort on their side, it’s pretty clear to me the performance effect will be noticeable!