For a few months Suricata has been able to calculate the MD5 checksum of files it sees in HTTP streams. Regardless of extraction to disk, the MD5 could be calculated and logged. Martin Holste created a set of very cool scripts to use the logged MD5 to look it up at VirusTotal and some other similar services. This is done outside of Suricata. One thing I have been wanting to try is matching against these MD5′s in Suricata itself.
In the recent 1.3beta2 release, I’ve added a first attempt at this. The current support is crude but works. I’ve added a rule keyword, called “filemd5″.
The keyword opens the file “filename” from your rule directory and loads it’s content. It expects a heximal MD5 per line:
Any extra info on a line is ignored, so the output of md5sum can be used safely:
At start up, Suricata will tell you how much memory the hash table uses. The hash table is quite compact. It uses hash_rows * 4 + md5′s * 16 bytes. For 20155064 MD5′s it uses a bit more than 300mb:
 9/6/2012 -- 08:40:44 - (detect-filemd5.c:264) (DetectFileMd5Parse) -- MD5 hash size 324578208 bytes
Performance so far seems to be great. I’ve been testing with 20 million MD5′s and so far I’m not seeing any significant performance impact. The dedicated data structures I created for it seem to hold up quite nicely. Right now the only slow down I see is at start up, where it adds a few seconds. The data structure is currently limited to 32 bit, so a 4GB table. This should allow ~250 million MD5′s, although I haven’t tested that.
As this is a regular rule keyword, it can be combined with other rule keywords, such as filemagic or filename. A sig like “filemagic:pdf; filemd5:bad_pdfs;” would match the list “bad_pdfs” only against pdf files.
I think there are several possible use cases for this new functionality. First, I could imagine a project like Emerging Threats shipping a list of the most recent malware MD5′s. It should be possible to distribute the most recent 100k or so MD5′s.
Second, this could be used as a poor man’s DLP. Hash the files you don’t want to see on your network outbound or unencrypted and have Suricata look for them.
The most interesting use case probably is not implemented yet, but will be. When negated matching is implemented, the filemd5 keyword could be used for white listing.
As Martin Holste tweeted: “Awesome, more than enough to handle all Windows OS files. So can we just do: filemagic:exe; filemd5:!whitelist.txt;?”
I think an alert for all executable downloads that are not “pre approved” is definitely something that can be useful.
Again, the work continues!