SMTP file extraction in Suricata

In 2.1beta2 the long awaited SMTP file extraction support for Suricata finally appeared. It has been a long development cycle. Originally started by BAE Systems, it was picked up by Tom Decanio of FireEye Forensics Group (formerly nPulse Technologies) followed by a last round of changes from my side. But it’s here now.

It contains:

  • a MIME decoder
  • updates to the SMTP parser to use the MIME decoder for extracting files
  • SMTP JSON log, integrated with EVE
  • SMTP message URL extraction and logging

As it uses the Suricata file handling API, it shares almost everything with the existing file handling for HTTP. The rule keyword work and the various logs work automatically with SMTP as well.

Trying it out

To enable the file extraction, make sure that the MIME decoder is enabled:

      enabled: yes
      # Configure SMTP-MIME Decoder
        # Decode MIME messages from SMTP transactions
        # (may be resource intensive)
        # This field supercedes all others because it turns the entire
        # process on or off
        decode-mime: yes

        # Decode MIME entity bodies (ie. base64, quoted-printable, etc.)
        decode-base64: yes
        decode-quoted-printable: yes

        # Maximum bytes per header data value stored in the data structure
        # (default is 2000)
        header-value-depth: 2000

        # Extract URLs and save in state data structure
        extract-urls: yes

Like with HTTP, SMTP depends on the stream engine working correctly. So this page applies, although of course the HTTP specific settings are irrelevant to SMTP.

Troubleshooting (SMTP) file extraction issues should always start here:


Enabling the SMTP logging is simple, just add ‘smtp’ to the list of types in your EVE config, like so:

  # Extensible Event Format (nicknamed EVE) event log in JSON format
  - eve-log:
      enabled: yes
      filetype: regular #regular|syslog|unix_dgram|unix_stream
      filename: eve.json
      # the following are valid when type: syslog above
      #identity: "suricata"
      #facility: local5
      #level: Info ## possible levels: Emergency, Alert, Critical,
                   ## Error, Warning, Notice, Info, Debug
        - alert:
            # payload: yes           # enable dumping payload in Base64
            # payload-printable: yes # enable dumping payload in printable (lossy) format
            # packet: yes            # enable dumping of packet (without stream segments)
            # http: yes              # enable dumping of http fields
        - http:
            extended: yes     # enable this for extended logging information
            # custom allows additional http fields to be included in eve-log
            # the example below adds three additional fields when uncommented
            #custom: [Accept-Encoding, Accept-Language, Authorization]
        - dns
        - tls:
            extended: yes     # enable this for extended logging information
        - files:
            force-magic: no   # force logging magic on all logged files
            force-md5: no     # force logging of md5 checksums
        #- drop
        - smtp
        - ssh
        # bi-directional flows
        #- flow
        # uni-directional flows
        #- newflow


As a bonus, the MIME decoder also extracts URL’s from the SMTP message body (not attachments) and logs them in the SMTP log. This should make it easy to post process them. Currently only ‘HTTP’ URLS are extracted, starting with ‘http://‘. So HTTPS/FTP or URLs that don’t have the protocol prefix aren’t logged.


Naturally, if you’re using SMTP over TLS or have STARTTLS enabled, as you should at least on public networks, none of this will work.

Please help us test this feature!

File extraction in Suricata

Today I pushed out a new feature in Suricata I’m very excited about. It has been long in the making and with over 6000 new lines of code it’s a significant effort. It’s available in the current git master. I’d consider it alpha quality, so handle with care.

So what is this all about? Simply put, we can now extract files from HTTP streams in Suricata. Both uploads and downloads. Fully controlled by the rule language. But thats not all. I’ve added a touch of magic. By utilizing libmagic (this powers the “file” command), we know the file type of files as well. Lots of interesting stuff that can be done there.

Rule keywords

Four new rule keywords were added: filename, fileext, filemagic and filestore.

Filename and fileext are pretty trivial: match on the full name or file extension of a file.

alert http any any -> any any (filename:”secret.xls”;)
alert http any any -> any any (fileext:”pdf”;)

More interesting is the filemagic keyword. It runs on the magic output of inspecting the (start of) a file. This value is for example:

GIF image data, version 89a, 1 x 1
PE32 executable for MS Windows (GUI) Intel 80386 32-bit
HTML document text
Macromedia Flash data (compressed), version 9
MS Windows icon resource – 2 icons, 16×16, 256-colors
PNG image data, 70 x 53, 8-bit/color RGBA, non-interlaced
JPEG image data, JFIF standard 1.01
PDF document, version 1.6

So how the filemagic keyword allows you to match on this is pretty simple:

alert http any any -> any any (filemagic:”PDF document”;)
alert http any any -> any any (filemagic:”PDF document, version 1.6″;)

Pretty cool, eh? You can match both very specifically and loosely. For example:

alert http any any -> any any (filemagic:”executable for MS Windows”;)

Will match on (among others) these types:

PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit
PE32 executable for MS Windows (GUI) Intel 80386 32-bit
PE32+ executable for MS Windows (GUI) Mono/.Net assembly

Finally there is the filestore keyword. It is the simplest of all: if the rule matches, the files will be written to disk.

Naturally you can combine the file keywords with the regular HTTP keywords, limiting to POST’s for example:

alert http $EXTERNAL_NET any -> $HOME_NET any (msg:”pdf upload claimed, but not pdf”; flow:established,to_server; content:”POST”; http_method; fileext:”pdf”; filemagic:!”PDF document”; filestore; sid:1; rev:1;)

This will alert on and store all files that are uploaded using a POST request that have a filename extension of pdf, but the actual file is not pdf.


The storage to disk is handled by a new output module called “file”. It’s config looks like this:

enabled: yes # set to yes to enable
log-dir: files # directory to store the files
force-magic: no # force logging magic on all stored files

It needs to be enabled for file storing to work.

The files are stored to disk as “file.1”, “file.2”, etc. For each of the files a meta file is created containing the flow information, file name, size, etc. Example:

TIME: 01/27/2010-17:41:11.579196
PCAP PKT NUM: 2847035
DST PORT: 56207
FILENAME: /msdownload/update/software/defu/2010/01/mpas-fe_7af9217bac55e4a6f71c989231e424a9e3d9055b.exe
MAGIC: PE32+ executable for MS Windows (GUI) Mono/.Net assembly
SIZE: 5204


The file extraction is for HTTP only currently, and works on top of our HTTP parser. As the HTTP parser runs on top of the stream reassembly engine, configuration parameters of both these parts of Suricata affect handling of files.

The stream engine option “stream.reassembly.depth” (default 1 Mb) controls the depth into a stream in which we look. Set to 0 for no limit.
The libhtp options request-body-limit and response-body-limit control how far into a HTTP request or response body we look. Again set to 0 for no limit. This can be controlled per HTTP server.


The file handling is fully streaming, so it’s very efficient. Nonetheless there will be an overhead for the extra parsing, book keeping, writing to disk, etc. Memory requirements appear to be limited as well. Suricata shouldn’t keep more than a few kb per flow in memory.


Lack of limits is a limitation. For file storage no limits have been implemented yet. So it’s easy to clutter your disk up with files. Example: 118Gb enterprise pcap storing just JPG’s extracted 400.000 files. Better use a separate partition if you’re on a life link.

Future work

Apart from stabilizing this code and performance optimizing it, the next step will be SMTP file extraction. Possibly other protocols, although nothing is set in stone there yet.