One area of interest in the development of Suricata is hardware acceleration. GPUs are particularly interesting, as they are cheap and widely available. As a first step, we’ve been looking at using the GPU to speed up pattern matching. Since OpenCL promises to be a cross-platform, multi-vendor API for this, we looked at it first. But we were never able to get anything stable out of it, at least not with the NVIDIA drivers on Linux. As that didn’t go anywhere, we decided to use CUDA for the time being. CUDA is, of course, NVIDIA only. Once we have CUDA fully running we may revisit OpenCL or look at other implementations like AMD/ATI’s stream API.
What we have so far is an implementation of our 2-gram SBNDM pattern matching algorithm in CUDA. The detection thread(s) currently send packets one by one to a central dispatcher thread that controls the GPU. This setup is far from ideal performance-wise, but our first goal was to get it working at all. Currently, on my desktop, CUDA actually slows things down.
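For readers unfamiliar with the algorithm, here is a minimal CPU-side sketch of 2-gram SBNDM for a single pattern in plain C. It is not the code in the tree: the function name sbndm2_search, its signature, the 32-character pattern limit and the simple shift of one position after a match are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a Suricata function.
 * Finds all occurrences of pat (length m, 2 <= m <= 32) in text (length n).
 * Stores up to out_cap match offsets in out and returns the total number
 * of matches found. */
size_t sbndm2_search(const uint8_t *pat, size_t m,
                     const uint8_t *text, size_t n,
                     size_t *out, size_t out_cap)
{
    assert(out != NULL || out_cap == 0);
    if (m < 2 || m > 32 || n < m)
        return 0;

    /* B[c] has bit (m-1-i) set when pat[i] == c. */
    uint32_t B[256] = {0};
    for (size_t i = 0; i < m; i++)
        B[pat[i]] |= (uint32_t)1 << (m - 1 - i);

    size_t count = 0;
    size_t pos = 0;                      /* start of the current window */
    while (pos + m <= n) {
        size_t j = pos + m - 1;          /* last character of the window */

        /* Read the last two characters of the window as one 2-gram. */
        uint32_t D = (B[text[j]] << 1) & B[text[j - 1]];
        if (D == 0) {
            /* The 2-gram does not occur in the pattern: skip past it. */
            pos += m - 1;
            continue;
        }

        /* Extend the matched factor to the left, one character at a time.
         * j is the leftmost text index read so far with D still non-zero. */
        j -= 1;
        while (j > pos) {
            D = (D << 1) & B[text[j - 1]];
            if (D == 0)
                break;
            j--;
        }

        if (D != 0) {
            /* The whole window was read without D becoming zero: match. */
            if (count < out_cap)
                out[count] = pos;
            count++;
            pos += 1;                    /* conservative shift after a match */
        } else {
            /* text[j-1 .. window end] is not a factor of the pattern,
             * so no match can start before index j. */
            pos = j;
        }
    }
    return count;
}
```

Calling it with the pattern "abab" and the text "xxababab" reports the two overlapping matches at offsets 2 and 4. A GPU version would typically run many such scans in parallel over different packets, which is why the cost of getting the packets to the card matters so much.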
In the coming weeks and months we plan to redesign parts of the CUDA implementation and its integration into the engine. We plan to send the packets in batches to the dispatcher thread right after the decoders have determined what the payload portion of a packet is. The (separate) detection thread(s) can then process the results from the GPU when they get to a packet. By using the CUDA scanning asynchronously like this, we hope to reduce the cost of transferring packets to and from the card.
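To make the batching idea a bit more concrete, here is a rough sketch in plain C of how payloads could be packed into one contiguous buffer before being handed to the dispatcher. This is only an illustration of the plan, not the planned implementation: PacketBatch, batch_reset, batch_add and the size limits are hypothetical names and values, and the actual transfer to the card is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits, kept small for illustration. */
#define BATCH_MAX_PACKETS 4
#define BATCH_MAX_BYTES   6000

/* Payloads are packed back to back so that the whole batch could be
 * copied to the card in a single transfer. */
typedef struct {
    uint8_t  data[BATCH_MAX_BYTES];
    uint32_t offset[BATCH_MAX_PACKETS];  /* start of each payload in data */
    uint32_t length[BATCH_MAX_PACKETS];  /* length of each payload */
    uint32_t count;                      /* number of payloads stored */
    uint32_t used;                       /* bytes of data in use */
} PacketBatch;

void batch_reset(PacketBatch *b)
{
    assert(b != NULL);
    b->count = 0;
    b->used = 0;
}

/* Returns 1 if the payload was added. Returns 0 if it does not fit, in
 * which case the caller would flush the batch to the dispatcher, reset
 * it and try again. A payload larger than BATCH_MAX_BYTES never fits. */
int batch_add(PacketBatch *b, const uint8_t *payload, uint32_t len)
{
    assert(b != NULL && payload != NULL);
    if (b->count == BATCH_MAX_PACKETS || len > BATCH_MAX_BYTES - b->used)
        return 0;
    memcpy(b->data + b->used, payload, len);
    b->offset[b->count] = b->used;
    b->length[b->count] = len;
    b->used += len;
    b->count++;
    return 1;
}
```

The offset and length arrays are what would let each GPU thread, and later each detection thread, find its own packet inside the shared buffer.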
Currently the code in the tree can be activated by passing the “--enable-cuda” option to ./configure. Next, enable the CUDA pattern matcher in the configuration file by setting the “mpm-algo” option to “b2g_cuda”. As a first test, run the CUDA unittests (assuming you enabled the building of the unittests too) by using “suricata -uUCuda”. Please note that running all unittests currently fails if CUDA is enabled.
The code has only been tested on 32-bit Linux so far. There are some issues with 64-bit that we’re resolving right now. We expect to update this code continuously, so be sure to always work with the most current version of the git repo!
Let us know your experiences!
Hello,
We are also working on a project to implement an IDS on a GPU, so this article and your work are very interesting to us!
We are hesitating between OpenCL and CUDA. Do you have any advice on which is more suitable?
Furthermore, do you think it is possible with newer GPUs to get the packets directly from the network card, to avoid going through the CPU and slowing the process down?
If you have any sample CUDA or OpenCL code that could help us with this implementation, we would be very grateful!
Thank you in advance,
Best regards
I have gone through the article. I am involved in the same project. Currently I am practicing with CUDA programs and learning how to implement parallel algorithms in the CUDA model.
I will surely need your help in this project, since you already have experience with it. May I contact you for further help?
Sure! Email me or join the oisf-devel mailing list to discuss things further!
Yes, I have tried and built a 64-bit Suricata 1.4 with CUDA toolkit 5.0 on my Securityonion box. It does not look too good. Too bad that Securityonion for Ubuntu 12.04 is 64-bit only. I am trying to build a (possibly statically linked) 32-bit binary of Suricata with CUDA to try it out.
If you want testing done on Ubuntu 12.04 + CUDA, please let me know.
I have Nvidia GTX 560 Ti (compute capability 2.1) and GTX 680 (cc 3.0) at my disposal.
Furthermore, the Suricata sources did not seem to build a cc 3.0 kernel for CUDA;
maybe that is because you are using an older version of the CUDA toolkit (and hardware).
@Sami: please join our oisf-devel list. We can support you there. Also, rewritten CUDA support is in the works and testing would be appreciated.
Yes, I joined the mailing list. And yes, I will gladly help you test the CUDA support.
Hello,
I am researching Suricata 2.0 and want to test its CUDA support, but I find that it doesn’t work: it always calls SCACSearch for detection, not SCACCudaSearch64. The conditions and environment are as follows:
Environment:
OS: 64bit CentOS
GPU: GeForce GT 620
CUDA: 5.0
Configurations:
1. ./configure --enable-cuda
2. max-pending-packets: 65000
3. mpm-algo: ac-cuda
4. data-buffer-size-min-limit: 0
5. data-buffer-size-max-limit: 1500
6. cudabuffer-buffer-size: 500mb
7. gpu-transfer-size: 50mb
8. batching-timeout: 10000
9. device-id: 0
10. cuda-streams: 2
The CUDA kernel function is never called. I am eager to solve the problem and compare the performance of the CPU version and the GPU version.
Thanks.
Amelie