PolySwarm

Level 1: Scratch to ClamAV

Wrapping a Real Engine: ClamAV

Setting the Stage

ClamAV is an open source signature-based engine with a daemon that provides quick analysis of artifacts that it recognizes. This tutorial will step you through building your second PolySwarm Microengine by means of incorporating ClamAV as an analysis backend.

Note: the PolySwarm marketplace will be a source of previously unseen malware.

Relying on a strictly signature-based engine as your analysis backend, particularly one whose signatures everyone can access (e.g. ClamAV), is unlikely to yield unique insight into "swarmed" artifacts and is therefore unlikely to outperform other engines.

This guide should not be taken as a recommendation for how to approach the marketplace but rather an example of how to incorporate an existing analysis backend into a Microengine skeleton.

This tutorial will walk the reader through building microengine/clamav.py; please refer to clamav.py for the completed work.

clamd Implementation and Integration

Start with a fresh engine-template and give it the engine-name “MyClamAvEngine”. You should find a microengine-myclamavengine directory in your current working directory; this is what we’ll be editing to implement ClamAV scan functionality.

Edit the __init__.py as we describe below:

We begin our ClamAV analysis backend by importing the clamd module and configuring some globals.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import clamd
import logging
import os
from io import BytesIO

from polyswarmclient.abstractmicroengine import AbstractMicroengine
from polyswarmclient.abstractscanner import AbstractScanner

logger = logging.getLogger(__name__)  # Initialize logger

CLAMD_HOST = os.getenv('CLAMD_HOST', 'localhost')
CLAMD_PORT = int(os.getenv('CLAMD_PORT', '3310'))
CLAMD_TIMEOUT = 30.0
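
As a minimal sketch of how these globals resolve, the snippet below applies the same defaults to an arbitrary mapping instead of the real process environment. The helper name load_clamd_config is hypothetical and is not part of engine-template or polyswarmclient; it only illustrates that CLAMD_HOST and CLAMD_PORT can be overridden without editing the code.

```python
import os


def load_clamd_config(env=os.environ):
    """Hypothetical helper: resolve clamd settings with the same defaults as the globals above."""
    host = env.get('CLAMD_HOST', 'localhost')
    # Environment values are strings, so the port is converted explicitly
    port = int(env.get('CLAMD_PORT', '3310'))
    return host, port


# With nothing set, the defaults apply
print(load_clamd_config({}))  # ('localhost', 3310)

# With overrides, e.g. when clamd runs in a separate container (values are illustrative)
print(load_clamd_config({'CLAMD_HOST': 'clamav', 'CLAMD_PORT': '3311'}))  # ('clamav', 3311)
```

Reading the settings from the environment keeps the same engine code usable whether clamd runs on localhost or on another host.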

Would you believe me if I said we were almost done? Let’s get clamd initialized and running, so it can communicate with the clamd-daemon over a network socket.

class Scanner(AbstractScanner):
    def __init__(self):
        self.clamd = clamd.ClamdNetworkSocket(CLAMD_HOST, CLAMD_PORT, CLAMD_TIMEOUT)

We interact with clamd by sending it a byte stream of artifact contents.

ClamAV responds to these byte streams in the form:

{'stream': ('FOUND', 'Eicar-Test-Signature')}

Once we pull the tuple out of the stream key, we can easily parse it using Python’s [] operator: result[0] is the status word FOUND, and result[1] is the name of the matched signature, in this instance Eicar-Test-Signature.
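
The parsing step can be tried without a running daemon. The sketch below works on plain dictionaries shaped like the response shown above; the helper name parse_clamd_response is hypothetical, and the ('OK', None) shape for a clean artifact is an assumption about what the clamd module returns when nothing matches.

```python
def parse_clamd_response(response):
    """Hypothetical helper: return (status, signature) from a response dict."""
    result = response.get('stream')
    if not result:
        # No 'stream' entry at all: nothing to report
        return None, None
    status = result[0]
    signature = result[1] if len(result) > 1 else None
    return status, signature


infected = {'stream': ('FOUND', 'Eicar-Test-Signature')}
clean = {'stream': ('OK', None)}  # assumed shape of a clean result

print(parse_clamd_response(infected))  # ('FOUND', 'Eicar-Test-Signature')
print(parse_clamd_response(clean))     # ('OK', None)
```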

Now, all we need is to implement the scan method in the Scanner class.

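    async def scan(self, guid, content, chain):
        # Stream the artifact bytes to clamd and pull out the (status, signature) tuple
        result = self.clamd.instream(BytesIO(content)).get('stream')
        if len(result) >= 2 and result[0] == 'FOUND':
            # clamd matched a signature
            return True, True, ''

        # No signature matched
        return True, False, ''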

If clamd detects a piece of malware, it puts FOUND in result[0].

The return values that the Microengine expects are:

  1. bit: a boolean representing whether the engine wishes to assert on the artifact
  2. verdict: another boolean representing a malicious or benign determination
  3. metadata: (optional) string describing the artifact

We leave including ClamAV’s metadata as an exercise to the reader - or check clamav.py :)
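
One possible take on that exercise is sketched below; it is not necessarily what clamav.py does. To keep the example runnable without a daemon, it uses a hypothetical FakeClamd class in place of clamd.ClamdNetworkSocket and a standalone MetadataScanner class instead of subclassing AbstractScanner. The guid and chain values in the usage lines are placeholders.

```python
import asyncio
from io import BytesIO


class FakeClamd:
    """Hypothetical stand-in for clamd.ClamdNetworkSocket so the sketch runs offline."""

    def instream(self, buff):
        # Pretend that any content containing b'EICAR' matches a signature
        if b'EICAR' in buff.read():
            return {'stream': ('FOUND', 'Eicar-Test-Signature')}
        return {'stream': ('OK', None)}


class MetadataScanner:
    """Sketch of a scanner that passes the signature name through as metadata."""

    def __init__(self, client):
        self.clamd = client

    async def scan(self, guid, content, chain):
        result = self.clamd.instream(BytesIO(content)).get('stream')
        if result and len(result) >= 2 and result[0] == 'FOUND':
            # Same tuple shape as the scan method above, with the signature name as the third element
            return True, True, result[1]
        return True, False, ''


scanner = MetadataScanner(FakeClamd())
print(asyncio.run(scanner.scan('placeholder-guid', b'xxEICARxx', 'side')))
print(asyncio.run(scanner.scan('placeholder-guid', b'harmless', 'side')))
```

In a real engine, the same scan body would live in the Scanner class shown earlier, with self.clamd set to the ClamdNetworkSocket instance.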

Info: The Microengine class is required, but we do not need to modify it, so it is not shown here.

Finalizing & Testing Your Engine

cookiecutter customizes engine-template only so far - there are a handful of items you’ll need to fill out yourself. We’ve already covered the major items above, but you’ll want to do a quick search for CUSTOMIZE_HERE to ensure all customizations have been made.

Once everything is in place, let’s test our engine:

Test Linux-based Engines →

Test Windows-based Engines →

Next Steps

In the Eicar example, we showed you how to implement scan logic directly in the Scanner class. In this ClamAV example, we showed you how to call out to an external daemon over a network socket to access scanning logic.

Next, we’ll wrap ClamAV and Yara into a single Microengine ->