PolySwarm Customer API

An interface to the PolySwarm customer APIs.

Supports Python 2.7 and Python 3.5 or later.

Installation

From PyPI:

$ pip install polyswarm-api

From source:

$ python setup.py install

Create an API Client

from polyswarm_api.api import PolyswarmAPI

api_key = "317b21cb093263b701043cb0831a53b9"

api = PolyswarmAPI(key=api_key)

You will need your own API key, which you can get from polyswarm.network/account/api-keys
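Hard-coding a key in source makes it easy to leak. The following is a minimal sketch of reading it from the environment instead. The variable name POLYSWARM_API_KEY and the helper load_api_key are assumptions made for illustration; neither is part of the polyswarm-api package.

```python
import os

# Hypothetical variable name; use whatever fits your deployment.
ENV_VAR = "POLYSWARM_API_KEY"


def load_api_key(environ=None):
    """Return the API key from the environment, or raise if it is missing."""
    environ = os.environ if environ is None else environ
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError("Set {} before creating the client".format(ENV_VAR))
    return key
```

The returned value can then be passed to the client as in the example above, i.e. PolyswarmAPI(key=load_api_key()).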

Perform Scans

# scan one or more files; use scan_directory to scan a directory
results = api.scan("/home/user/zeus.bin", "/home/user/benign.bin")

for scan_result in results:
    if scan_result.result:
        for scanned_file in scan_result.result.files:
            # score between 0.0 and 1.0 indicating malintent
            poly_score = scanned_file.polyscore
            for assertion in scanned_file.assertions:
                print("Engine {} asserts {}".format(assertion.author_name, "Malicious" if assertion.verdict else "Benign"))

# scan one or more urls
results = api.scan_urls("http://bad.com", "https://google.com")
for url_scan_result in results:
    if url_scan_result.result:
        for scanned_url in url_scan_result.result.files:
            for assertion in scanned_url.assertions:
                print("Engine {} asserts {}".format(assertion.author_name, "Malicious" if assertion.verdict else "Benign"))


# perform rescan
results = api.rescan("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")

Lookup by Hash

# sha256, md5, and sha1 supported
results = api.search("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f", "b04637c11c63dd5a4a599d7104f0c5880717b5d5b32e0104de5a416963f06118")
for search_result in results:
    for artifact in search_result.result:
        # a score between 0.0 and 1.0 indicating malintent
        poly_score = artifact.last_scan.polyscore
        # all assertion responses from engines
        all_assertions = artifact.last_scan.assertions
        # malicious only assertions from engines
        malicious_detections = list(artifact.last_scan.detections)
        print("{} of {} engines assert malicious".format(len(malicious_detections), len(artifact.last_scan.assertions)))
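Since the search accepts sha256, md5, and sha1 values, it can help to check an input before sending it. The sketch below is a hypothetical helper (guess_hash_type is not part of polyswarm-api) that relies only on the standard hex digest lengths of the three algorithms.

```python
import re

# Hex digest lengths of the three supported hash types.
_HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
_HEX_RE = re.compile(r"^[0-9a-fA-F]+$")


def guess_hash_type(value):
    """Return 'md5', 'sha1', or 'sha256', or None if value is not a hex digest."""
    value = value.strip()
    if not _HEX_RE.match(value):
        return None
    # Unknown lengths fall through to None.
    return _HASH_LENGTHS.get(len(value))
```

Values for which the helper returns None can be rejected locally instead of being passed to api.search.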

Metadata Searching

query = "pefile.imphash:ce7f7a334ddcfb21fe7a903165c209e7"

results = api.search_by_metadata(query)

for search_result in results:
    # of type search result
    for artifact in search_result:
        print("Artifact {} seen in countries {}".format(artifact.sha256, artifact.countries))

Metadata Terms

The following is a non-exhaustive list of the metadata terms currently supported by PolySwarm. When searching, separate each nested level with a period, e.g. pefile.imphash. Field names are case-sensitive, so take care to specify them exactly. If there are more fields or tools you would like to see, please get in touch at info@polyswarm.io.

  • lief - curated lief output

    • has_nx
    • is_pie
    • libraries - list of imported libraries
    • entrypoint - entrypoint in decimal
    • virtual_size - virtual size in decimal
    • exported_functions - list of exported functions
    • imported_functions - list of imported functions
  • pefile - curated pefile output

    • is_dll - boolean
    • is_exe - boolean
    • exports - exported functions
    • imphash - imphash of the file
    • imports - dictionary of imports in format dllname: [list, of, functions]
    • uses_cfg - boolean
    • uses_dep - boolean
    • uses_seh - boolean
    • compile_date - compilation date of the file
    • has_import_table - boolean
    • has_export_table - boolean
    • is_probably_packed - boolean
    • warnings - warnings from pefile parser
  • exiftool - exiftool output (from exiftool -j)

    • MIMEType - mimetype of the file
    • InternalName - internal name extracted from executable
    • OriginalFileName - original name of the file
    • Author - author of the file
    • Title - title of the file
    • Subject - subject of the file
    • LanguageCode - language used by executable (e.g. 'English (U.S.)')
    • CharacterSet - character set of file
    • Language - language of file (e.g. 'en-GB')
    • ModifyDate - last modified time string from document
    • CreateDate - creation time string from document
    • many more; view exiftool documentation for more info.
  • strings - interesting statically-extracted strings

    • domains - observed domains
    • urls - URLs (including things like emails)
    • ipv4 - IPv4 addresses
    • ipv6 - IPv6 addresses
  • scan - microengine scan information

    • filename - observed filenames for the artifact (only present if the artifact is a file)
    • url - observed URLs for the artifact (only present if the artifact is a URL)
    • countries - countries where the artifact was scanned from
    • first_seen - UTC date of when the artifact was first scanned
    • last_seen - UTC date of when the artifact was last scanned
    • first_scan - microengine scan information from the first scan

      • artifact_instance_id - PolySwarm's artifact instance ID
      • one JSON object per microengine, each named after its corresponding microengine

      Example:

        "first_scan": {
              "K7":{ ... },
              "DrWeb":{ ... },
              "ClamAV":{ ... },
              ...
              "artifact_instance_id": 61449720328585104
        }

      Each microengine object contains the following fields:

      • metadata - microengine metadata

        • scanner - scanner information

          • version - microengine version
          • environment - OS information

            • architecture - microprocessor architecture, e.g., "AMD64"
            • operating_type - operating system type, e.g., "Windows", "Linux"
          • vendor_version - version string, e.g., "15.2.0.42"
          • signature_version - signature version string
        • malware_family - Malware family, e.g., "TrojanSpy:Linux/EvilGnome.cb5176db"
      • assertion - microengine assertion, i.e., "benign", "malicious", or "unknown"
    • last_scan - microengine scan information from the latest scan

      It contains information in the same format as first_scan, but for the latest scan.

    • detections - summary of microengine detections

      • benign - number of "benign" detections
      • malicious - number of "malicious" detections
      • unknown - number of "unknown" detections
      • total - total detections
    • mimetype - mime type information

      • extended - extended mime information, e.g., "PE32 executable (GUI) Intel 80386, for MS Windows"
      • mime - mime type, e.g., "application/x-dosexec"
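To illustrate how the nesting above maps onto dotted search terms, here is a small sketch that walks a nested dictionary and yields the dotted name of every leaf. The helper dotted_fields and the sample document are assumptions written for this example; the sample is hand-made from the terms listed above and is not real API output.

```python
def dotted_fields(document, prefix=""):
    """Yield (dotted_name, value) pairs for every leaf in a nested dict."""
    for key, value in document.items():
        name = "{}.{}".format(prefix, key) if prefix else key
        if isinstance(value, dict):
            # Recurse into nested levels, extending the dotted prefix.
            for pair in dotted_fields(value, name):
                yield pair
        else:
            yield name, value


# Hand-written sample shaped like the terms above; not real API output.
sample = {
    "pefile": {"is_dll": False, "imphash": "ce7f7a334ddcfb21fe7a903165c209e7"},
    "scan": {"mimetype": {"mime": "application/x-dosexec"}},
}

fields = dict(dotted_fields(sample))
for name in sorted(fields):
    print("{}: {}".format(name, fields[name]))
```

Because field names are case-sensitive, the dotted names must be used exactly as they appear, e.g. exiftool.MIMEType rather than exiftool.mimetype.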

Allowed Query Searches

For query searches, only a subset of Elasticsearch queries is currently allowed.

For security reasons, each query type is accepted only in the simple form shown below, not in the complete form with all other attributes.

To make command line searching easier, the default input format for the CLI is a query field that will be wrapped into a JSON query_string request. This is likely sufficient for most queries.

Note: some characters, such as backslashes, must be escaped with a backslash.

Query String

{
    "query": {
      "query_string": {
            "query": "this AND that OR something:>10"
        }
    }
}

See the Elasticsearch Query String documentation.
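The wrapping described above can be sketched in a few lines. This is an illustration of the request shape only, not the CLI's actual implementation; wrap_query_string and escape_backslashes are hypothetical helpers, and only backslash escaping is handled because that is the one character the text names explicitly.

```python
import json


def escape_backslashes(text):
    """Double every backslash so it survives as a literal backslash."""
    return text.replace("\\", "\\\\")


def wrap_query_string(query):
    """Wrap a raw query in the query_string request form shown above."""
    return {"query": {"query_string": {"query": query}}}


request = wrap_query_string("pefile.imphash:ce7f7a334ddcfb21fe7a903165c209e7")
print(json.dumps(request, indent=4))
```

Applying escape_backslashes to user-supplied values before building the query keeps the escaping in one place rather than scattered across call sites.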

Check If Field Exists

{
    "query": {
        "exists": {
            "field": "lief.libraries"
        }
    }
}

See the Elasticsearch Exists Query documentation.

Range Query

{
    "query": {
        "range": {
            "age": {
                "gte": 10,
                "lte": 20
            }
        }
    }
}

See the Elasticsearch Range Query documentation. Range queries are especially useful for date fields; the Elasticsearch date math reference describes the supported expressions.
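As an example of a range query over a date field, the sketch below builds a request for artifacts first seen in the last seven days using Elasticsearch date math. The helper range_query is hypothetical, and it is an assumption that scan.first_seen is indexed as a date field that accepts date math expressions.

```python
def range_query(field, gte=None, lte=None):
    """Build a range query, leaving out any bound that was not given."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("at least one of gte or lte is required")
    return {"query": {"range": {field: bounds}}}


# "now-7d/d" is date math for seven days ago, rounded down to the day.
recent = range_query("scan.first_seen", gte="now-7d/d", lte="now")
```

The same helper reproduces the numeric example above with range_query("age", gte=10, lte=20).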

Simple Query String

{
    "query": {
        "simple_query_string": {
            "query": "\"fried eggs\" +(eggplant | potato) -frittata",
            "fields": ["title^5", "body"],
            "default_operator": "and"
        }
    }
}

See the Elasticsearch Simple Query String documentation.

Terms (Array) Query

{
    "query": {
        "terms": {
            "user": ["kimchy", "elasticsearch"]
        }
    }
}

See the Elasticsearch Terms Query documentation.

Download Files

dl_results = api.download("/tmp/out", "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")
for download_result in dl_results:
    if download_result.status_code != 200:
        print("Unable to download file.")

Perform Hunts

Live Hunting

# read the YARA rule and close the file before submitting the hunt
with open("eicar.yara") as rule_file:
    response = api.live(rule_file.read())
results = api.live_results(hunt_id=response.result.id)

Historical Hunting

# read the YARA rule and close the file before submitting the hunt
with open("eicar.yara") as rule_file:
    response = api.historical(rule_file.read())
results = api.historical_results(hunt_id=response.result.id)

Perform Rescans

results = api.rescan("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")

Get a Stream

results = api.stream(destination_dir="/my/malware/path")

Stream is a feature that is added to an account on a case-by-case basis. If you'd like to add this feature to your account, contact us at info@polyswarm.io.