(DEPRECATED) PolySwarm Customer API v1

PolySwarm API v1 is deprecated and may stop working at any time.

PolySwarm API v2 brings a large number of improvements in terms of speed and reliability.

Please utilize PolySwarm API v2.

An interface to the PolySwarm customer APIs.

Supports Python 2.7 and Python 3.5 or later.


From PyPI:

$ pip install polyswarm-api

If you get an error about a missing package named wheel, that means your version of pip is too old. You need pip version 19 or newer. To update pip, run pip install -U pip.

From source:

$ python setup.py install

If you get an error about a missing package named wheel, that means your version of setuptools is too old. You need setuptools version 40.8.0 or newer. To update setuptools, run pip install -U setuptools.

Create an API Client

from polyswarm_api.api import PolyswarmAPI

api_key = "317b21cb093263b701043cb0831a53b9"

api = PolyswarmAPI(key=api_key)

You will need to get your own API key from polyswarm.network/account/api-keys.
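Because each user needs their own key, it can help to keep the key out of source code. The following is a minimal sketch of one way to do that, assuming the key is stored in an environment variable. The variable name POLYSWARM_API_KEY and the helper load_api_key are hypothetical names chosen for this example; they are not part of the library.

```python
import os


def load_api_key(environ=None, var_name="POLYSWARM_API_KEY"):
    """Return the API key stored in an environment variable.

    POLYSWARM_API_KEY is a hypothetical variable name used for this
    example only; the polyswarm-api package does not define it.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError("Set {} to your PolySwarm API key".format(var_name))
    return key


# Usage, assuming polyswarm-api is installed and the variable is set:
#   from polyswarm_api.api import PolyswarmAPI
#   api = PolyswarmAPI(key=load_api_key())
```

Failing early with a clear message avoids sending requests with an empty key.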

Perform Scans

# scan one or more files; use scan_directory to scan a directory
results = api.scan("/home/user/zeus.bin", "/home/user/benign.bin")

for scan_result in results:
    if scan_result.result:
        for scanned_file in scan_result.result.files:
            # score between 0.0 and 1.0 indicating malintent
            poly_score = scanned_file.polyscore
            for assertion in scanned_file.assertions:
                print("Engine {} asserts {}".format(assertion.author_name, "Malicious" if assertion.verdict else "Benign"))

# scan one or more urls
results = api.scan_urls("http://bad.com", "https://google.com")
for url_scan_result in results:
    if url_scan_result.result:
        for scanned_url in url_scan_result.result.files:
            for assertion in scanned_url.assertions:
                print("Engine {} asserts {}".format(assertion.author_name, "Malicious" if assertion.verdict else "Benign"))

# perform rescan
results = api.rescan("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")

When scanning a URL, you should always include the protocol (http:// or https://).
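One way to enforce the protocol requirement before submitting anything is to check each URL locally. This is a rough sketch using only the Python 3 standard library; the helper names has_explicit_scheme and split_urls are made up for this example.

```python
from urllib.parse import urlparse


def has_explicit_scheme(url):
    """Return True if the URL has an http or https scheme and names a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_urls(urls):
    """Separate URLs that include the protocol from those that do not."""
    accepted = [u for u in urls if has_explicit_scheme(u)]
    rejected = [u for u in urls if not has_explicit_scheme(u)]
    return accepted, rejected


accepted, rejected = split_urls(["http://bad.com", "google.com"])
# accepted could then be passed on, e.g. api.scan_urls(*accepted)
```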

Lookup by Hash

# sha256, md5, and sha1 supported
results = api.search("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f", "b04637c11c63dd5a4a599d7104f0c5880717b5d5b32e0104de5a416963f06118")
for search_result in results:
    for artifact in search_result.result:
        # a score between 0.0 and 1.0 indicating malintent
        poly_score = artifact.last_scan.polyscore
        # all assertion responses from engines
        all_assertions = artifact.last_scan.assertions
        # malicious only assertions from engines
        malicious_detections = list(artifact.last_scan.detections)
        print("{} of {} engines assert malicious".format(len(malicious_detections), len(artifact.last_scan.assertions)))
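Since sha256, md5, and sha1 digests differ in length, a hash can be sanity-checked before it is sent to search. The sketch below guesses the hash type from the number of hexadecimal characters; guess_hash_type is a hypothetical helper for illustration, and the library may perform its own validation.

```python
import re

# Hex digest lengths for the three supported hash types.
_HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
_HEX_RE = re.compile(r"^[0-9a-fA-F]+$")


def guess_hash_type(value):
    """Return 'md5', 'sha1', 'sha256', or None if the value fits none of them."""
    value = value.strip()
    if not _HEX_RE.match(value):
        return None
    return _HEX_LENGTHS.get(len(value))


candidates = ["0" * 64, "not-a-hash"]
valid = [h for h in candidates if guess_hash_type(h) is not None]
# valid could then be passed on, e.g. api.search(*valid)
```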

Metadata Searching

query = "pefile.imphash:ce7f7a334ddcfb21fe7a903165c209e7"

results = api.search_by_metadata(query)

for search_result in results:
    # of type search result
    for artifact in search_result:
        print("Artifact {} seen in countries {}".format(artifact.sha256, artifact.countries))

Metadata Terms

The following is a non-exhaustive list of the terms currently supported by PolySwarm. When searching, separate each nested level with a period (.), e.g. pefile.imphash. Field names are case-sensitive, so take care to specify them correctly. If there are more fields or tools you would like to see, please get in touch at info@polyswarm.io.

  • lief - curated lief output

    • has_nx
    • is_pie
    • libraries - list of imported libraries
    • entrypoint - entrypoint in decimal
    • virtual_size - virtual size in decimal
    • exported_functions - list of exported functions
    • imported_functions - list of imported functions
  • pefile - curated pefile output

    • is_dll - boolean
    • is_exe - boolean
    • exports - exported functions
    • imphash - imphash of the file
    • imports - dictionary of imports in format dllname: [list, of, functions]
    • uses_cfg - boolean
    • uses_dep - boolean
    • uses_seh - boolean
    • compile_date - compile date of the file
    • has_import_table - boolean
    • has_export_table - boolean
    • is_probably_packed - boolean
    • warnings - warnings from pefile parser
  • exiftool - exiftool output (from exiftool -j)

    • MIMEType - mimetype of the file
    • InternalName - internal name extracted from executable
    • OriginalFileName - original name of the file
    • Author - author of the file
    • Title - title of the file
    • Subject - subject of the file
    • LanguageCode - language used by executable (e.g. 'English (U.S.)')
    • CharacterSet - character set of file
    • Language - language of file (e.g. 'en-GB')
    • ModifyDate - last modified time string from document
    • CreateDate - creation time string from document
    • many more; view exiftool documentation for more info.
  • strings - interesting statically-extracted strings

    • domains - observed domains
    • urls - URLs (including things like emails)
    • ipv4 - IPV4 addresses
    • ipv6 - IPV6 addresses
  • hash - hashes

    • ssdeep - SSDEEP fuzzy hash.
    • sha1 - self explanatory.
    • sha3_512 - self explanatory.
    • sha3_256 - self explanatory.
    • tlsh - TLSH fuzzy hash.
    • md5 - self explanatory.
  • scan - engine scan information

    • filename - observed filenames for the artifact (only present if the artifact is a file)
    • url - observed urls for the artifact (only present if the artifact is a url)
    • countries - countries where the artifact was scanned from
    • first_seen - UTC date of when the artifact was first scanned
    • last_seen - UTC date of when the artifact was last scanned
    • first_scan - engine scan information from the first scan

      • artifact_instance_id: Polyswarm's artifact instance ID
      • a list of JSON objects, each named after its corresponding engine.


        "first_scan": {
              "K7":{ ... },
              "DrWeb":{ ... },
              "ClamAV":{ ... },
              "artifact_instance_id": 61449720328585104
        }


      Engine names do not necessarily exist in first_scan or latest_scan, because a particular engine might choose not to respond.

      If a query has multiple clauses and a particular clause refers to a field that doesn't exist, Elasticsearch (the search engine the PolySwarm API uses) will ignore that clause. An example:

      scan.latest_scan.InsCyt.assertion:malicious AND scan.detections.malicious:>0

      In this particular case, Elasticsearch will return results where scan.detections.malicious:>0 and ignore scan.latest_scan.InsCyt.assertion:malicious because scan.latest_scan.InsCyt does not exist.

      To work around this issue, we advise you to include an exists clause for the engine before the clause that refers to it:

      _exists_:scan.latest_scan.InsCyt AND scan.latest_scan.InsCyt.assertion:malicious AND scan.detections.malicious:>0

      This is not necessary for other fields, as this scenario only happens for engine names.

      Each engine object contains the following fields:

      • metadata - engine metadata

        • malware_family - Malware family, e.g., "TrojanSpy:Linux/EvilGnome.cb5176db"
      • assertion - engine assertion, i.e., "benign", "malicious", or "unknown"
    • latest_scan - engine scan information from the latest scan

      contains information in the same format as first_scan, but for the latest scan

    • detections - summary of engines responses

      • benign - number of engines that responded with "benign"
      • malicious - number of engines that responded with "malicious"
      • unknown - number of engines that responded, but didn't reveal their verdict
      • total - number of engines that responded
    • mimetype - mime type information

      • extended - extended mime information, e.g., "PE32 executable (GUI) Intel 80386, for MS Windows"
      • mime - mime type, e.g., "application/x-dosexec"
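The dotted field names and the exists guard described in the list above can be assembled programmatically. This is a small sketch; field_path and engine_assertion_clause are hypothetical helper names, and the resulting string is meant to be passed to search_by_metadata as shown earlier.

```python
def field_path(*parts):
    """Join nested metadata levels with '.', e.g. ('pefile', 'imphash')."""
    return ".".join(parts)


def engine_assertion_clause(engine, assertion, scan="latest_scan"):
    """Build an engine clause preceded by an _exists_ guard for that engine."""
    engine_field = field_path("scan", scan, engine)
    return "_exists_:{0} AND {0}.assertion:{1}".format(engine_field, assertion)


query = " AND ".join([
    engine_assertion_clause("InsCyt", "malicious"),
    "scan.detections.malicious:>0",
])
print(query)
# prints: _exists_:scan.latest_scan.InsCyt AND scan.latest_scan.InsCyt.assertion:malicious AND scan.detections.malicious:>0
```

Generating the guard together with the engine clause makes it harder to forget the guard when a query refers to several engines.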

File Types

The following is a list of common MIME types, with the kind of document and usual file extension for each, for files currently found in PolySwarm.

MIME Type | Kind of document | Extension
application/gzip | GZip Compressed Archive | .gz
application/octet-stream | Any kind of binary data | .bin
application/pdf | Adobe Portable Document Format | .pdf
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Microsoft Excel 2007+ (OpenXML) | .xlsx
application/vnd.openxmlformats-officedocument.wordprocessingml.document | Microsoft Word 2007+ (OpenXML) | .docx
application/x-dosexec | PE32 executable | .exe
application/x-java-applet | Compiled Java class data | .class
application/x-rar | RAR archive data | .rar
application/xml | XML | .xml
application/zip | ZIP archive | .zip
text/html | HyperText Markup Language (HTML) | .htm .html
text/plain | Text (generally ASCII or ISO 8859-n) | .txt

A list of all official MIME media types is maintained by IANA.
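The table can also be used to turn a file extension into a metadata query on the scan.mimetype.mime field. The sketch below copies the mapping from the table; expected_mime and mime_query are hypothetical helpers, and quoting the MIME type inside the query is an assumption made here because the value contains a slash.

```python
import os

# Extension to MIME type mapping, copied from the table above.
EXTENSION_TO_MIME = {
    ".gz": "application/gzip",
    ".bin": "application/octet-stream",
    ".pdf": "application/pdf",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".exe": "application/x-dosexec",
    ".class": "application/x-java-applet",
    ".rar": "application/x-rar",
    ".xml": "application/xml",
    ".zip": "application/zip",
    ".htm": "text/html",
    ".html": "text/html",
    ".txt": "text/plain",
}


def expected_mime(filename):
    """Return the MIME type listed for the file's extension, or None."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_MIME.get(extension)


def mime_query(filename):
    """Build a metadata query for the file's listed MIME type, or None."""
    mime = expected_mime(filename)
    if mime is None:
        return None
    return 'scan.mimetype.mime:"{}"'.format(mime)
```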

Allowed Query Searches

For query searches, only a subset of Elasticsearch queries is allowed at the moment.

For security reasons, they are allowed only in the simple forms shown below, not in the complete form with all other attributes.

To make command line searching easier, the default input format for the CLI is a query field that will be wrapped into a JSON query_string request. This is likely sufficient for most queries.

Do note: some characters, like backslashes, must be escaped with a backslash.

Query String

    "query": {
        "query_string": {
            "query": "this AND that OR something:>10"
        }
    }

Elasticsearch Query String.
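The wrapping that the CLI performs can be imitated in a few lines. This is only a sketch of the idea, not the CLI's actual code: wrap_query_string and escape_backslashes are hypothetical helpers, the filename is made up, and doubling each backslash is assumed to be the escaping the note above refers to.

```python
import json


def escape_backslashes(value):
    """Double every backslash so it is treated as a literal character."""
    return value.replace("\\", "\\\\")


def wrap_query_string(query):
    """Wrap a raw query in the simple query_string form."""
    return {"query": {"query_string": {"query": query}}}


# r"dir\file.bin" is a made-up filename containing one backslash.
query = 'scan.filename:"{}"'.format(escape_backslashes(r"dir\file.bin"))

# json.dumps applies JSON's own escaping on top of the query-level escaping.
request_body = json.dumps(wrap_query_string(query))
```

Keeping the two layers separate (query escaping first, JSON encoding second) avoids hand-counting backslashes.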

Check If Field Exists

    "query": {
        "exists": {
            "field": "lief.libraries"
        }
    }

Elasticsearch Exists Query.

Range Query

    "query": {
        "range": {
            "age": {
                "gte": 10,
                "lte": 20
            }
        }
    }

Elasticsearch Range Query. Range queries are especially useful for date fields. See the Elasticsearch documentation on date math for reference.
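As an illustration of a range query on a date field, the sketch below builds a request for artifacts first seen within the last seven days. It assumes that scan.first_seen is indexed as a date and accepts Elasticsearch date math such as now-7d/d; range_query is a hypothetical helper.

```python
import json


def range_query(field, gte=None, lte=None):
    """Build a range query in the simple form, with at least one bound."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("at least one of gte or lte is required")
    return {"query": {"range": {field: bounds}}}


# "now-7d/d" means seven days ago, rounded down to the start of that day.
last_week = range_query("scan.first_seen", gte="now-7d/d", lte="now")
print(json.dumps(last_week, indent=2))
```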

Simple Query String

    "query": {
        "simple_query_string": {
            "query": "\"fried eggs\" +(eggplant | potato) -frittata",
            "fields": ["title^5", "body"],
            "default_operator": "and"
        }
    }

Elasticsearch Simple Query String.

Terms (Array) Query

    "query": {
        "terms": {
            "user": ["kimchy", "elasticsearch"]
        }
    }

Elasticsearch Terms Query.

Download Files

dl_results = api.download("/tmp/out", "275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")
for download_result in dl_results:
    if download_result.status_code != 200:
        print("Unable to download file.")

Perform Hunts

Live Hunting

with open("eicar.yara") as rule_file:
    response = api.live(rule_file.read())
results = api.live_results(hunt_id=response.result.id)

Historical Hunting

with open("eicar.yara") as rule_file:
    response = api.historical(rule_file.read())
results = api.historical_results(hunt_id=response.result.id)
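Both hunt calls take the text of a YARA rule file. As a rough illustration of what such a file might contain and how to prepare it, the sketch below writes a deliberately harmless rule to a temporary directory and reads it back. The rule name, marker string, and filename are invented for this example and are not the contents of eicar.yara.

```python
import os
import tempfile

# A deliberately harmless rule, written for this example only.
EXAMPLE_RULE = """\
rule example_marker
{
    strings:
        $marker = "EXAMPLE-MARKER-STRING"
    condition:
        $marker
}
"""


def write_rule(directory, filename="example.yara"):
    """Write the example rule to disk and return its path."""
    path = os.path.join(directory, filename)
    with open(path, "w") as rule_file:
        rule_file.write(EXAMPLE_RULE)
    return path


rule_path = write_rule(tempfile.mkdtemp())
with open(rule_path) as rule_file:
    rule_text = rule_file.read()
# rule_text could then be passed to api.live(...) or api.historical(...)
```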

Perform Rescans

results = api.rescan("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")

Get a Stream

results = api.stream(destination_dir="/my/malware/path")

Stream is a feature that is added to an account on a case-by-case basis. If you'd like to add this feature to your account, contact us at info@polyswarm.io.

2020 © PolySwarm Pte. Ltd.