
PolySwarm Customer API v2

An interface to the version 2 PolySwarm customer APIs.

Supports Python 2.7, 3.5 and greater. The examples below use f-strings, which require Python 3.6 or later.

Installation

From PyPI:

$ pip install polyswarm-api

From source:

$ python setup.py install

Create an API Client

from polyswarm_api.api import PolyswarmAPI

api_key = "YOUR_API_KEY"

api = PolyswarmAPI(key=api_key)

You will need to get your own API key from polyswarm.network/account/api-keys
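Rather than hardcoding the key in a script, you may prefer to read it from the environment. The sketch below assumes a variable named POLYSWARM_API_KEY; that name is hypothetical and chosen only for this example, not something polyswarm-api looks for on its own.

```python
import os


def load_api_key(var_name="POLYSWARM_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    POLYSWARM_API_KEY is a hypothetical name used for this example only.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your PolySwarm API key")
    return key


# Usage, once the variable has been exported in your shell:
#   from polyswarm_api.api import PolyswarmAPI
#   api = PolyswarmAPI(key=load_api_key())
```

Keeping the key out of source files makes it less likely to end up in version control.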

Perform Scans

import sys

# scan one file
FILE = '/home/user/malicious.bin'

instance = api.submit(FILE)
result = api.wait_for(instance)

if result.failed:
    print('Failed to get results')
    sys.exit(1)

positives = 0
total = 0

print('Microengine Assertions:')
for assertion in result.assertions:
    if assertion.verdict:
        positives += 1
    total += 1
    verdict = 'Malicious' if assertion.verdict else 'Benign'
    print(f'\tEngine {assertion.author_name} asserts {verdict}')

print(f'Positives: {positives}')
print(f'Total: {total}')
print(f'PolyScore: {result.polyscore}\n')

print(f'sha256: {result.sha256}')
print(f'sha1: {result.sha1}')
print(f'md5: {result.md5}')
print(f'Extended type: {result.extended_type}')
print(f'First Seen: {result.first_seen}')
print(f'Last Seen: {result.last_seen}\n')

print(f'Permalink: {result.permalink}')

# scan one URL
URL = 'https://polyswarm.io'

instance = api.submit(URL, artifact_type='url')
result = api.wait_for(instance)

if result.failed:
    print('Failed to get results')
    sys.exit(1)

positives = 0
total = 0

print('Microengine Assertions:')
for assertion in result.assertions:
    if assertion.verdict:
        positives += 1
    total += 1
    verdict = 'Malicious' if assertion.verdict else 'Benign'
    print(f'\tEngine {assertion.author_name} asserts {verdict}')

print(f'Positives: {positives}')
print(f'Total: {total}\n')

print(f'Permalink: {result.permalink}')

When scanning a URL, you should always include the protocol (http:// or https://).
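A local check can catch a missing protocol before the URL is submitted. This is a minimal sketch using only the standard library; `require_http_scheme` is a hypothetical helper, not part of polyswarm-api.

```python
from urllib.parse import urlsplit


def require_http_scheme(url):
    """Return `url` unchanged if it starts with http:// or https://, else raise ValueError.

    Hypothetical pre-submission check; polyswarm-api does not provide it.
    """
    parts = urlsplit(url)
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        raise ValueError(f"URL must include http:// or https://: {url!r}")
    return url


# Usage with the client created earlier:
#   instance = api.submit(require_http_scheme(URL), artifact_type='url')
```

A bare host such as `polyswarm.io` has an empty scheme after `urlsplit`, so it is rejected rather than submitted.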

Lookup by Hash

# sha256, md5, and sha1 supported
EICAR_HASH = '275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f'

results = api.search(EICAR_HASH)

for result in results:
    if result.failed:
        print('Failed to get result.')
        break

    # reset the counters for each result
    positives = 0
    total = 0

    if not result.assertions:
        print('Artifact not scanned yet - run a rescan to get Microengine Assertions.')
    else:
        print('Microengine Assertions:')

        for assertion in result.assertions:
            if assertion.verdict:
                positives += 1
            total += 1
            verdict = 'Malicious' if assertion.verdict else 'Benign'
            print(f'\tEngine {assertion.author_name} asserts {verdict}')

    print(f'Positives: {positives}')
    print(f'Total: {total}')
    print(f'PolyScore: {result.polyscore}\n')

    print(f'sha256: {result.sha256}')
    print(f'sha1: {result.sha1}')
    print(f'md5: {result.md5}')
    print(f'Extended type: {result.extended_type}')
    print(f'First Seen: {result.first_seen}')
    print(f'Last Seen: {result.last_seen}\n')

    print(f'Permalink: {result.permalink}')
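Since lookups accept sha256, md5, and sha1, it can help to recognise which kind of digest you have before searching. The sketch below classifies a hex string by its length (32, 40, or 64 characters); `hash_type` is a hypothetical local helper, and the search call itself does not require it.

```python
import string

# Hex digest lengths of the hash types accepted by the lookup above.
_HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def hash_type(value):
    """Return 'md5', 'sha1' or 'sha256' for a hex digest, or raise ValueError."""
    value = value.strip()
    if not value or any(c not in string.hexdigits for c in value):
        raise ValueError(f"not a hexadecimal digest: {value!r}")
    try:
        return _HASH_LENGTHS[len(value)]
    except KeyError:
        raise ValueError(f"unsupported digest length {len(value)}") from None


print(hash_type('275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f'))  # sha256
```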

Metadata Searching

query = "pefile.imphash:ce7f7a334ddcfb21fe7a903165c209e7"

results = api.search_by_metadata(query)

for search_result in results:
    # of type search result
    for artifact in search_result:
        print("Artifact {} seen in countries {}".format(artifact.sha256, artifact.countries))

Metadata Terms

The following is a non-exhaustive list of the terms currently supported by PolySwarm. When searching, separate each nested level with a period, e.g. pefile.imphash. Field names are case-sensitive, so take care to specify them correctly. If there are more fields or tools you would like to see, please get in touch at info@polyswarm.io.

  • lief - curated lief output

    • has_nx
    • is_pie
    • libraries - list of imported libraries
    • entrypoint - entrypoint in decimal
    • virtual_size - virtual size in decimal
    • exported_functions - list of exported functions
    • imported_functions - list of imported functions
  • pefile - curated pefile output

    • is_dll - boolean
    • is_exe - boolean
    • exports - exported functions
    • imphash - imphash of the file
    • imports - dictionary of imports in format dllname: [list, of, functions]
    • uses_cfg - boolean
    • uses_dep - boolean
    • uses_seh - boolean
    • compile_date - compilation date recorded in the PE header
    • has_import_table - boolean
    • has_export_table - boolean
    • is_probably_packed - boolean
    • warnings - warnings from pefile parser
  • exiftool - exiftool output (from exiftool -j)

    • MIMEType - mimetype of the file
    • InternalName - internal name extracted from executable
    • OriginalFileName - original name of the file
    • Author - author of the file
    • Title - title of the file
    • Subject - subject of the file
    • LanguageCode - language used by executable (e.g. 'English (U.S.)')
    • CharacterSet - character set of file
    • Language - language of file (e.g. 'en-GB')
    • ModifyDate - last modified time string from document
    • CreateDate - creation time string from document
    • many more; view exiftool documentation for more info.
  • strings - interesting statically-extracted strings

    • domains - observed domains
    • urls - URLs (including things like emails)
    • ipv4 - IPv4 addresses
    • ipv6 - IPv6 addresses
  • hash - hashes

    • ssdeep - ssdeep fuzzy hash
    • sha1 - SHA-1 hash
    • sha3_512 - SHA3-512 hash
    • sha3_256 - SHA3-256 hash
    • tlsh - TLSH fuzzy hash
    • md5 - MD5 hash
  • scan - microengine scan information

    • filename - observed filenames for the artifact (only present if the artifact is a file)
    • url - observed URLs for the artifact (only present if the artifact is a URL)
    • countries - countries where the artifact was scanned from
    • first_seen - UTC date of when the artifact was first scanned
    • last_seen - UTC date of when the artifact was last scanned
    • first_scan - microengine scan information from the first scan

      • artifact_instance_id - PolySwarm's artifact instance ID
      • one JSON object per microengine, keyed by the microengine's name

      Example:

        "first_scan": {
              "K7":{ ... },
              "DrWeb":{ ... },
              "ClamAV":{ ... },
              ...
              "artifact_instance_id": 61449720328585104
        }

      Each microengine object contains the following fields:

      • metadata - microengine metadata

        • malware_family - malware family, e.g., "TrojanSpy:Linux/EvilGnome.cb5176db"
      • assertion - microengine assertion: "benign", "malicious", or "unknown"
    • latest_scan - microengine scan information from the latest scan

      Contains information in the same format as first_scan, but for the latest scan.

    • detections - summary of microengine responses

      • benign - number of microengines that responded with "benign"
      • malicious - number of microengines that responded with "malicious"
      • unknown - number of microengines that responded, but didn't reveal their verdict
      • total - number of microengines that responded
    • mimetype - mime type information

      • extended - extended mime information, e.g., "PE32 executable (GUI) Intel 80386, for MS Windows"
      • mime - mime type, e.g., "application/x-dosexec"
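Because each nested level of the metadata is joined with a period to form a search term, a nested metadata object can be turned into the list of terms it supports by walking it recursively. This is a sketch; the sample metadata below is made up for illustration, and lists are treated as leaf values.

```python
def dotted_paths(metadata, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested metadata dict.

    Nested keys are joined with '.', the same way search terms such as
    'pefile.imphash' are written. Lists are treated as leaf values.
    """
    for key, value in metadata.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from dotted_paths(value, path)
        else:
            yield path, value


# Made-up sample shaped like the terms listed above.
sample = {
    "pefile": {"imphash": "ce7f7a334ddcfb21fe7a903165c209e7", "is_dll": False},
    "hash": {"md5": "e90099d6f3078a9691ab8fe38f0f25e4"},
    "lief": {"libraries": ["KERNEL32.dll"]},
}
for path, value in dotted_paths(sample):
    print(f"{path}: {value}")
```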

File Types

The following table lists common MIME types of files currently found in PolySwarm.

MIME type | Kind of document | Extension
application/gzip | GZip Compressed Archive | .gz
application/octet-stream | Any kind of binary data | .bin
application/pdf | Adobe Portable Document Format | .pdf
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Microsoft Excel 2007+ (OpenXML) | .xlsx
application/vnd.openxmlformats-officedocument.wordprocessingml.document | Microsoft Word 2007+ (OpenXML) | .docx
application/x-dosexec | PE32 executable | .exe
application/x-java-applet | Compiled Java class data | .class
application/x-rar | RAR archive data | .rar
application/xml | XML | .xml
application/zip | ZIP archive | .zip
text/html | HyperText Markup Language (HTML) | .htm, .html
text/plain | Text (generally ASCII or ISO 8859-n) | .txt

The full list of official MIME media types is maintained by IANA.

Allowed Query Searches

Only Elasticsearch's query_string searches are allowed at the moment.

Exact Value Match:

hash.md5:e90099d6f3078a9691ab8fe38f0f25e4

Note that "hash.md5" is the JSON path of the attribute "md5" inside the metadata JSON blob:

{
  "lief": ...,
  "pefile": ...,
  ...
  "hash": {
    "ssdeep": ...,
    ...
    "md5": ...
  }
}
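When an exact value contains characters that query_string treats as operators (for example `:`, `/`, parentheses, or spaces, as in the malware family names shown earlier), one approach is to wrap the value in double quotes, so that only backslashes and double quotes need escaping. The helper below is a hypothetical sketch of that approach, not part of polyswarm-api.

```python
def exact_match(path, value):
    """Build a query_string clause that matches `value` at the JSON path `path`.

    The value is wrapped in double quotes so characters such as ':' '/' '('
    or spaces inside it are not interpreted as query operators; inside the
    quotes only backslashes and double quotes need escaping.
    """
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'{path}:"{escaped}"'


print(exact_match("hash.md5", "e90099d6f3078a9691ab8fe38f0f25e4"))
print(exact_match("scan.first_scan.K7.metadata.malware_family", "Backdoor ( 0040f5511 )"))
```

The resulting string can then be passed to `api.search_by_metadata`. Note that wildcards do not apply inside a quoted value.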

Check If Attribute Exists

_exists_:lief.libraries

Will return all artifacts that contain the metadata attribute specified by the JSON path "lief.libraries".

Boolean Operators

You may combine multiple attribute matches in a query using boolean operators:

scan.latest_scan.DrWeb.metadata.scanner.environment.operating_system:Linux AND scan.latest_scan.DrWeb.metadata.scanner.environment.architecture:x86_64

Will search for all artifacts scanned by microengine DrWeb running on the "Linux" O.S. with architecture "x86_64".

You may also use boolean operators for attribute values:

scan.latest_scan.DrWeb.metadata.scanner.environment.operating_system:(Linux OR Windows)

Will search for all artifacts scanned by DrWeb running on Linux OR Windows.
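Queries like these are plain strings, so they can be assembled programmatically. The two hypothetical helpers below produce the forms shown above; they assume the values are simple tokens without reserved query_string characters.

```python
def any_of(path, values):
    """Match `path` against any of `values`, e.g. 'path:(Linux OR Windows)'."""
    if not values:
        raise ValueError("need at least one value")
    return f"{path}:({' OR '.join(values)})"


def all_of(*clauses):
    """Join clauses with AND, parenthesizing each so mixed operators group predictably."""
    return " AND ".join(f"({clause})" for clause in clauses)


OS = "scan.latest_scan.DrWeb.metadata.scanner.environment.operating_system"
ARCH = "scan.latest_scan.DrWeb.metadata.scanner.environment.architecture"

print(any_of(OS, ["Linux", "Windows"]))
print(all_of(f"{OS}:Linux", f"{ARCH}:x86_64"))
```

Parenthesizing every clause costs nothing and avoids surprises once AND and OR are mixed in the same query.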

Comparison Operators

> (greater than), < (less than), >= (greater than or equal to), and <= (less than or equal to) are also supported.

scan.latest_scan.assertions.benign:>0

Will search for all artifacts with at least one benign assertion.

Grouping

You may group comparison or boolean expressions together using parentheses:

(scan.latest_scan.assertions.benign:>0 AND scan.latest_scan.assertions.benign:<10) OR _exists_:scan.first_scan.DrWeb

Will search for all artifacts with more than zero and fewer than ten benign assertions, or that were first scanned by DrWeb.

Ranges

You may combine boolean and comparison operators to search in ranges of values:

scan.latest_scan.assertions.benign:>0 AND scan.latest_scan.assertions.benign:<=10

Will search for all artifacts with at least one benign assertion, but no more than 10.

Ranges also have a shorthand form:

scan.latest_scan.assertions.benign:[0 TO 10]

In the above query, both 0 and 10 are included in the interval. Use a curly brace instead of a square bracket to exclude either endpoint:

scan.latest_scan.assertions.benign:[0 TO 10}

Will search for all artifacts with no more than 9 benign assertions (0 is included, 10 is excluded).

You may also use ranges for date attributes:

exiftool.createdate:[2019-01-01 TO 2019-12-31]

Will search for all artifacts created in the year 2019.
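The bracket rules above ([ ] for inclusive bounds, { } for exclusive bounds) can be captured in a small hypothetical helper. Dates are rendered with `str()`, which for `datetime.date` gives the YYYY-MM-DD form used in the example.

```python
from datetime import date


def range_query(path, low, high, include_low=True, include_high=True):
    """Build 'path:[low TO high]'; '[' ']' include a bound, '{' '}' exclude it."""
    left = "[" if include_low else "{"
    right = "]" if include_high else "}"
    return f"{path}:{left}{low} TO {high}{right}"


BENIGN = "scan.latest_scan.assertions.benign"

# Same interval as `>0 AND <=10`: exclude 0, include 10.
print(range_query(BENIGN, 0, 10, include_low=False))
# Date bounds on the scan.first_seen term listed earlier.
print(range_query("scan.first_seen", date(2019, 1, 1), date(2019, 12, 31)))
```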

Wildcards

Wildcards are also permitted:

scan.first_scan.K7.metadata.malware_family:Trojan*

Will search for all artifacts first scanned by microengine K7 with malware family matching the expression Trojan*, e.g. "Trojan-Ransom.Satan", "Trojan.Packed2.39908", etc.

Wildcards also work for attribute names, but * needs to be escaped:

scan.latest_scan.\*.metadata.scanner.environment.operating_system:Linux

Will search for all artifacts scanned by ANY microengine running on the O.S. "Linux".

Note: Do not escape * in the values, only in attribute names.
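The escaping rule (escape * in attribute names, never in values) is easy to apply by keeping the path and the value separate until the last step. This is a hypothetical sketch; it assumes the path passed in is not already escaped.

```python
def field_query(path, value):
    """Build 'path:value', escaping '*' in the attribute name only.

    The value is left untouched so a '*' there still acts as a wildcard.
    Assumes `path` is not already escaped.
    """
    return f"{path.replace('*', chr(92) + '*')}:{value}"


print(field_query("scan.latest_scan.*.metadata.scanner.environment.operating_system", "Linux"))
print(field_query("scan.first_scan.K7.metadata.malware_family", "Trojan*"))
```

`chr(92)` is the backslash character; it is spelled that way only to keep the f-string expression free of backslashes, which older Python versions reject.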

Regular Expressions

Advanced users may also use regular expressions:

scan.first_scan.K7.metadata.malware_family:/Backdoor.*/

Will search for all artifacts first scanned by K7 with malware family matching the regex /Backdoor.*/, e.g. "Backdoor/Wabot.z", "Backdoor ( 0040f5511 )", etc.

See the Elasticsearch regular expression syntax documentation for more information.

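Note: the anchors ^ (beginning of line) and $ (end of line) are not supported. A regular expression must match the entire indexed term, so it already behaves as if it were anchored at both ends.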
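To preview locally which names a pattern would accept under whole-term matching, Python's `re.fullmatch` gives a rough approximation. This is only a sketch: Python's regex syntax is not identical to the Lucene syntax Elasticsearch uses, and "Win32.Backdoor.Agent" is a made-up name included to show a non-match.

```python
import re

FAMILIES = ["Backdoor/Wabot.z", "Backdoor ( 0040f5511 )", "Win32.Backdoor.Agent"]


def matches_whole_value(pattern, value):
    """Approximate anchored, whole-term matching with re.fullmatch (rough preview only)."""
    return re.fullmatch(pattern, value) is not None


for name in FAMILIES:
    print(name, matches_whole_value("Backdoor.*", name))
```

The last name contains "Backdoor" but does not start with it, so the whole-term match fails; to find such names the pattern needs a leading `.*`.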

Download Files

OUTPUT_DIR = '/tmp/'
EICAR_HASH = '275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f'

artifact = api.download(OUTPUT_DIR, EICAR_HASH)
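After downloading, you may want to confirm that the file on disk really has the hash you asked for. The sketch below computes a SHA-256 in chunks with `hashlib`. Where the file ends up is an assumption here: check what `api.download` returns in your version to get the path.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 of the file at `path`, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file's SHA-256 equals `expected_sha256` (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Usage (the path below is an assumption about where the file was saved):
#   verify_download(os.path.join(OUTPUT_DIR, EICAR_HASH), EICAR_HASH)
```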

Perform Hunts

Live Hunting

# create and start live hunt
YARA_RULE = 'banker_families.yar'
RULESET_NAME = 'Banker Families Live Hunt'
ACTIVE = True

with open(YARA_RULE) as rule_file:
    rule = rule_file.read()

live_hunt = api.live_create(rule=rule,
                            active=ACTIVE,
                            ruleset_name=RULESET_NAME)
print(f'ID: {live_hunt.id}')
print(f'Rule Set Name: {live_hunt.ruleset_name}')
print(f'Created: {live_hunt.created}')
print(f'Active: {live_hunt.active}')
print(f'Status: {live_hunt.status}')

# get live hunt list and IDs
hunt_list = api.live_list()

for hunt in hunt_list:
    print(f'HUNT ID: {hunt.id}')
    print(f'Rule Set Name: {hunt.ruleset_name}')
    print(f'Created: {hunt.created}')
    print(f'Active: {hunt.active}')
    print(f'Status: {hunt.status}')

# fetch results
HUNT_ID = 48079983714547442
SINCE = 2000  # How far back, in minutes, to request results (default: 1440)

results = api.live_results(hunt=HUNT_ID, since=SINCE)

for hunt in results:
    print(f'ID: {hunt.id}')
    print(f'Rule Name: {hunt.rule_name}')
    print(f'Tags: {hunt.tags}')
    print(f'Created: {hunt.created}')
    print(f'SHA256: {hunt.sha256}')
    print(f'Permalink: {hunt.artifact.permalink}')
    print(f'PolyScore: {hunt.artifact.polyscore}')

# delete live hunt
HUNT_ID = 48079983714547442

hunt_deleted = api.live_delete(HUNT_ID)

print(f'ID: {hunt_deleted.id}')
print(f'Rule Set Name: {hunt_deleted.ruleset_name}')
print(f'Created: {hunt_deleted.created}')
print(f'Active: {hunt_deleted.active}')
print(f'Status: {hunt_deleted.status}')
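Live hunt results can be numerous, so a simple triage step is to keep only the artifacts with a high PolyScore. The sketch below relies only on the `sha256` and `artifact.polyscore` attributes used in the results loop above; the 0.8 threshold is arbitrary, and the demo uses stand-in objects with made-up values rather than real API results.

```python
from types import SimpleNamespace


def high_scoring(results, threshold=0.8):
    """Return (sha256, polyscore) pairs with PolyScore >= threshold, highest first.

    Results without a PolyScore (None) are skipped. The default threshold is arbitrary.
    """
    hits = [
        (r.sha256, r.artifact.polyscore)
        for r in results
        if r.artifact.polyscore is not None and r.artifact.polyscore >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Offline demo with stand-in objects shaped like the results above; values are made up.
demo = [
    SimpleNamespace(sha256="aaa", artifact=SimpleNamespace(polyscore=0.91)),
    SimpleNamespace(sha256="bbb", artifact=SimpleNamespace(polyscore=0.12)),
    SimpleNamespace(sha256="ccc", artifact=SimpleNamespace(polyscore=None)),
    SimpleNamespace(sha256="ddd", artifact=SimpleNamespace(polyscore=0.97)),
]
print(high_scoring(demo))  # [('ddd', 0.97), ('aaa', 0.91)]
```

With a live client you would pass `api.live_results(hunt=HUNT_ID, since=SINCE)` instead of `demo`.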

Historical Hunting

# create and start historical hunt
YARA_RULE = 'APT_signatures.yar'
RULESET_NAME = 'APT Historical Ruleset'

with open(YARA_RULE) as rule_file:
    rule = rule_file.read()

historical_hunt = api.historical_create(rule=rule,
                                        ruleset_name=RULESET_NAME)
print(f'ID: {historical_hunt.id}')
print(f'Rule Set Name: {historical_hunt.ruleset_name}')
print(f'Created: {historical_hunt.created}')
print(f'Active: {historical_hunt.active}')
print(f'Status: {historical_hunt.status}')

# get historical hunt list and IDs
hunt_list = api.historical_list()

for hunt in hunt_list:
    print(f'HUNT ID: {hunt.id}')
    print(f'Rule Set Name: {hunt.ruleset_name}')
    print(f'Created: {hunt.created}')
    print(f'Active: {hunt.active}')
    print(f'Status: {hunt.status}')

# fetch results
HUNT_ID = 41408929604057916

results = api.historical_results(hunt=HUNT_ID)

for hunt in results:
    print(f'ID: {hunt.id}')
    print(f'Rule Name: {hunt.rule_name}')
    print(f'Tags: {hunt.tags}')
    print(f'Created: {hunt.created}')
    print(f'SHA256: {hunt.sha256}')
    print(f'Permalink: {hunt.artifact.permalink}')
    print(f'PolyScore: {hunt.artifact.polyscore}')

# delete historical hunt
HUNT_ID = 41408929604057916

hunt_deleted = api.historical_delete(HUNT_ID)

print(f'ID: {hunt_deleted.id}')
print(f'Rule Set Name: {hunt_deleted.ruleset_name}')
print(f'Created: {hunt_deleted.created}')
print(f'Active: {hunt_deleted.active}')
print(f'Status: {hunt_deleted.status}')
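A ruleset with several rules can match the same artifact more than once, so grouping hunt results by hash gives a compact overview. This sketch uses only the `sha256` and `rule_name` attributes printed in the results loop above; `rules_by_artifact` is a hypothetical helper.

```python
from collections import defaultdict


def rules_by_artifact(results):
    """Map each matching artifact's SHA-256 to the sorted list of rule names that hit it."""
    grouped = defaultdict(set)
    for hit in results:
        grouped[hit.sha256].add(hit.rule_name)
    return {sha256: sorted(rules) for sha256, rules in grouped.items()}


# Usage with the client:
#   for sha256, rules in rules_by_artifact(api.historical_results(hunt=HUNT_ID)).items():
#       print(sha256, ', '.join(rules))
```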

Perform Rescans

import sys

instance = api.rescan("275a021bbfb6489e54d471899f7db9d1663fc695ec2fe2a2c4538aabf651fd0f")
result = api.wait_for(instance)

if result.failed:
    print('Failed to get results')
    sys.exit(1)

positives = 0
total = 0

print('Microengine Assertions:')
for assertion in result.assertions:
    if assertion.verdict:
        positives += 1
    total += 1
    verdict = 'Malicious' if assertion.verdict else 'Benign'
    print(f'\tEngine {assertion.author_name} asserts {verdict}')

print(f'Positives: {positives}')
print(f'Total: {total}')
print(f'PolyScore: {result.polyscore}\n')

print(f'sha256: {result.sha256}')
print(f'sha1: {result.sha1}')
print(f'md5: {result.md5}')
print(f'Extended type: {result.extended_type}')
print(f'First Seen: {result.first_seen}')
print(f'Last Seen: {result.last_seen}\n')

print(f'Permalink: {result.permalink}')
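The assertion-counting loop appears in the scan, hash lookup, and rescan examples. One way to avoid repeating it is a small helper; this is a hypothetical sketch that mirrors the loop's logic (a truthy `verdict` counts as malicious, anything else is reported as benign) and uses only the `verdict` and `author_name` attributes shown above.

```python
from collections import namedtuple

Summary = namedtuple("Summary", ["positives", "total", "lines"])


def summarize_assertions(assertions):
    """Count malicious verdicts and format one report line per engine.

    Mirrors the loop in the examples: a truthy `verdict` counts as
    malicious, anything else is reported as benign.
    """
    lines = []
    positives = 0
    for assertion in assertions:
        malicious = bool(assertion.verdict)
        positives += malicious
        label = "Malicious" if malicious else "Benign"
        lines.append(f"\tEngine {assertion.author_name} asserts {label}")
    return Summary(positives, len(lines), lines)


# Usage with a result from api.wait_for(...):
#   summary = summarize_assertions(result.assertions)
#   print('Microengine Assertions:')
#   print('\n'.join(summary.lines))
#   print(f'Positives: {summary.positives}')
#   print(f'Total: {summary.total}')
```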

Get a Stream

SINCE = 60 # Fetch stream from the last 60 minutes

streams = api.stream(since=SINCE)

for stream in streams:
    print(f'ID: {stream.id}')
    print(f'URI: {stream.uri}')
    print(f'Created: {stream.created}')
    print(f'Community: {stream.community}')

Stream is a feature that is added to an account on a case-by-case basis. If you'd like to add this feature to your account, contact us at info@polyswarm.io.

2020 © PolySwarm Pte. Ltd.