How to build a Metadata query

PolySwarm's Metadata Search lets you search PolySwarm's dataset for samples related to the information you are interested in.

Once PolySwarm scans or sandboxes a sample, it produces a large amount of metadata. This metadata is mapped into attribute fields that Metadata Search can query, making it easy to find the samples you are interested in.

The general structure of a metadata query is:

field:matched_value [logic field:matched_value]

For example:

scan.detections.malicious:>1 AND artifact.type:exe
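The structure above can also be assembled programmatically. The following is a minimal sketch in Python; the helper names clause and join_clauses are hypothetical and are not part of any PolySwarm tooling. It only builds the query string and does not contact PolySwarm.

```python
def clause(field: str, value: str) -> str:
    """Format a single field:matched_value term."""
    return f"{field}:{value}"


def join_clauses(logic: str, *clauses: str) -> str:
    """Join terms with a Boolean operator (AND or OR)."""
    if logic not in {"AND", "OR"}:
        raise ValueError(f"unsupported operator: {logic}")
    return f" {logic} ".join(clauses)


# Rebuild the example query shown above.
query = join_clauses(
    "AND",
    clause("scan.detections.malicious", ">1"),
    clause("artifact.type", "exe"),
)
print(query)  # scan.detections.malicious:>1 AND artifact.type:exe
```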

PolySwarm's Metadata Search is backed by Elasticsearch and supports the full range of Elasticsearch search criteria, including Boolean logic, grouping, ranges, wildcards, and regex, to deliver flexible results quickly.

How to find searchable fields

PolySwarm contains hundreds of searchable fields. You can find a full list in our Searchable Fields Reference.

Alternatively, you can use the polyswarm search mapping command to quickly list available fields. See the CLI documentation for more details.

Once a metadata query has been built, you can use the same query in the Metadata Search UI, API Endpoint or Command Line Queries.

MIME file types

The following common MIME types are useful when querying the scan.mimetype.mime field.

MIME type | Kind of document | Extension
application/gzip | GZip Compressed Archive | .gz
application/octet-stream | Any kind of binary data | .bin
application/pdf | Adobe Portable Document Format | .pdf
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Microsoft Excel 2007+ (OpenXML) | .xlsx
application/vnd.openxmlformats-officedocument.wordprocessingml.document | Microsoft Word 2007+ (OpenXML) | .docx
application/x-dosexec | PE32 executable | .exe
application/x-java-applet | Compiled Java class data | .class
application/x-rar | RAR archive data | .rar
application/xml | XML | .xml
application/zip | ZIP archive | .zip
text/html | HyperText Markup Language (HTML) | .htm, .html
text/plain | Text (generally ASCII or ISO 8859-n) | .txt

IANA maintains the list of all official MIME media types.
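As an illustration, a query against scan.mimetype.mime for one or more of these types could be built as follows. This is a sketch only: mime_query is a hypothetical helper, and it assumes that double-quoting the MIME type is enough to keep the "/" from being interpreted as query syntax.

```python
def mime_query(*mime_types: str) -> str:
    """Build a scan.mimetype.mime query matching any of the given MIME types."""
    if not mime_types:
        raise ValueError("at least one MIME type is required")
    # Double quotes keep the "/" in a MIME type from being read as query syntax.
    terms = [f'scan.mimetype.mime:"{m}"' for m in mime_types]
    if len(terms) == 1:
        return terms[0]
    # Group alternatives so the result can be combined safely with AND clauses.
    return "(" + " OR ".join(terms) + ")"


print(mime_query("application/pdf"))
# scan.mimetype.mime:"application/pdf"
print(mime_query("application/zip", "application/x-rar"))
# (scan.mimetype.mime:"application/zip" OR scan.mimetype.mime:"application/x-rar")
```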

Considerations

Keep the following in mind when building a query:

  • Attribute fields are case-sensitive.
  • If a query refers to a field that doesn't exist, Metadata Search ignores that portion of the query. Use _exists_ to check whether a field is available, e.g., _exists_:scan.latest_scan.assertions.ClamAV
  • Always enclose literals in double quotation marks ("), or alternatively, escape all Elasticsearch control characters in your query.
  • RegEx: Queries using ^ (beginning of line) or $ (end of line) are not supported.
  • Wildcards:
    • Escape * only in attribute names, not in values.
    • We recommend using no more than one wildcard in a metadata query.
  • Ranges:
    • Square brackets ([ and ]) include the range boundaries.
    • Curly brackets ({ and }) exclude the range boundaries.
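Several of the considerations above (quoting, escaping, _exists_ checks, and range boundaries) can be captured in small helpers. The sketch below uses hypothetical function names and escapes the reserved characters of Elasticsearch query-string syntax; it rejects < and > because Elasticsearch documents that those two characters cannot be escaped.

```python
import re

# Characters that Elasticsearch query-string syntax treats as operators.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_value(value: str) -> str:
    """Backslash-escape reserved characters in an unquoted literal."""
    if "<" in value or ">" in value:
        # < and > cannot be escaped in query-string syntax.
        raise ValueError("values containing < or > cannot be escaped")
    return _RESERVED.sub(r"\\\1", value)


def quote_literal(value: str) -> str:
    """Wrap a literal in double quotes, escaping embedded quotes and backslashes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def exists_clause(field: str) -> str:
    """Build a clause that matches only documents containing the field."""
    return f"_exists_:{field}"


def range_clause(field, low, high, include_low=True, include_high=True) -> str:
    """Build a range clause; square brackets include a bound, curly brackets exclude it."""
    left = "[" if include_low else "{"
    right = "]" if include_high else "}"
    return f"{field}:{left}{low} TO {high}{right}"


print(escape_value("application/pdf"))  # application\/pdf
print(quote_literal("application/pdf"))  # "application/pdf"
print(exists_clause("scan.latest_scan.assertions.ClamAV"))
print(range_clause("scan.latest_scan.detections.benign", 0, 10))
# scan.latest_scan.detections.benign:[0 TO 10]
```

Quoting is usually the simpler option for whole literals; escaping is mainly useful when a value has to stay unquoted, for example when it is combined with a wildcard.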

Examples

Below are real-world metadata query examples that can be used in the UI, CLI, and API.

Metadata query | Description
scan.latest_scan.assertions.\*.metadata.malware_family:*Trojan* | Return all artifacts identified as belonging to a malware family whose name contains "Trojan" (wildcard search).
exiftool.createdate:[2019-01-01 TO 2019-12-31] | Return all artifacts with an exiftool.createdate in the year 2019 (range search).
scan.latest_scan.detections.benign:[0 TO 10] | Return artifacts with between 0 and 10 benign detections, inclusive (range search).
(scan.latest_scan.detections.malicious:>0 AND scan.latest_scan.detections.malicious:<=3) OR scan.latest_scan.assertions.ClamAV.assertion:malicious | Return artifacts detected as malicious by 1, 2, or 3 engines, or detected as malicious by ClamAV (grouping).
scan.latest_scan.detections.benign:>0 | Return artifacts with at least one benign assertion (comparison operators).
pefile.compile_date:[2022-11-10 TO 2022-11-20] AND scan.detections.malicious:>1 AND polyunite.malware_family:Emotet | Return artifacts that have a compile date between the two given dates, have more than one malicious assertion, and are related to Emotet.
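Date ranges such as the one in the last example are often computed rather than typed by hand. The following sketch rebuilds that query from a date window; compile_date_window is a hypothetical helper, and the field names are taken from the example above.

```python
from datetime import date, timedelta


def compile_date_window(end: date, days: int) -> str:
    """Build an inclusive pefile.compile_date range covering the `days` before `end`."""
    start = end - timedelta(days=days)
    return f"pefile.compile_date:[{start.isoformat()} TO {end.isoformat()}]"


# Combine the date window with the other clauses from the example.
query = " AND ".join([
    compile_date_window(date(2022, 11, 20), 10),
    "scan.detections.malicious:>1",
    "polyunite.malware_family:Emotet",
])
print(query)
# pefile.compile_date:[2022-11-10 TO 2022-11-20] AND scan.detections.malicious:>1 AND polyunite.malware_family:Emotet
```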

2025 © PolySwarm Pte. Ltd.