How to build a Metadata query
PolySwarm's Metadata Search lets you search PolySwarm's dataset for samples that match the information you are interested in.
When PolySwarm scans or sandboxes a sample, it produces a large amount of metadata. This metadata is mapped into attribute fields that Metadata Search can query, which makes it easy to find the samples you are interested in.
The general structure of a metadata query is:
field:matched_value [logic field:matched_value]
For example:
scan.detections.malicious:>1 AND artifact.type:exe
PolySwarm's Metadata Search is backed by Elasticsearch and supports the full range of Elasticsearch search criteria to deliver flexible results quickly. This includes Boolean logic, grouping, ranges, wildcards, and regex.
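
As a minimal sketch of the structure above, a query can be assembled programmatically from field and value pairs. The helper names `clause` and `combine` are hypothetical and are not part of any PolySwarm library; they only produce the query string.

```python
def clause(field, value):
    """Render one `field:matched_value` pair."""
    return f"{field}:{value}"


def combine(operator, *parts):
    """Join clauses with a Boolean operator and group them in parentheses."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return "(" + f" {operator} ".join(parts) + ")"


# Rebuild the example query shown above (hypothetical helpers, same fields).
query = combine(
    "AND",
    clause("scan.detections.malicious", ">1"),
    clause("artifact.type", "exe"),
)
print(query)
```

Because `combine` wraps its result in parentheses, its output can be passed as a part to another `combine` call to nest groups.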
How to find searchable fields
PolySwarm contains hundreds of searchable fields. You can find a full list in our Searchable Fields Reference.
Alternatively, you can use the polyswarm search mapping command to quickly list available fields. See the CLI documentation for more details.
Once a metadata query has been built, you can use the same query in the Metadata Search UI, API Endpoint or Command Line Queries.
MIME File Types
The following common MIME types are useful when querying scan.mimetype.mime.
MIME Types | Kind of document | Extension |
---|---|---|
application/gzip | GZip Compressed Archive | .gz |
application/octet-stream | Any kind of binary data | .bin |
application/pdf | Adobe Portable Document Format | .pdf |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Microsoft Excel 2007+ (OpenXML) | .xlsx |
application/vnd.openxmlformats-officedocument.wordprocessingml.document | Microsoft Word 2007+ (OpenXML) | .docx |
application/x-dosexec | PE32 executable | .exe |
application/x-java-applet | Compiled Java class data | .class |
application/x-rar | RAR archive data | .rar |
application/xml | XML | .xml |
application/zip | ZIP archive | .zip |
text/html | HyperText Markup Language (HTML) | .htm .html |
text/plain | Text, (generally ASCII or ISO 8859-n) | .txt |
A list of all official MIME media types provided by IANA can be found here.
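
One way to use the table is to look up the MIME type for a file extension and turn it into a scan.mimetype.mime query. The sketch below is illustrative only: the `mime_query` helper and the subset of extensions it knows are assumptions made for this example, and the value is wrapped in double quotes because it contains a `/` character.

```python
import os

# Extension -> MIME type, taken from a subset of the table above.
MIME_BY_EXTENSION = {
    ".gz": "application/gzip",
    ".pdf": "application/pdf",
    ".exe": "application/x-dosexec",
    ".rar": "application/x-rar",
    ".zip": "application/zip",
    ".html": "text/html",
    ".txt": "text/plain",
}


def mime_query(filename):
    """Build a scan.mimetype.mime query for a file name (hypothetical helper)."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in MIME_BY_EXTENSION:
        raise KeyError(f"no MIME type listed for {extension!r}")
    return f'scan.mimetype.mime:"{MIME_BY_EXTENSION[extension]}"'


print(mime_query("invoice.PDF"))
```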
Considerations
There are some items to consider when building out the query:
- Attribute fields are case-sensitive.
- If a query refers to a field that doesn't exist, Metadata Search ignores that portion of the query. Use _exists_ to check whether a field is available, e.g., _exists_:scan.latest_scan.assertions.ClamAV
- Always enclose literals in double quotation marks ("), or alternatively, escape all Elasticsearch control characters in your query.
- RegEx: Queries using ^ (beginning of line) or $ (end of line) are not supported.
- Wildcards:
  - Do not escape * in values, only in attribute names.
  - We recommend using no more than one wildcard in a metadata query.
- Ranges:
  - Square brackets ([ and ]) include range boundaries.
  - Curly brackets ({ and }) exclude range boundaries.
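
Several of these considerations can be captured in small string helpers. The following is a rough sketch under the assumption that queries are built as plain strings; the function names are hypothetical and nothing here calls PolySwarm itself.

```python
def quote_literal(value):
    """Wrap a literal in double quotes, escaping backslashes and quotes inside it."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def exists_clause(field):
    """Check that a field is present before relying on it in a query."""
    return f"_exists_:{field}"


def range_clause(field, low, high, include_low=True, include_high=True):
    """Square brackets include a boundary; curly brackets exclude it."""
    left = "[" if include_low else "{"
    right = "]" if include_high else "}"
    return f"{field}:{left}{low} TO {high}{right}"


print(exists_clause("scan.latest_scan.assertions.ClamAV"))
print(range_clause("scan.latest_scan.detections.benign", 0, 10))
print(range_clause("scan.latest_scan.detections.benign", 0, 10, include_high=False))
print(quote_literal("application/x-dosexec"))
```

The third call mixes a square and a curly bracket, which Elasticsearch query strings allow when only one boundary should be excluded.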
Examples
Below are real-world metadata query examples that can be used in the UI, CLI, and API.
Metadata Query | Description |
---|---|
scan.latest_scan.assertions.\*.metadata.malware_family:*Trojan* | Return all artifacts identified as belonging to a malware family whose name contains "Trojan" (Wildcard search). |
exiftool.createdate:[2019-01-01 TO 2019-12-31] | Return all artifacts with an exiftool.createdate in the year 2019 (Range search). |
scan.latest_scan.detections.benign:[0 TO 10] | Return artifacts with between 0 and 10 benign detections, including both 0 and 10 (Range search). |
(scan.latest_scan.detections.malicious:>0 AND scan.latest_scan.detections.malicious:<=3) OR scan.latest_scan.assertions.ClamAV.assertion:malicious | Return artifacts detected as malicious by 1, 2, or 3 engines, OR those detected as malicious by ClamAV (Grouping). |
scan.latest_scan.detections.benign:>0 | Return artifacts with at least one benign assertion (Comparison Operators). |
pefile.compile_date:[2022-11-10 TO 2022-11-20] AND scan.detections.malicious:>1 AND polyunite.malware_family:Emotet | Return artifacts with a compile date between 2022-11-10 and 2022-11-20 that have more than one malicious detection and are related to Emotet. |
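
When the date window is computed rather than typed by hand, the range portion of a query like the last example can be generated from date objects. This is a sketch only: `date_range` is a hypothetical helper, and the field names are the ones used in the table.

```python
from datetime import date, timedelta


def date_range(field, start, days):
    """Inclusive date range covering `days` days after `start` (hypothetical helper)."""
    end = start + timedelta(days=days)
    return f"{field}:[{start.isoformat()} TO {end.isoformat()}]"


query = " AND ".join([
    date_range("pefile.compile_date", date(2022, 11, 10), 10),
    "scan.detections.malicious:>1",
    "polyunite.malware_family:Emotet",
])
print(query)
```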