
Ambassadors

Target Audience

Ambassadors are gateways to the PolySwarm marketplace. It is the Ambassador's responsibility to translate queries into marketplace actions and to aggregate results on behalf of consumers of PolySwarm threat intelligence (e.g. enterprise customers). If you'd like to act as a conduit to the PolySwarm marketplace on your own behalf or on behalf of third-party consumers, continue reading.

When a consumer uses the PolySwarm web interface or polyswarm-api, they will, by default, use an Ambassador hosted by Swarm Technologies, Inc.

Consumers may choose to speak to other Ambassadors using polyswarm-api's --api-uri argument or POLYSWARM_API_URI environment variable.
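As a rough illustration of how a client-side tool might decide which Ambassador to talk to, the sketch below lets a command-line flag take precedence over the environment variable. This is not polyswarm-api's implementation: the parser, the precedence order and the DEFAULT_API_URI placeholder are assumptions made for this example.

```python
import argparse
import os

# Placeholder only; the real default Ambassador URI is not stated in this document.
DEFAULT_API_URI = "https://ambassador.example.invalid/v1"


def resolve_api_uri(argv, environ=None):
    """Pick the Ambassador URI: --api-uri flag, then POLYSWARM_API_URI, then a default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--api-uri", default=None)
    # Ignore any other arguments the surrounding tool may define.
    args, _unknown = parser.parse_known_args(argv)
    if args.api_uri:
        return args.api_uri
    return environ.get("POLYSWARM_API_URI", DEFAULT_API_URI)
```

Calling resolve_api_uri(["--api-uri", "https://a.example/v1"], {}) returns the URI passed on the command line, regardless of what the environment contains.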


Ambassadors' Role in the Marketplace

Ambassadors act as intermediaries between consumers of PolySwarm threat intelligence and the PolySwarm marketplace. Broadly, Ambassadors broker artifact uploads, hash searches, hunts and other features as they are developed. They create actionable events in the PolySwarm marketplace on behalf of consumers and then deliver the results to those consumers.

Artifact Submission Lifecycle

The artifact submission lifecycle at a conceptual level:

  1. The consumer submits an artifact, and optionally, a PolySwarm community preference to the Ambassador*
  2. The Ambassador hosts the artifact in a manner that is accessible to microengines in the appropriate community/communities.
  3. The Ambassador creates a bounty for the artifact in each community, specifying: initial bounty amount, a signed transaction that escrows the initial bounty amount from the Ambassador's wallet to the community's BountyRegistry contract**, the bounty's assertion duration, the URI of the artifact, and, optionally, additional metadata (e.g. artifact filetype). This is accomplished by speaking to each community's instance of polyswarmd using the polyswarmd API.
  4. Engines in each community have until the assertion window closes to return assertions on the artifact.
  5. Engines submit their assertions (and stake amounts) to the community's polyswarmd.
  6. Each community's polyswarmd instance delivers the aggregated results to the Ambassador.
  7. (Optional) the Ambassador distills the various assertions into digestible intelligence for the consumer.
  8. The Ambassador delivers finished intelligence to the consumer.

*Either the Ambassador or the consumer handles community allocation.

**It is the Ambassador's responsibility to ensure that they have adequate funds relayed to each community for the payment of network fees and to cover the initial bounty amount.
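To make step 3 and the funding footnote more concrete, here is a minimal sketch of the data an Ambassador gathers per bounty and a pre-flight balance check. The class name, field names and the flat per-bounty fee are hypothetical; they do not describe the polyswarmd API or the BountyRegistry contract.

```python
from dataclasses import dataclass, field


@dataclass
class BountyRequest:
    """Values the Ambassador supplies per bounty (names are illustrative)."""
    amount: int                                    # initial bounty amount
    artifact_uri: str                              # where microengines can fetch the artifact
    duration_blocks: int                           # assertion window, in blocks
    metadata: dict = field(default_factory=dict)   # optional, e.g. artifact filetype


def has_adequate_funds(balance, requests, fee_per_bounty):
    """True if the balance covers every initial bounty plus an assumed flat fee each."""
    required = sum(r.amount + fee_per_bounty for r in requests)
    return balance >= required
```

A check like this would run once per community before posting, since funds must be relayed to each community separately.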

Hash Search Lifecycle

Providing hash search capability to consumers requires:

  1. An archive of submitted artifacts and the market's responses to these artifacts.
  2. A strategy for handling hash misses.

The hash search lifecycle at a conceptual level:

  1. The Ambassador maintains an archive of all incoming queries and responses. This archive contains attributes describing the queries, e.g. each submitted artifact's hash & filetype, as well as the responses, e.g. engine to assertion (& NCT stake amount) mapping over time.
  2. The consumer submits a request for information about a particular artifact hash.
  3. The Ambassador consults their archive for information about the hash of the file.
  4. If the Ambassador has data about the hash, it returns that data.

The Ambassador may not have data about a hash because:

  1. The hash corresponds to an artifact that was never submitted to the PolySwarm market.
  2. The hash corresponds to an artifact that was not submitted to the PolySwarm market via this Ambassador. Another Ambassador, e.g. Swarm Technologies' Ambassador, may have seen the artifact.
  3. The Ambassador previously handled the artifact but did not archive for various reasons: error, legal requirements, etc.

If your archive does not have the data needed to respond directly to your consumer's query, you may consider:

  1. Offering to (re)submit the artifact to the marketplace.
  2. Forwarding the consumer's request to Swarm Technologies' Ambassador or a third-party Ambassador that may have a record of the results.
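The hash search lifecycle and its miss handling can be sketched with an in-memory archive. Everything here is illustrative: the HashArchive class, the choice of SHA-256 as the key and the fallback callable (standing in for forwarding to another Ambassador) are assumptions, and a production archive would use durable, scalable storage.

```python
import hashlib


class HashArchive:
    """In-memory stand-in for an Ambassador's archive of artifacts and results."""

    def __init__(self):
        self._records = {}

    def record(self, content, filetype, assertions):
        """Store the market's response, keyed by the artifact's SHA-256 hash."""
        digest = hashlib.sha256(content).hexdigest()
        self._records[digest] = {"filetype": filetype, "assertions": assertions}
        return digest

    def search(self, digest, fallback=None):
        """Return archived data on a hit; on a miss, defer to a fallback if given."""
        hit = self._records.get(digest)
        if hit is not None:
            return {"status": "hit", "data": hit}
        if fallback is not None:
            # e.g. forward the request to another Ambassador
            return {"status": "forwarded", "data": fallback(digest)}
        return {"status": "miss", "data": None}
```

On a plain miss the caller can then offer to (re)submit the artifact to the marketplace.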

Hunt Lifecycle

Providing hunt capabilities to consumers requires:

  1. An archive of submitted artifacts.
  2. A scalable and economically sound infrastructure for evaluating consumer-uploaded rules against your archive.

The hunt lifecycle at a conceptual level:

  1. The consumer submits hunt criteria (e.g. a YARA rule) to the Ambassador.
  2. The Ambassador invokes a search process for the criteria.
  3. The Ambassador returns results to the consumer.

As an Ambassador, it's largely up to you to define how (or whether) you'd like to offer hunting functionality to your consumers. Swarm Technologies' Ambassador will initially support YARA rule scanning. We encourage others to support the same.
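The three hunt steps can be sketched as a scan over an archive. To stay dependency-free, this example swaps YARA for a plain regular expression over raw bytes, so it only approximates what a YARA rule can express. The archive layout and the hunt function are hypothetical.

```python
import re


def hunt(archive, pattern):
    """Return the names of archived artifacts whose bytes match the pattern."""
    compiled = re.compile(pattern)
    return sorted(name for name, content in archive.items() if compiled.search(content))


# Hypothetical archive: artifact name -> raw bytes
ARCHIVE = {
    "a.bin": b"hello EICAR-STANDARD-ANTIVIRUS-TEST-FILE world",
    "b.bin": b"nothing to see here",
}

matches = hunt(ARCHIVE, rb"EICAR-STANDARD")
```

A linear scan like this does not scale to a large archive, which is why the requirements above call for infrastructure that is both scalable and economically sound.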


Developing Your Ambassador

Hallmarks of a Successful Ambassador

As an Ambassador, you'll be your consumer's interface to the PolySwarm marketplace. It's important that you strive to provide a service that is:

  • easy to use
  • scalable
  • low latency
  • high throughput
  • cost effective

Developing an Ambassador

Windows-based Ambassadors are not supported; we strongly recommend developing Ambassadors under Linux.

Prerequisites

Configure your Linux-based development environment.

Building on polyswarm-client

polyswarm-client provides a convenient basis for Ambassador development by abstracting polyswarmd API complexities and providing ready-to-use Ambassador examples. By building on top of polyswarm-client, you won't have to worry about maintaining polyswarmd API compatibility, freeing time to focus on your Ambassador's differentiating features and developing your business logic. This tutorial will build on polyswarm-client and use the examples provided therein.

If you'd like to build an Ambassador from scratch, we encourage you to complete this tutorial first and then consult the polyswarmd API for a description of the interfaces your Ambassador must support. It will be your responsibility to track polyswarmd releases and update your API interface as necessary to ensure uninterrupted service to your customers.

The below sections break down the example Ambassadors available in polyswarm-client master as of commit 3c2e432289276f69be96db4e8eb587a997900af9.

All example Ambassadors assume a 1:1 relationship between Ambassadors and communities. As an Ambassador developer, you may want to interface with multiple communities. This and other real-world concerns are covered in a subsequent section.

Example: EICAR-Submitting Ambassador

EICAR is a test file used by the antivirus industry to test engines' ability to detect malware. The file is not malicious, but many antivirus engines flag it as such.

Here we'll discuss submitting EICAR as a PolySwarm Ambassador. Elsewhere, we discuss building a Microengine that detects EICAR.

polyswarm-client comes with an EICAR-submitting Ambassador (eicar.py) that hosts artifacts on IPFS, a public, distributed file sharing network.

eicar.py begins:

import base64
import logging
import random
import os

from concurrent.futures import CancelledError
from polyswarmclient.abstractambassador import AbstractAmbassador

logger = logging.getLogger(__name__)

EICAR = base64.b64decode(
    b'WDVPIVAlQEFQWzRcUFpYNTQoUF4pN0NDKTd9JEVJQ0FSLVNUQU5EQVJELUFOVElWSVJVUy1URVNULUZJTEUhJEgrSCo=')
NOT_EICAR = 'this is not malicious'
ARTIFACTS = [('eicar', EICAR), ('not_eicar', NOT_EICAR)]

After some imports and logging configuration, the EICAR string and a string that is decidedly not EICAR are hard-coded. These strings are placed in an ARTIFACTS array.

Continuing:

BOUNTY_TEST_DURATION_BLOCKS = int(os.getenv('BOUNTY_TEST_DURATION_BLOCKS', 5))

eicar.py sets the default assertion duration window to 5 blocks. Block duration in wall time is decided by those hosting the community. In Swarm Technologies-hosted communities, 1 block is added to the chain approximately every second, so a 5 block window is approximately 5 seconds. This default can be overridden with an environment variable.
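The relationship between blocks and wall time is simple arithmetic. The helper below is hypothetical; its default of one second per block reflects the approximate figure quoted above for Swarm Technologies-hosted communities, and other communities may differ.

```python
import os

# Same override mechanism as eicar.py: the environment variable wins over the default of 5.
BOUNTY_TEST_DURATION_BLOCKS = int(os.getenv('BOUNTY_TEST_DURATION_BLOCKS', 5))


def window_seconds(duration_blocks, seconds_per_block=1.0):
    """Approximate wall time of an assertion window of the given length."""
    return duration_blocks * seconds_per_block
```

For example, window_seconds(5) gives roughly 5 seconds, while the same 5-block window on a chain that adds a block every 15 seconds would last about 75 seconds.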

class Ambassador(AbstractAmbassador):
    """Ambassador which submits the EICAR test file"""

    def __init__(self, client, testing=0, chains=None, watchdog=0, submission_rate=30):
        """
        Initialize an EICAR-submitting Ambassador
        Args:
            client (`Client`): Client to use
            testing (int): How many test bounties to respond to
            chains (set[str]): Chain(s) to operate on
            watchdog: interval over which a watchdog thread should verify bounty placement on-chain (in number of blocks)
            submission_rate: if nonzero, produce a sleep in the main event loop to prevent the Ambassador from overloading `polyswarmd` during testing
        """
        init_logging([__name__], log_format='json')
        super().__init__(client, testing, chains, watchdog, submission_rate)

eicar.py's Ambassador is built on polyswarm-client's AbstractAmbassador. Among many other things, AbstractAmbassador establishes a connection to a hosted polyswarmd and manages this connection through an instance variable named client.

AbstractAmbassador declares a single method, generate_bounties, as abstract. All subclasses of AbstractAmbassador must define this method.
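In Python, an abstract method is usually enforced with the standard abc module. The sketch below uses a hypothetical stand-in class rather than the real AbstractAmbassador, and it is an assumption that polyswarm-client enforces the requirement the same way.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchAbstractAmbassador(ABC):
    """Hypothetical stand-in for AbstractAmbassador; not the real class."""

    @abstractmethod
    async def generate_bounties(self, chain):
        """Subclasses must implement bounty generation."""


class Incomplete(SketchAbstractAmbassador):
    pass  # does not define generate_bounties, so it cannot be instantiated


class Complete(SketchAbstractAmbassador):
    async def generate_bounties(self, chain):
        return 'would submit bounties on {0}'.format(chain)
```

Instantiating Incomplete raises TypeError, while Complete can be created and its generate_bounties coroutine awaited.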

As you might suspect, eicar.py's implementation of this method is rather straightforward:

    async def generate_bounties(self, chain):
        """Submit either the EICAR test string or a benign sample

        Args:
            chain (str): Chain sample is being requested from
        """
        amount = await self.client.bounties.parameters[chain].get('bounty_amount_minimum')

        while True:
            try:
                filename, content = random.choice(ARTIFACTS)

                logger.info('Submitting %s', filename)
                ipfs_uri = await self.client.post_artifacts([(filename, content)])
                if not ipfs_uri:
                    logger.error('Error uploading artifact to IPFS, continuing')
                    continue

                await self.push_bounty(amount, ipfs_uri, BOUNTY_TEST_DURATION_BLOCKS, chain)
            except CancelledError:
                logger.warning('Cancel requested')
                break
            except Exception:
                logger.exception('Exception in bounty generation task, continuing')
                continue

The method does the following (modulo error checking):

  1. queries polyswarmd for the minimum initial bounty amount
  2. enters an infinite loop
  3. randomly chooses either the eicar or the not_eicar string as the artifact
  4. tells polyswarmd to host the artifact on IPFS
  5. instructs polyswarmd to post the bounty, specifying the initial bounty amount (the minimum allowed), the URI of the artifact, the assertion window duration and the chain*

*chain refers to which blockchain to post the bounty on: the homechain ("home") or the sidechain ("side"). This argument should always be "side"; it will be removed in a future polyswarm-client release.

Notes:

  1. polyswarm-client-derived Ambassadors are multi-threaded by default and handle events asynchronously. This infinite loop is isolated to the thread responsible for posting bounties; the remainder of the Ambassador functions normally.
  2. There is no explicit sleep in the loop. This is intentional; the thread responsible for generate_bounties effectively sleeps each time it calls self.client.post_artifacts (blocking on the IPFS host) and self.push_bounty (blocking on polyswarmd's announcement of the bounty in the marketplace).
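The effect described in these notes can be reproduced in isolation. The sketch below is not polyswarm-client code: FakeClient is a hypothetical stand-in whose calls merely await a short delay, and the loop runs as an asyncio task on a single event loop. It shows that a loop with no explicit sleep still lets other work proceed, because every await hands control back.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in; each call awaits briefly to mimic network I/O."""

    def __init__(self):
        self.posted = 0

    async def post_artifacts(self, files):
        await asyncio.sleep(0.001)  # stands in for waiting on the artifact host
        return 'fake-uri'

    async def push_bounty(self, uri):
        await asyncio.sleep(0.001)  # stands in for waiting on bounty placement
        self.posted += 1


async def generate_bounties(client):
    # No explicit sleep: each await yields to the event loop.
    while True:
        uri = await client.post_artifacts([('name', b'content')])
        await client.push_bounty(uri)


async def demo():
    client = FakeClient()
    task = asyncio.ensure_future(generate_bounties(client))
    ticks = 0
    for _ in range(5):  # other work keeps running alongside the loop
        await asyncio.sleep(0.005)
        ticks += 1
    task.cancel()  # request cancellation, as on shutdown
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks, client.posted
```

Running asyncio.run(demo()) completes all five ticks of the side work and posts at least one fake bounty before the loop is cancelled.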

eicar.py is a trivial example that does not account for many real-world Ambassador operating concerns. Next, we'll expand on this example to an Ambassador that submits on-disk artifacts.

Example: "Filesystem" Ambassador

polyswarm-client's filesystem.py Ambassador expands on the eicar.py Ambassador, submitting artifacts from a local filesystem.

It begins in a similar manner:

import logging
import random
import os

from concurrent.futures import CancelledError
from polyswarmclient.abstractambassador import AbstractAmbassador
from polyswarmclient.corpus import DownloadToFileSystemCorpus

logger = logging.getLogger(__name__)

ARTIFACT_DIRECTORY = os.getenv('ARTIFACT_DIRECTORY', 'docker/artifacts')
ARTIFACT_BLACKLIST = os.getenv('ARTIFACT_BLACKLIST', 'truth.db').split(',')
BOUNTY_TEST_DURATION_BLOCKS = int(os.getenv('BOUNTY_TEST_DURATION_BLOCKS', 5))

Again, imports are handled, logging is configured and a default bounty duration is set. filesystem.py makes use of polyswarmclient.corpus, a helper module whose DownloadToFileSystemCorpus class will download, decrypt and extract an artifact collection. Swarm Technologies uses this class internally during continuous integration to ensure that legitimately malicious artifacts are detected as such by microengines.

Continuing:

class Ambassador(AbstractAmbassador):
    """Ambassador which submits artifacts from a directory"""

    def __init__(self, client, testing=0, chains=None, watchdog=0, submission_rate=30):
        """Initialize a filesystem Ambassador
        Args:
            client (`Client`): Client to use
            testing (int): How many test bounties to respond to
            chains (set[str]): Chain(s) to operate on
        """
        init_logging([__name__], log_format='json')
        super().__init__(client, testing, chains, watchdog, submission_rate)

filesystem.py makes use of several more arguments to the AbstractAmbassador class that are useful for testing:

  • testing: when nonzero, this parameter specifies the maximum number of bounties the Ambassador will generate before exiting.
  • watchdog: a block interval. Bounties placed by this Ambassador are checked against each new block to ensure that the bounty has been successfully placed on-chain.
  • submission_rate: if nonzero, this produces a sleep in the main event loop to prevent the Ambassador from overloading polyswarmd during testing.
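How testing and submission_rate might interact with a submission loop can be sketched as follows. This is an assumed reading of the two parameters, not AbstractAmbassador's actual code. In particular, treating submission_rate as a pause in seconds is an assumption, and the watchdog is left out entirely.

```python
import asyncio


async def submit_with_limits(submit, testing=0, submission_rate=0):
    """Call submit() repeatedly, honoring a bounty cap and a pause between calls."""
    submitted = 0
    # testing == 0 means no cap: the loop runs until cancelled.
    while testing == 0 or submitted < testing:
        await submit()
        submitted += 1
        if submission_rate:
            await asyncio.sleep(submission_rate)  # throttle to avoid overloading the backend
    return submitted
```

With testing=3, the coroutine returns after exactly three submissions, which is convenient in automated test runs.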

Moving on:

        self.artifacts = []
        u = os.getenv("MALICIOUS_BOOTSTRAP_URL")
        if u:
            logger.info("Unpacking malware corpus at {0}".format(u))
            d = DownloadToFileSystemCorpus()
            d.download_and_unpack()
            bfl = d.get_benign_file_list()
            mfl = d.get_malicious_file_list()
            logger.info("Unpacking complete, {0} malicious and {1} benign files".format(len(mfl), len(bfl)))
            self.artifacts = bfl + mfl
        else:
            for root, dirs, files in os.walk(ARTIFACT_DIRECTORY):
                for f in files:
                    self.artifacts.append(os.path.join(root, f))

If the environment variable MALICIOUS_BOOTSTRAP_URL is set, the Ambassador downloads artifacts from a testing repository. If it is not set, the ARTIFACT_DIRECTORY directory is walked relative to the Ambassador's current working directory. Either way, files are gathered in preparation for bounty generation.
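The excerpt above defines ARTIFACT_BLACKLIST but does not show where it is applied. One plausible way to honor it during the directory walk is sketched below. The collect_artifacts helper is hypothetical, and matching on bare file names is an assumption.

```python
import os


def collect_artifacts(directory, blacklist=()):
    """Walk a directory tree and return file paths, skipping blacklisted names."""
    artifacts = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name in blacklist:
                continue  # e.g. skip truth.db, which is not an artifact
            artifacts.append(os.path.join(root, name))
    return artifacts
```

A call such as collect_artifacts('docker/artifacts', blacklist=['truth.db']) would then return every file under that directory except those named truth.db.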

filesystem.py overrides the generate_bounties method:

    async def generate_bounties(self, chain):
        """Submit bounty from the filesystem
        Args:
            chain (str): Chain sample is being requested from
        """
        amount = await self.client.bounties.parameters[chain].get('bounty_amount_minimum')

        while True:
            try:
                filename = random.choice(self.artifacts)

                logger.info('Submitting file %s', filename)
                ipfs_uri = await self.client.post_artifacts([(filename, None)])
                if not ipfs_uri:
                    logger.error('Error uploading artifact to IPFS, continuing')
                    continue

                await self.push_bounty(amount, ipfs_uri, BOUNTY_TEST_DURATION_BLOCKS, chain)
            except CancelledError:
                logger.warning('Cancel requested')
                break
            except Exception:
                logger.exception('Exception in bounty generation task, continuing')
                continue

This is identical to the logic contained within eicar.py, except that artifacts are chosen from self.artifacts and passed by filename; refer to the previous section for a breakdown.

filesystem.py builds on eicar.py by sourcing artifacts from disk and, optionally, from a remote URL. In a real-world Ambassador, these artifacts would come from consumers' submissions to the Ambassador.

Building an Ambassador Using participant-template

The easiest way to get started is to build an Ambassador using participant-template. By using the template, your Ambassador will be based on polyswarm-client, allowing you to focus on business logic.

We're going to cut our Ambassador from participant-template. To do this, we'll need cookiecutter:

pip install cookiecutter

With cookiecutter installed, jump-starting your Ambassador from our participant-template is as easy as:

cookiecutter https://github.com/polyswarm/participant-template

And answering some prompts. Read about these prompts here.

Choose:

  • participant_type: Ambassador
  • platform: linux (Windows Ambassadors are not supported)
  • participant_name: helloworld
  • accept the remaining defaults

You'll be left with an Ambassador-helloworld directory. Change directory (cd) into Ambassador-helloworld:

$ cd Ambassador-helloworld

Customize Your Ambassador

Here we'll implement a simple, minimum viable Ambassador, re-creating the EICAR Ambassador described above.

Implementing a minimum viable Ambassador is as simple as implementing your Ambassador's generate_bounties method. This method is found in Ambassador_<participant_name_slug>/src/<author_org_slug>_<participant_name_slug>/__init__.py (Ambassador_helloworld/src/polyswarm_helloworld/__init__.py if you followed the cookiecutter prompts as described above). Production Ambassadors will, of course, need to do far more than this.

Open Ambassador_helloworld/src/polyswarm_helloworld/__init__.py.

Customize the file to include the EICAR and not-EICAR definitions we saw in the EICAR Ambassador:

...
logger = logging.getLogger(__name__)

EICAR = base64.b64decode(
    b'WDVPIVAlQEFQWzRcUFpYNTQoUF4pN0NDKTd9JEVJQ0FSLVNUQU5EQVJELUFOVElWSVJVUy1URVNULUZJTEUhJEgrSCo=')
NOT_EICAR = 'this is not malicious'
ARTIFACTS = [('eicar', EICAR), ('not_eicar', NOT_EICAR)]

BOUNTY_TEST_DURATION_BLOCKS = int(os.getenv('BOUNTY_TEST_DURATION_BLOCKS', 5))
...

Then, customize the generate_bounties method to submit EICAR and not-EICAR:

    async def generate_bounties(self, chain):
        """Submit either the EICAR test string or a benign sample

        Args:
            chain (str): Chain sample is being requested from
        """
        amount = await self.client.bounties.parameters[chain].get('bounty_amount_minimum')

        while True:
            try:
                filename, content = random.choice(ARTIFACTS)

                logger.info('Submitting %s', filename)
                ipfs_uri = await self.client.post_artifacts([(filename, content)])
                if not ipfs_uri:
                    logger.error('Error uploading artifact to IPFS, continuing')
                    continue

                await self.push_bounty(amount, ipfs_uri, BOUNTY_TEST_DURATION_BLOCKS, chain)
            except CancelledError:
                logger.warning('Cancel requested')
                break
            except Exception:
                logger.exception('Exception in bounty generation task, continuing')
                continue

Once these changes are made, you now have an EICAR Ambassador built on participant-template!

Next, we'll consider some real-world concerns that go beyond the current scope of this document and then test our EICAR-submitting Ambassador.


Production Ambassador Considerations

The eicar.py and filesystem.py Ambassadors are proofs of concept. They do not address many of the requirements of a production Ambassador, including, but not limited to:

  1. a consumer-facing API
  2. the ability to speak to multiple communities
  3. a means to track consumer requests for e.g. rate limiting and billing
  4. a scalable infrastructure that adjusts based on demand, ensuring low latency and high throughput
  5. a scalable archive of past queries and results upon which hash search, hunts and other functionality may be built
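As one small example of item 3, request tracking for rate limiting and billing could start from a per-consumer sliding window. The UsageTracker class below is purely illustrative; a production Ambassador would need persistent, shared state rather than in-process dictionaries.

```python
from collections import defaultdict, deque


class UsageTracker:
    """Sliding-window request counter per consumer (illustrative only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # consumer -> timestamps inside the window
        self.billable = defaultdict(int)   # consumer -> total accepted requests

    def allow(self, consumer, now):
        """Accept and record a request unless the consumer is over the window limit."""
        recent = self._recent[consumer]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        self.billable[consumer] += 1
        return True
```

Passing the timestamp in explicitly (rather than reading the clock inside allow) keeps the class easy to test; a caller would typically supply time.monotonic().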
