
Developing a Production Microengine

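As a microengine developer, it's important to monitor your microengine's performance and hone your NCT staking strategy so that it reflects the confidence you have in the artifacts you see in the PolySwarm marketplace. Filtering the artifacts your microengine asserts on, and maintaining a strong correlation between your stake amounts and your confidence in your microengine's assertions, are critical to keeping your microengine effective (and profitable).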

All microengine developers should routinely:

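  1. Monitor their microengine's marketplace performance.
  2. Design a first-pass "triage" filter that quickly identifies artifacts of interest.
  3. Maintain a tight correlation between their microengine's assertion confidence and the amount of NCT staked.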
  4. Ensure their microengine is capable of scaling in response to demand.

Monitoring Your Microengine's Marketplace Performance

Any PolySwarm user can track any microengine's profit / loss from PolySwarm Web's Microengines page. As a microengine developer, you should:

  1. Create an account on PolySwarm Web
  2. Track your microengines' performance in comparison to other microengines' performance and
  3. Claim ownership in your microengine(s) so you can more easily track your microengines' performance

Claiming Ownership of Your Microengines

When you claim (and prove) ownership of your Microengines, you're able to:

  1. (Optionally) name your microengine
  2. (Coming Soon) assign ownership to a Team
  3. View your microengines' performance in a single view without searching for each microengine in the unfiltered Microengines listing

We encourage every microengine developer to claim all of their Microengines.

We're continually rolling out new features that extend the management capabilities of owned Microengines.

Taking ownership of your microengine is a necessary first step for admission to PolySwarm's various Private Communities, unlocking private and often higher-value bounties.

Tracking Performance

With your Microengines claimed, quickly view profit / loss information across your microengine footprint using My Microengines. When deploying new detections to your microengines, use My Microengines to get a sense of their economic impact. Check these charts often so you can quickly spot problems with your microengines (e.g. a sharp drop in profitability).

Closing the Loop

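Being able to spot underperforming microengines lets microengine developers take them offline and stop the losses. Without more context, however, it can be difficult to determine how to improve a microengine before bringing it back online. We therefore strongly recommend that all microengines archive the artifacts they evaluate, along with their assertion and NCT stake for each artifact.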

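We're designing bookkeeping features that will turn this into a proactive process, helping microengine developers close the improvement loop on their microengines faster. As they become available, these features will be accessible on the My Microengines page.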
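Until those features arrive, the archive recommended above can be as simple as appending one record per assertion to a local log. The sketch below is hypothetical: the archive_assertion and load_archive helpers, the JSON Lines layout and the field names are illustrative choices, not part of polyswarm-client. It stores a SHA-256 digest of each artifact rather than the bytes themselves; keep the raw artifacts elsewhere if you want to re-scan them later.

```python
import hashlib
import json
import time
from pathlib import Path


def archive_assertion(log_path, guid, content, verdict, confidence, stake_nct):
    """Append one assertion record to a JSON Lines archive (hypothetical helper).

    log_path   -- file to append to; one JSON object per line
    guid       -- bounty GUID, used to correlate artifacts in the same bounty
    content    -- raw artifact bytes (only the SHA-256 digest is stored)
    verdict    -- True for malicious, False for benign
    confidence -- confidence score in [0.0, 1.0]
    stake_nct  -- NCT staked on the bounty
    """
    record = {
        "timestamp": time.time(),
        "guid": guid,
        "sha256": hashlib.sha256(content).hexdigest(),
        "verdict": "malicious" if verdict else "benign",
        "confidence": confidence,
        "stake_nct": stake_nct,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_archive(log_path):
    """Read every archived record back, e.g. to compare against arbiter ground truth."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Once arbiters settle a bounty, joining these records on guid shows exactly which assertions, confidences and stakes lost NCT, which is the context you need before bringing a microengine back online.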


Triaging Artifacts

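We recommend that microengine developers build their microengines in two stages:

  1. A very fast, lightweight triage pass that decides whether an artifact is worth a full investigation
  2. A full examination of the artifact that determines whether it is malicious and responds to the bounty within the assertion window

By implementing a triage pass, microengine developers can save time and money, reduce execution load, and quickly ignore artifacts that don't interest them. Based on our conversations with microengine providers, a popular triage strategy is to filter artifacts by file type.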

Below is a simple example of a triage pass in a Microengine's scan() function:

import magic
...
class Scanner(AbstractScanner):

  ...

  async def scan(self, guid, artifact_type, content, chain):

    # Reject artifacts that aren't files
    if artifact_type != ArtifactType.FILE:
      return ScanResult()

    # Identify the file type once; libmagic returns a description such as "ELF 64-bit LSB executable, ..." or "PE32 executable ..."
    file_type = magic.from_buffer(content)

    # Reject files that libmagic does not identify as an ELF or PE by returning an empty ScanResult object
    if not (file_type.startswith("ELF") or file_type.startswith("PE")):
      return ScanResult()
...
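If you'd rather not call libmagic in the triage hot path, a similar ELF / PE filter can be approximated by checking header bytes directly. This is a swapped-in technique, not what the example above does: ELF files begin with the bytes 0x7F 'E' 'L' 'F', and PE files begin with an "MZ" DOS header whose 4-byte little-endian field at offset 0x3C points to the "PE\0\0" signature. The helper names (is_elf, is_pe, passes_triage) are hypothetical.

```python
import struct

ELF_MAGIC = b"\x7fELF"
MZ_MAGIC = b"MZ"
PE_SIGNATURE = b"PE\x00\x00"
E_LFANEW_OFFSET = 0x3C  # DOS header field holding the offset of the PE header


def is_elf(content: bytes) -> bool:
    """ELF files begin with 0x7F 'E' 'L' 'F'."""
    return content[:4] == ELF_MAGIC


def is_pe(content: bytes) -> bool:
    """PE files begin with 'MZ'; the e_lfanew field points at the 'PE\\0\\0' signature."""
    if content[:2] != MZ_MAGIC or len(content) < E_LFANEW_OFFSET + 4:
        return False
    (pe_offset,) = struct.unpack_from("<I", content, E_LFANEW_OFFSET)
    # Slicing past the end yields b"", so a bogus offset simply fails the comparison
    return content[pe_offset:pe_offset + 4] == PE_SIGNATURE


def passes_triage(content: bytes) -> bool:
    """Hypothetical triage predicate: keep only ELF and PE artifacts."""
    return is_elf(content) or is_pe(content)
```

Checking the PE signature rather than just "MZ" avoids passing plain DOS stubs and random files that happen to start with those two bytes; it does not validate the rest of the header.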

Developing an Effective Staking Strategy

Example Lifecycle

Let's run through a simplified example of a bounty lifecycle, noting the impact of staking strategy design.

Bounty Creation

Suppose the fictitious ACME Enterprises discovers something suspicious on their network and wants to enrich their telemetry with intelligence produced by the PolySwarm marketplace. ACME submits an artifact to the PolySwarm marketplace via PolySwarm Web, PolySwarm API or a third party Ambassador.

An Ambassador creates a bounty for ACME's submission. This bounty contains: (1) the artifact, and (2) a configurable amount of NCT placed in the initial reward bucket for the bounty. For illustration, let's assume 5 NCT is placed in the reward bucket.

Active PolySwarm microengines are notified of this new bounty and of the amount of NCT placed on the artifact.

Let's assume 8 microengines* find the initial NCT reward placed by the Ambassador sufficient to justify triaging the artifact, i.e. determining whether the artifact falls within their area of expertise.

*The number of active PolySwarm microengines is far beyond 8 and is growing by the day, but we'll keep this example simple for illustrative purposes.

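Microengines Perform a Triage Pass

The microengines perform a first-pass triage and determine:

  • Microengines A, B, C, D, E: the artifact is within their area of expertise
  • Microengines F, G, H: the artifact is outside their area of expertise

Microengines F, G and H ignore the bounty and choose not to respond, while microengines A-E take a closer look.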

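Microengines Perform a Full Analysis

During analysis, each microengine identifies key features (high-confidence indicators) and/or general patterns (low-confidence indicators) that help it reach a conclusion. The microengines express their confidence through the amount of NCT they stake, arriving at the following conclusions:

  • Microengine A: 1 NCT / malicious
  • Microengine B: 1 NCT / benign
  • Microengine C: 2 NCT / malicious
  • Microengine D: 1 NCT / malicious
  • Microengine E: 2 NCT / benign

Roughly speaking, microengines C and E are twice as confident in their assertions as the microengines that agree with them.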

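These assertions and their NCT stakes are sent to the Ambassador immediately after the assertion window closes. The Ambassador analyzes these assertions and may optionally combine them into a single, comprehensive piece of intelligence to deliver to ACME.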

Total NCT is Computed by PolySwarm's BountyRegistry Contract

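The initial reward plus the stakes are held in escrow by PolySwarm's BountyRegistry contract. All funds are summed:

  • Initial reward: 5 NCT +
  • Microengine A: 1 NCT +
  • Microengine B: 1 NCT +
  • Microengine C: 2 NCT +
  • Microengine D: 1 NCT +
  • Microengine E: 2 NCT =
  • Total reward: 12 NCT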

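Arbiters Determine Ground Truth

Later, arbiters weigh in on the ground truth of the artifact. The arbiters determine that the artifact is, in fact, malicious.

The bounty is then opened so that microengines that asserted correctly can claim their rewards. Each winning microengine's payout is proportional to its stake:

  • Microengine A: 3 NCT (2 NCT profit)
  • Microengine B: 0 NCT (incorrect assertion)
  • Microengine C: 6 NCT (4 NCT profit)
  • Microengine D: 3 NCT (2 NCT profit)
  • Microengine E: 0 NCT (incorrect assertion)

Microengine C is the biggest winner. C staked twice as much as A and D, and so received twice their share of the reward.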
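The split above is simple arithmetic: the pot is the initial reward plus every stake, and each correct microengine receives pot * (its stake / total winning stake). Here the winning stake is 1 + 2 + 1 = 4 NCT, so A and D each receive 12 * 1/4 = 3 NCT and C receives 12 * 2/4 = 6 NCT. The settle_bounty helper below is a hypothetical sketch of this simplified example only; it ignores marketplace fees and is not the BountyRegistry contract's actual logic.

```python
from fractions import Fraction


def settle_bounty(initial_reward, assertions, ground_truth_malicious):
    """Split the pot among correct assertions in proportion to their stakes.

    initial_reward         -- NCT placed in the reward bucket by the Ambassador
    assertions             -- {microengine: (stake_nct, asserted_malicious)}
    ground_truth_malicious -- the arbiters' verdict
    Returns {microengine: (payout, profit)}; fees are ignored, as in the example.
    """
    pot = initial_reward + sum(stake for stake, _ in assertions.values())
    winning_stake = sum(stake for stake, verdict in assertions.values()
                        if verdict == ground_truth_malicious)
    results = {}
    for engine, (stake, verdict) in assertions.items():
        if verdict == ground_truth_malicious and winning_stake > 0:
            # Exact rational arithmetic avoids float rounding in the split
            payout = Fraction(pot) * stake / winning_stake
        else:
            payout = Fraction(0)  # incorrect assertion: the stake is lost
        results[engine] = (payout, payout - stake)
    return results


example = settle_bounty(
    5,
    {"A": (1, True), "B": (1, False), "C": (2, True), "D": (1, True), "E": (2, False)},
    ground_truth_malicious=True,
)
```

Running it on the example reproduces the list above: A and D receive 3 NCT each, C receives 6 NCT, and B and E lose their 1 and 2 NCT stakes.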

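This example is a simplified version of how the PolySwarm marketplace actually works.

In the real marketplace, many more microengines will respond, stakes need not be whole numbers, and the marketplace assesses fees to compensate arbiters.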

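Maintaining a Tight Correlation Between Confidence and NCT Stake

Everyone benefits when microengines express their confidence in an assertion through the size of their NCT stake. On one side of the marketplace, a microengine's NCT stake gives Ambassadors an additional dimension of signal about an artifact's maliciousness. On the other side, microengines that scale their NCT stakes with their confidence stand to increase their profits.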
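To make that "additional dimension of signal" concrete: in the example lifecycle, a plain vote count gives 3 malicious vs. 2 benign, while weighting each verdict by its stake gives 4 NCT vs. 3 NCT. How an Ambassador actually combines assertions is up to the Ambassador; the stake_weighted_verdict helper below is one hypothetical approach, not a PolySwarm API.

```python
def stake_weighted_verdict(assertions):
    """Combine assertions into a stake-weighted malicious score (hypothetical).

    assertions -- iterable of (stake_nct, asserted_malicious) pairs
    Returns the fraction of total stake backing a malicious verdict, in [0.0, 1.0].
    """
    assertions = list(assertions)
    total = sum(stake for stake, _ in assertions)
    if total == 0:
        return 0.5  # no stake means no signal; treat as undecided
    malicious = sum(stake for stake, is_malicious in assertions if is_malicious)
    return malicious / total


# Assertions from microengines A-E in the example lifecycle
score = stake_weighted_verdict([(1, True), (1, False), (2, True), (1, True), (2, False)])
```

Here score is 4/7 (about 0.57) rather than the 3/5 an unweighted vote would give: E's confident benign stake pulls the result toward benign more than B's does, which is exactly the information a flat stake would discard.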

The best microengines will have a good sense of their confidence in each scan and will report a confidence score between 0.0 and 1.0 when returning scan results via the ScanResult object. AbstractMicroengine's bid() method uses these confidences according to a BidStrategy.

polyswarm-client provides a default bid strategy via the class BidStrategyBase. Variants of this default strategy (with different weights applied) can be found in polyswarm-client.

You may use the default bid strategy, some variant thereof, or develop a fully custom bid strategy by subclassing BidStrategyBase. participant-template will produce a BidStrategy class for Microengine participants. Refer to the comments surrounding this subclass for more information.

During testing, it may be convenient to quickly swap bid strategies. You can choose which strategy to use when you launch your microengine by providing the --bid-strategy command line option or setting the BID_STRATEGY variable in your environment.

Let's take a look at the default bid strategy in BidStrategyBase's bid() method:

async def bid(self, guid, mask, verdicts, confidences, metadatas, chain):
  """Override this to implement custom bid calculation logic
  Args:
      guid (str): GUID of the bounty under analysis, use to correlate with artifacts in the same bounty
      mask (list[bool]): mask from scanning the bounty files (True where an assertion is made)
      verdicts (list[bool]): scan verdicts from scanning the bounty files
      confidences (list[float]): Measure of confidence of verdict per artifact ranging from 0.0 to 1.0
      metadatas (list[str]): metadata blurbs from scanning the bounty files
      chain (str): Chain we are operating on
  Returns:
      int: Amount of NCT to bid in base NCT units (10 ^ -18)
  """
  min_allowed_bid = await self.client.bounties.parameters[chain].get('assertion_bid_minimum')
  min_bid = max(self.min_bid, min_allowed_bid)
  max_bid = max(self.max_bid, min_allowed_bid)

  asserted_confidences = [c for b, c in zip(mask, confidences) if b]
  avg_confidence = sum(asserted_confidences) / len(asserted_confidences)
  bid = int(min_bid + ((max_bid - min_bid) * avg_confidence))

  # Clamp bid between min_bid and max_bid
  return max(min_bid, min(bid, max_bid))

Currently, only one NCT stake may be placed per bounty. This presents a problem for multi-artifact bounties: how can a single NCT stake accurately convey potentially different confidence levels across multiple artifacts?

We're working to remove this limitation. Future releases will support N stakes for N artifacts in a single bounty.

Currently, staking strategies take an average confidence over all artifacts in a given bounty to arrive at a single NCT stake amount for that bounty.
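To see how averaging blurs per-artifact confidence, the averaging and interpolation from the bid() method above can be pulled out into a plain function and exercised with numbers. average_confidence_bid is a hypothetical standalone helper, not part of polyswarm-client; the fallback to min_bid when nothing is asserted is an assumption (the method above would divide by zero in that case).

```python
def average_confidence_bid(min_bid, max_bid, mask, confidences):
    """Mirror of the averaging-and-interpolation logic in BidStrategyBase.bid().

    min_bid, max_bid -- bid bounds in base NCT units (10 ** -18 NCT)
    mask             -- which artifacts in the bounty we are asserting on
    confidences      -- per-artifact confidence scores in [0.0, 1.0]
    """
    asserted = [c for is_asserted, c in zip(mask, confidences) if is_asserted]
    if not asserted:
        return min_bid  # nothing asserted; fall back to the floor (an assumption)
    avg_confidence = sum(asserted) / len(asserted)
    # Linear interpolation between the bounds, then clamp as the original does
    bid = int(min_bid + (max_bid - min_bid) * avg_confidence)
    return max(min_bid, min(bid, max_bid))
```

With bounds of 1 and 2 NCT, a bounty holding one artifact at confidence 1.0 and another at 0.0 stakes 1.5 NCT, exactly what a single artifact at confidence 0.5 would stake. The stake can no longer tell the Ambassador which artifact the microengine was sure about, which is the limitation that per-artifact stakes are meant to remove.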

Determining Confidence

The specifics of determining confidence don't generalize easily; each microengine will have its own optimal strategy. In general, we've found that microengine developers choose one of the following strategies (in order of increasing efficacy):

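  1. No derivable confidence: the microengine is equally confident in all of its responses. This is the least desirable strategy and usually shows up as a static NCT stake on every artifact. We're working with these microengine developers on Discord to develop better staking strategies, and we'd be happy to help you too!
  2. Confidence based on file type: some microengines use file type information twice, once to exclude artifacts during triage and again to assign weights to the files that pass triage. This is as simple as assigning a static weight to each supported file type, which then influences the confidence score passed back from the Scanner class. This kind of strategy is better, but still not ideal.
  3. Confidence based on specific indicators: this should be the goal of every well-performing microengine. Several microengines in the PolySwarm marketplace already do this, for example by dissecting Microsoft Word documents and identifying known-bad auto-executing macro scripts. This kind of artifact interrogation is ideal: it provides a very high-confidence signal, enabling these microengine developers to build the best staking strategies.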

Below is an example of file-type-based confidence (strategy 2) in a Microengine's scan() function:

import magic
...
class Scanner(AbstractScanner):

  ...

  async def scan(self, guid, artifact_type, content, chain):

    confidence_delta = 0.0

    # Increase confidence score for ELF and PE files
    file_type = magic.from_buffer(content)
    if file_type.startswith("ELF") or file_type.startswith("PE"):
      confidence_delta += 0.2

    ...

    # Conduct the full analysis, arriving at a base confidence score
    # (do_analysis() is a placeholder for your own analysis)
    confidence_base = do_analysis()

    ...

    # Take file-based confidence into account, clamping the result to the 0.0 - 1.0 range
    confidence = min(1.0, confidence_base + confidence_delta)
    return ScanResult(bit=True, verdict=True, confidence=confidence)
...
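Strategy 2 generalizes naturally to a lookup table with one static weight per supported file type. In the sketch below, FILE_TYPE_WEIGHTS, its weights, and the file_type_delta and combine_confidence helpers are all hypothetical; the keys are prefixes of libmagic-style descriptions, and the weights are placeholders you would tune against your own archived results.

```python
# Hypothetical weights keyed by the start of a libmagic description
FILE_TYPE_WEIGHTS = {
    "PE32": 0.2,                     # also matches "PE32+" (64-bit PE)
    "ELF": 0.2,
    "Composite Document File": 0.1,  # legacy OLE2 Office documents
}


def file_type_delta(file_description: str) -> float:
    """Return the static confidence boost for a file description (first matching prefix wins)."""
    for prefix, weight in FILE_TYPE_WEIGHTS.items():
        if file_description.startswith(prefix):
            return weight
    return 0.0  # unknown types get no boost


def combine_confidence(base: float, delta: float) -> float:
    """Add the file-type delta to the analysis confidence, clamped to [0.0, 1.0]."""
    return max(0.0, min(1.0, base + delta))
```

Keeping the weights in one table makes them easy to adjust as your archive shows which file types your analysis is actually reliable on.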

Operating at Scale

As more enterprises rely on PolySwarm for scanning suspect artifacts, microengines need to scale in order to meet demand.

Microengines built using participant-template use a producer / consumer model* for horizontal scaling:

  1. 1 frontend (producer): responsible for communicating with the PolySwarm marketplace: ingesting bounties, triaging artifacts, producing pub/sub scan events for backends, implementing a staking strategy and posting assertions. The frontend translates marketplace bounties into events on a pub/sub queue for backends to consume and distills responses from backends into marketplace actions.
  2. N backends (consumers): the actual scanners that process artifacts and produce assertions (malicious / benign) coupled with confidence ratings. These backends are tasked by the frontend.

Microengines created with participant-template prior to June 18th 2019 will need to be upgraded to the producer / consumer model.

The producer / consumer model makes it simple to scale your microengine horizontally. As demand increases, launch additional consumer instances; as demand decreases, it's safe to retire some of them. In other words, your microengine's resource footprint should scale elastically with demand.
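The shape of this model can be sketched in a single process, with an asyncio.Queue standing in for the pub/sub queue. This is an illustrative assumption, not how participant-template wires things up: there the frontend and backends run as separate instances connected by a real pub/sub system. ScanJob, Verdict, consumer, producer and the scan callable are hypothetical names. Note that the consumers only return verdicts and confidences; staking and anything wallet-related would stay in the producer.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ScanJob:
    guid: str
    content: bytes


@dataclass
class Verdict:
    guid: str
    malicious: bool
    confidence: float


async def consumer(jobs: asyncio.Queue, results: asyncio.Queue, scan) -> None:
    """Backend: pull jobs, scan them, publish verdicts. Holds no wallet or staking logic."""
    while True:
        job = await jobs.get()
        try:
            malicious, confidence = scan(job.content)
            await results.put(Verdict(job.guid, malicious, confidence))
        except Exception:
            pass  # a failed scan yields no verdict, but the consumer keeps running
        finally:
            jobs.task_done()


async def producer(bounties, scan, n_consumers: int = 4) -> list:
    """Frontend: publish scan jobs, then collect verdicts for staking and assertion."""
    jobs: asyncio.Queue = asyncio.Queue()
    results: asyncio.Queue = asyncio.Queue()
    # Scaling out means starting more consumers; here they are tasks, in production instances
    workers = [asyncio.create_task(consumer(jobs, results, scan)) for _ in range(n_consumers)]
    for guid, content in bounties:
        await jobs.put(ScanJob(guid, content))
    await jobs.join()  # wait until every published job has been processed
    for w in workers:
        w.cancel()  # retire idle consumers once demand drops
    await asyncio.gather(*workers, return_exceptions=True)
    verdicts = []
    while not results.empty():
        verdicts.append(results.get_nowait())
    return verdicts
```

For example, asyncio.run(producer(bounties, my_scan, n_consumers=2)) scans every (guid, content) pair with two consumers in parallel; raising n_consumers is the in-process analogue of launching more backend instances.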

Relative to a traditional monolithic model, producer / consumer provides additional benefits:

  1. The producer houses bid / staking logic, disjoint from consumer-held scanning logic. This separation provides a more maintainable and secure architecture: consumers, responsible for complex scanning functions, need not (and should not) have access to the microengine's wallet. All wallet-related functions can be handled by the comparatively simple producer component.
  2. The pub/sub interface permits parallel scans by design without complex (or even explicit) client synchronization.
  3. Makes it easier to build disjoint, multi-backend microengines. It becomes possible to mix and match lighter weight (e.g. static analysis) and heavier weight (e.g. sandbox) backends, with the single producer frontend mediating scan results, allowing the microengine to respond as best it can within the assertion timeframe.
  4. Reduces computational cost via elastic resource consumption.

Next Steps

With a staking strategy in place, it's time to connect your microengine to the PolySwarm marketplace!

Connect to the PolySwarm marketplace →