
Developing a Production Microengine

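As a microengine developer, it's important to monitor your microengine's performance and hone your NCT staking strategy so that it reflects your confidence in the artifacts you see in the PolySwarm marketplace. Filtering the artifacts your microengine asserts on, and maintaining a strong correlation between your microengine's NCT stakes and its confidence in its assertions, are critical to keeping your microengine effective (and profitable).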

All microengine developers should routinely:

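  1. Monitor their microengine's marketplace performance.
  2. Design a "triage" pass that quickly identifies artifacts of interest.
  3. Maintain a tight correlation between their microengine's assertion confidence and the NCT it stakes.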
  4. Ensure their microengine is capable of scaling in response to demand.

Monitoring Your Microengine's Marketplace Performance

Any PolySwarm user can track any microengine's profit / loss from PolySwarm Web's Microengines page. As a microengine developer, you'll want to:

  1. Create an account on PolySwarm Web
  2. Track your microengines' performance against other microengines' performance, and
  3. Claim ownership of your microengine(s) so you can track their performance more easily

Claiming Ownership of Your Microengines

When you claim (and prove) ownership of your Microengines, you're able to:

  1. (Optionally) name your microengine
  2. (Coming Soon) assign ownership to a Team
  3. View your microengines' performance in a single view without searching for each microengine in the unfiltered Microengines listing

We encourage every microengine developer to claim all of their Microengines.

We're continually rolling out new features that extend the management capabilities of owned Microengines.

Taking ownership of your microengine is a necessary first step for admission to PolySwarm's various Private Communities, unlocking private and often higher-value bounties.

Tracking Performance

With your Microengines claimed, quickly view profit / loss information across your microengine footprint using My Microengines. When deploying new detections to your microengines, use My Microengines to get a sense of their economic impact. Check these charts frequently to quickly spot problems with your microengines (e.g. a sharp drop in profitability).

Closing the Loop

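Once they can observe an underperforming microengine, microengine developers can take it offline to stop the losses. Without more context, however, it can be difficult to determine how to improve the microengine before bringing it back online. We therefore strongly recommend that every microengine archive the artifacts it evaluates, along with its assertion and NCT stake for each artifact.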
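A minimal archiving sketch, assuming a local SQLite database is acceptable for your deployment. The table layout and the functions open_archive() and archive_assertion() are hypothetical and are not part of polyswarm-client. Each artifact is stored once, keyed by its SHA-256, with one assertion row per bounty. Stakes are stored as text because NCT base units (10^-18 NCT) exceed SQLite's signed 64-bit integer range for stakes above roughly 9.2 NCT.

```python
import hashlib
import sqlite3
import time


def open_archive(path=":memory:"):
    """Open (or create) a hypothetical archive of evaluated artifacts and assertions."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS artifacts (
            sha256  TEXT PRIMARY KEY,
            content BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS assertions (
            bounty_guid     TEXT NOT NULL,
            artifact_sha256 TEXT NOT NULL REFERENCES artifacts(sha256),
            verdict         INTEGER,  -- 1 = malicious, 0 = benign, NULL = no assertion
            confidence      REAL,
            stake           TEXT,     -- NCT in base units (10^-18 NCT), kept as text
            recorded_at     REAL NOT NULL
        );
        """
    )
    return conn


def archive_assertion(conn, bounty_guid, content, verdict, confidence, stake):
    """Store the artifact once (keyed by SHA-256) plus one assertion row for this bounty."""
    digest = hashlib.sha256(content).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO artifacts VALUES (?, ?)", (digest, content))
        conn.execute(
            "INSERT INTO assertions VALUES (?, ?, ?, ?, ?, ?)",
            (
                bounty_guid,
                digest,
                None if verdict is None else int(verdict),
                confidence,
                str(stake),
                time.time(),
            ),
        )
    return digest
```

Calling archive_assertion() from your scan and bid paths gives you a record to replay against a revised scanner before bringing an underperforming microengine back online; pass a file path to open_archive() to persist it.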

We're designing bookkeeping features that will make this a proactive process, helping microengine developers close the improvement loop faster. As they become available, these features will be accessible on the My Microengines page.


Triaging Artifacts

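We recommend that microengine developers build their microengines in two stages:

  1. A very fast, lightweight triage pass that decides whether an artifact merits full investigation
  2. A full examination of the artifact that determines whether it is malicious and responds to the bounty within the assertion window

By implementing a triage pass, microengine developers can save time and money, reduce execution load, and quickly ignore artifacts that aren't of interest. Based on our conversations with microengine providers, one popular triage strategy is to filter artifacts by file type.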

Below is a simple example of a triage pass in a Microengine's scan() function:

import magic
...
class Scanner(AbstractScanner):

  ...

  async def scan(self, guid, artifact_type, content, chain):

    # Reject artifacts that aren't files
    if artifact_type != ArtifactType.FILE:
      return ScanResult()

    # Identify the file type once with libmagic (e.g. "ELF 64-bit LSB ..." or "PE32 executable ...")
    file_type = magic.from_buffer(content)

    # Reject files that libmagic does not identify as an ELF or PE by returning an empty ScanResult object
    if not (file_type.startswith("ELF") or file_type.startswith("PE")):
      return ScanResult()
...
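If you would rather not call libmagic in the triage hot path, a file-type filter can also read the formats' magic numbers directly. This is a minimal sketch, and is_elf_or_pe() is a hypothetical helper: ELF files begin with the bytes 0x7F 'E' 'L' 'F', while PE files begin with a DOS "MZ" header whose 32-bit little-endian field at offset 0x3C (e_lfanew) points to the "PE\0\0" signature.

```python
import struct


def is_elf_or_pe(content: bytes) -> bool:
    """Cheap triage check: does this artifact look like an ELF or PE file?"""
    # ELF: the file starts with 0x7F 'E' 'L' 'F'
    if content[:4] == b"\x7fELF":
        return True
    # PE: a DOS "MZ" header whose e_lfanew field (offset 0x3C, 32-bit little-endian)
    # points at the "PE\0\0" signature
    if content[:2] != b"MZ" or len(content) < 0x40:
        return False
    (pe_offset,) = struct.unpack_from("<I", content, 0x3C)
    return content[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

Inside scan(), `if not is_elf_or_pe(content): return ScanResult()` would take the place of the libmagic test. A bare DOS executable (an "MZ" header with no PE signature) is rejected, which matches the intent of filtering for PE files.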

Developing an Effective Staking Strategy

Example Lifecycle

Let's run through a simplified example of a bounty lifecycle, noting the impact of staking strategy design.

Bounty Creation

Suppose the fictitious ACME Enterprises discovers something suspicious on their network and wants to enrich their telemetry with intelligence produced by the PolySwarm marketplace. ACME submits an artifact to the PolySwarm marketplace via PolySwarm Web, PolySwarm API or a third party Ambassador.

An Ambassador creates a bounty for ACME's submission. This bounty contains: (1) the artifact, and (2) a configurable amount of NCT placed into the initial reward bucket for the bounty. For illustration, let's assume 5 NCT is placed into the reward bucket.

Active PolySwarm microengines are notified of this new bounty and of the amount of NCT placed on the artifact.

Let's assume 8 microengines* find the initial NCT reward placed by the Ambassador sufficient to justify triaging the artifact to determine whether it falls within their area of expertise.

*The number of active PolySwarm microengines is far beyond 8 and is growing by the day, but we'll keep this example simple for illustrative purposes.

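Microengines Perform a Triage Pass

The microengines perform a first-pass triage and determine:

  • Microengines A, B, C, D, E: the artifact falls within their area of expertise
  • Microengines F, G, H: the artifact falls outside their area of expertise

Microengines F, G and H ignore the bounty and choose not to respond, while microengines A-E take a closer look.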

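Microengines Perform a Full Analysis

During analysis, each microengine identifies key features (high-confidence indicators) and/or general patterns (low-confidence indicators) that help it reach a conclusion. The microengines express their confidence through the amount of NCT they stake, arriving at the following conclusions:

  • Microengine A: 1 NCT / malicious
  • Microengine B: 1 NCT / benign
  • Microengine C: 2 NCT / malicious
  • Microengine D: 1 NCT / malicious
  • Microengine E: 2 NCT / benign

Roughly speaking, microengines C and E are twice as confident in their assertions as the microengines that agree with them.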

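These assertions and their NCT stakes are sent to the Ambassador immediately after the assertion window closes. The Ambassador analyzes these assertions and may optionally combine them into a single, comprehensive piece of intelligence to deliver to ACME.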

Total NCT is Computed by PolySwarm's BountyRegistry Contract

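The initial reward and the stakes are held in escrow by PolySwarm's BountyRegistry contract. All funds are summed:

  • Initial reward: 5 NCT +
  • Microengine A: 1 NCT +
  • Microengine B: 1 NCT +
  • Microengine C: 2 NCT +
  • Microengine D: 1 NCT +
  • Microengine E: 2 NCT =
  • Total reward: 12 NCT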

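Arbiters Determine Ground Truth

Arbiters then weigh in on the ground truth of the artifact and determine that it is, in fact, malicious.

The bounty is then opened so that microengines that asserted correctly can claim their rewards. Each microengine's total payout is proportional to its stake:

  • Microengine A: 3 NCT (2 NCT profit)
  • Microengine B: 0 NCT (incorrect assertion)
  • Microengine C: 6 NCT (4 NCT profit)
  • Microengine D: 3 NCT (2 NCT profit)
  • Microengine E: 0 NCT (incorrect assertion)

Microengine C is the biggest winner: it staked twice as much as A or D, and so received twice as large a share of the reward.

This example is a simplified version of what actually happens in the PolySwarm marketplace.

In the real marketplace, many more microengines respond, stake amounts need not be whole numbers, and the marketplace assesses fees to compensate arbiters.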
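The payout arithmetic in this simplified example can be checked with a short sketch. It assumes, as the example does, that the entire pot (initial reward plus all stakes) is split among the correct assertions in proportion to their stakes, with no arbiter fees; payouts() is a hypothetical helper, not the BountyRegistry contract's actual logic. Using Fraction keeps non-whole stake amounts exact.

```python
from fractions import Fraction


def payouts(initial_reward, stakes, assertions, ground_truth):
    """Simplified payout: correct assertions split the whole pot pro rata by stake.

    stakes: {microengine: stake} as ints or Fractions (no arbiter fees modeled)
    assertions: {microengine: True for malicious, False for benign}
    """
    pot = initial_reward + sum(stakes.values())
    winners = {name for name in stakes if assertions[name] == ground_truth}
    winning_stake = sum(stakes[name] for name in winners)
    if winning_stake == 0:
        raise ValueError("this sketch assumes at least one correct, non-zero stake")
    return {
        name: Fraction(pot * stakes[name], winning_stake) if name in winners else Fraction(0)
        for name in stakes
    }


stakes = {"A": 1, "B": 1, "C": 2, "D": 1, "E": 2}
assertions = {"A": True, "B": False, "C": True, "D": True, "E": False}
result = payouts(5, stakes, assertions, ground_truth=True)
profit = {name: result[name] - stakes[name] for name in stakes}
```

With these inputs the pot is 12 NCT and the winning stake is 4 NCT, so each staked NCT on the correct side returns 3 NCT. Subtracting each stake from its payout gives 2 NCT profit for A and D, 4 NCT for C, and a loss of the full stake for B and E.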

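Maintaining a Tight Correlation Between Confidence and NCT Stake

Everyone benefits when microengines express their confidence in their assertions through the size of their NCT stakes. On one side of the market, Ambassadors receive an additional dimension of signal about an artifact's maliciousness: the microengines' NCT stakes. On the other side, microengines that scale their NCT stakes with their confidence stand to increase their profits.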

The best microengines will have a good sense of their confidence in each scan and will return a confidence score between 0.0 and 1.0 in the ScanResult object alongside their scan results. These confidences are used in AbstractMicroengine's bid() method according to a BidStrategy.

polyswarm-client provides a default bid strategy via the class BidStrategyBase. Variants of this default strategy (with different weights applied) can be found in polyswarm-client.

You may use the default bid strategy, some variant thereof, or develop a fully custom bid strategy by subclassing BidStrategyBase. participant-template will produce a BidStrategy class for Microengine participants. Refer to the comments surrounding this subclass for more information.

During testing, it may be convenient to quickly swap bid strategies. You can choose which strategy to use when you launch your microengine by providing the --bid-strategy command line option or setting the BID_STRATEGY variable in your environment.

Let's take a look at the default bid strategy in BidStrategyBase's bid() method:

async def bid(self, guid, mask, verdicts, confidences, metadatas, chain):
  """Override this to implement custom bid calculation logic
  Args:
      guid (str): GUID of the bounty under analysis, use to correlate with artifacts in the same bounty
      mask (list[bool]): per-artifact flags indicating which bounty files the microengine is asserting on
      verdicts (list[bool]): scan verdicts from scanning the bounty files
      confidences (list[float]): Measure of confidence of verdict per artifact ranging from 0.0 to 1.0
      metadatas (list[str]): metadata blurbs from scanning the bounty files
      chain (str): Chain we are operating on
  Returns:
      int: Amount of NCT to bid in base NCT units (10 ^ -18)
  """
  # Never bid below the marketplace's minimum allowed assertion bid
  min_allowed_bid = await self.client.bounties.parameters[chain].get('assertion_bid_minimum')
  min_bid = max(self.min_bid, min_allowed_bid)
  max_bid = max(self.max_bid, min_allowed_bid)

  # Average the confidences of only the artifacts we are asserting on
  asserted_confidences = [c for b, c in zip(mask, confidences) if b]
  if not asserted_confidences:
    # Nothing asserted: avoid dividing by zero and fall back to the minimum bid
    return min_bid
  avg_confidence = sum(asserted_confidences) / len(asserted_confidences)

  # Linearly interpolate between min_bid (confidence 0.0) and max_bid (confidence 1.0)
  bid = int(min_bid + ((max_bid - min_bid) * avg_confidence))

  # Clamp bid between min_bid and max_bid
  return max(min_bid, min(bid, max_bid))

Currently, only one NCT stake may be placed per bounty. This presents a problem for multi-artifact bounties: how can a single NCT stake accurately convey potentially different levels of confidence across multiple artifacts?

We're working to remove this limitation. Future releases will support N stakes for N artifacts in a single bounty.

Currently, staking strategies take an average confidence over all artifacts in a given bounty to arrive at a single NCT stake amount for that bounty.
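To see why averaging is a limitation, here is a standalone sketch of the same interpolation used by the default bid() method, with the client lookup replaced by explicit min_bid and max_bid arguments. The function name default_bid() and the 1 to 2 NCT bid range are hypothetical. Two bounties with very different confidence profiles produce exactly the same stake.

```python
def default_bid(min_bid, max_bid, mask, confidences):
    """Mirror of the default strategy: interpolate on the average asserted confidence.

    min_bid and max_bid are in NCT base units (10^-18 NCT).
    """
    asserted = [c for flag, c in zip(mask, confidences) if flag]
    if not asserted:
        return min_bid
    avg_confidence = sum(asserted) / len(asserted)
    bid = int(min_bid + (max_bid - min_bid) * avg_confidence)
    # Clamp into [min_bid, max_bid] in case a confidence falls outside 0.0-1.0
    return max(min_bid, min(bid, max_bid))


ONE_NCT = 10 ** 18
MIN_BID, MAX_BID = ONE_NCT, 2 * ONE_NCT  # hypothetical bid range: 1 to 2 NCT

# One very confident verdict plus one near-guess...
mixed = default_bid(MIN_BID, MAX_BID, [True, True], [0.9, 0.1])
# ...stakes exactly as much as two lukewarm verdicts.
lukewarm = default_bid(MIN_BID, MAX_BID, [True, True], [0.5, 0.5])
```

Both calls stake 1.5 NCT, so the Ambassador cannot tell from the stake that the first bounty contains one high-confidence verdict. Per-artifact stakes, once supported, remove this ambiguity.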

Determining Confidence

The specifics of determining confidence cannot easily be generalized; each microengine will have an optimal strategy. Generally, we've found that microengine developers are choosing one of several strategies (in order of increasing efficacy):

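  1. No confidence derivation: the microengine is equally confident in all of its responses. This is the least desirable strategy and usually manifests as a static NCT stake on every artifact. We're working with these microengine developers on Discord to develop better staking strategies, and we'd be happy to help you too!
  2. Confidence based on file type: some microengines use file type information twice, once to exclude artifacts during triage and again to assign weights to files that pass triage. This is simple to do: assign a static weight to each supported file type, and let those weights influence the confidence score passed back from the Scanner class. This class of strategy is better, but still not ideal.
  3. Confidence based on specific indicators: this should be the goal of every well-performing microengine. Several microengines in the PolySwarm marketplace already do this, for example by dissecting Microsoft Word documents and identifying known-bad auto-executing macro scripts. This kind of artifact interrogation is ideal: it provides a very high-confidence signal, which lets these microengine developers craft optimal staking strategies.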

Below is a simple example of file-type-based confidence in a Microengine's scan() function:

import magic
...
class Scanner(AbstractScanner):

  ...

  async def scan(self, guid, artifact_type, content, chain):

    confidence_delta = 0

    # Increase confidence score for ELF and PE files
    file_type = magic.from_buffer(content)
    if file_type.startswith("ELF") or file_type.startswith("PE"):
      confidence_delta += 0.2

    ...

    # Conduct the full analysis, arriving at a base confidence score
    # (do_analysis() is a placeholder for your own analysis routine)
    confidence_base = do_analysis(content)

    ...

    # Take file-based confidence into account, capping the result at 1.0
    return ScanResult(bit=True, verdict=True, confidence=min(1.0, confidence_base + confidence_delta))
...
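For indicator-based confidence (strategy 3), one common way to fold several independent pieces of evidence into a single 0.0-1.0 score is a noisy-OR: each matched indicator i with weight w_i independently supports the verdict, so confidence = 1 - (1 - w_1)(1 - w_2)...(1 - w_n). This is a sketch of one possible technique, not something polyswarm-client provides; the indicator names and weights below are hypothetical placeholders you would calibrate against your own archived results.

```python
import math

# Hypothetical indicator weights: high-confidence indicators get weights near 1.0,
# general (low-confidence) patterns get small weights.
INDICATOR_WEIGHTS = {
    "autoexec_macro_downloads_payload": 0.9,
    "obfuscated_vba": 0.5,
    "suspicious_url": 0.2,
}


def noisy_or(weights):
    """Combine independent evidence: 1 - product of (1 - w) over all weights."""
    weights = list(weights)
    for w in weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"indicator weight out of range [0, 1]: {w}")
    return 1.0 - math.prod(1.0 - w for w in weights)


def confidence_from_indicators(matched):
    """Map the indicator names found during analysis to a confidence score."""
    return noisy_or(INDICATOR_WEIGHTS[name] for name in matched if name in INDICATOR_WEIGHTS)
```

With noisy-OR, each additional matched indicator can only raise the score, no combination exceeds 1.0, and an artifact with no matched indicators scores 0.0, so the result can be passed directly as ScanResult's confidence.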

Operating at Scale

As more enterprises rely on PolySwarm for scanning suspect artifacts, microengines need to scale in order to meet demand.

Microengines built using participant-template use a producer / consumer model for horizontal scaling:

  1. 1 frontend (producer): responsible for communicating with the PolySwarm marketplace: ingesting bounties, triaging artifacts, producing pub/sub scan events for backends, implementing a staking strategy and posting assertions. The frontend translates marketplace bounties into events on a pub/sub queue for backends to consume and distills responses from backends into marketplace actions.
  2. N backends (consumers): the actual scanners that process artifacts and produce assertions (malicious / benign) coupled with confidence ratings. These backends are tasked by the frontend.
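The division of labor can be illustrated in-process with asyncio. In this minimal sketch an asyncio.Queue stands in for the pub/sub queue that participant-template's frontend and backends actually communicate over, and producer(), consumer() and toy_scan() are hypothetical names. Only the producer would ever touch the wallet; here it simply collects verdicts where a real frontend would go on to apply its staking strategy and post assertions.

```python
import asyncio


async def consumer(name, jobs, results, scan_fn):
    """Backend: pull scan jobs, run scanning logic, publish results. No wallet access."""
    while True:
        guid, content = await jobs.get()
        try:
            verdict, confidence = scan_fn(content)
        except Exception:
            verdict, confidence = None, 0.0  # scan failed: make no assertion
        await results.put((guid, name, verdict, confidence))
        jobs.task_done()


async def producer(bounties, n_consumers, scan_fn):
    """Frontend: fan bounties out to N backends and gather their verdicts."""
    jobs, results = asyncio.Queue(), asyncio.Queue()
    backends = [
        asyncio.create_task(consumer(f"backend-{i}", jobs, results, scan_fn))
        for i in range(n_consumers)
    ]
    for guid, content in bounties:
        await jobs.put((guid, content))
    await jobs.join()  # wait until every job has been scanned
    for task in backends:
        task.cancel()
    await asyncio.gather(*backends, return_exceptions=True)

    verdicts = {}
    while not results.empty():
        guid, _backend, verdict, confidence = results.get_nowait()
        verdicts[guid] = (verdict, confidence)
    # A real frontend would now apply its staking strategy and post assertions.
    return verdicts


def toy_scan(content):
    """Stand-in backend scanner: flags anything starting with an MZ header."""
    return content.startswith(b"MZ"), 0.8
```

For example, `asyncio.run(producer([("bounty-1", b"MZ\x90\x00"), ("bounty-2", b"hello")], 3, toy_scan))` scans both artifacts across three backends. Scaling out is then a matter of raising n_consumers (or, with a real pub/sub queue, launching more backend instances) without changing the frontend.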

Microengines created with participant-template prior to June 18th 2019 will need to be upgraded to the producer / consumer model.

The producer / consumer model makes it simple to horizontally scale your microengine. As demand increases, launch additional consumer instances. As demand decreases, it's safe to retire some consumer instances. In other words, your microengine's resource footprint should scale elastically with demand.

Relative to a traditional monolithic model, producer / consumer provides additional benefits:

  1. The producer houses bid / staking logic, disjoint from consumer-held scanning logic. This separation provides a more maintainable and secure architecture: consumers, responsible for complex scanning functions, need not (and should not) have access to the microengine's wallet. All wallet-related functions can be handled by the comparatively simple producer component.
  2. The pub/sub interface permits parallel scans by design without complex (or even explicit) client synchronization.
  3. Makes it easier to build disjoint, multi-backend microengines. It becomes possible to mix and match lighter weight (e.g. static analysis) and heavier weight (e.g. sandbox) backends, with the single producer frontend mediating scan results, allowing the microengine to respond as best it can within the assertion timeframe.
  4. Reduces computational cost via elastic resource consumption.

Next Steps

With a staking strategy in place, it's time to connect your microengine to the PolySwarm marketplace!
