# Flow Memory Inference Market

Flow Memory Inference Market is an application-record marketplace for inference capacity. It records credential references, provider-agent runtime state, bid/ask depth, quotes, reservations, route decisions, disabled challenge previews, usage analytics, and exports without moving money or requiring sensitive signing material.

This document is written for operators, provider agents, buyers, application teams, and autonomous agents. The public product name is **Flow Memory**.

## Safety boundary

All marketplace records are dry-run application records. The engine does not perform production value transfer, funds movement, broadcast, signing-material handling, or live execution, and it does not support unsafe execution modes.

Every response from an endpoint that handles market safety must preserve these fields with the required values:

| Field | Required value | Meaning |
|---|---:|---|
| `dry_run_only` | `true` | The endpoint records a simulated marketplace action. |
| `funds_moved` | `false` | No funds movement is performed. |
| `broadcast_allowed` | `false` | No value-transfer or network broadcast is allowed. |
| `private_key_required` | `false` | Sensitive signing material is not required or accepted. |
| `live_trading_enabled` | `false` | Live execution is not enabled. |
| `legal_review_required` | `true` | Legal approval is still required before any future production value-transfer path. |
| `compliance_review_required` | `true` | Compliance approval is still required before any future production value-transfer path. |
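
A client or test harness can check these values mechanically. The following is a minimal sketch, assuming the response has already been parsed into a Python `dict`; the helper name `safety_violations` is hypothetical and not part of the Flow Memory API.

```python
# Required safety values, copied from the table above.
REQUIRED_SAFETY_FIELDS = {
    "dry_run_only": True,
    "funds_moved": False,
    "broadcast_allowed": False,
    "private_key_required": False,
    "live_trading_enabled": False,
    "legal_review_required": True,
    "compliance_review_required": True,
}


def safety_violations(response: dict) -> list[str]:
    """Return the names of safety fields that are missing or have the wrong value."""
    return [
        name
        for name, required in REQUIRED_SAFETY_FIELDS.items()
        # Identity comparison is deliberate: 1, "true", or None never pass.
        if response.get(name) is not required
    ]
```

An empty list means the response preserved every field; any other result names the fields to investigate.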

Forbidden material in requests and public surfaces:

- Raw provider credentials, bearer tokens, session secrets, or sensitive signing material.
- Value-transfer instructions, production execution flags, broadcast flags, direct execution flags, or unsafe signing payloads.
- Public docs, UI, API errors, and examples that reference another public product name.

Use `credential_ref` only. A credential reference is an opaque pointer such as `secret://inference/provider-primary`; it is created and resolved by the operator's secret-management process outside request bodies.
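
A client can screen request bodies before they are sent. The sketch below is illustrative only: the key names in `FORBIDDEN_KEYS` are hypothetical examples of raw-secret fields, and the `secret://` shape check is inferred from the single example above, so the real reference grammar may be broader.

```python
from urllib.parse import urlsplit

# Hypothetical key names; the document only states that raw credentials,
# tokens, session secrets, and signing material are forbidden.
FORBIDDEN_KEYS = {"api_key", "bearer_token", "session_secret", "private_key"}


def looks_like_credential_ref(value: str) -> bool:
    """True if value has the secret://<namespace>/<name> shape of the example."""
    parts = urlsplit(value)
    return parts.scheme == "secret" and bool(parts.netloc) and len(parts.path) > 1


def forbidden_keys_in(body: dict) -> set[str]:
    """Collect forbidden key names anywhere in a nested JSON-like body."""
    found: set[str] = set()
    stack = [body]
    while stack:
        item = stack.pop()
        if isinstance(item, dict):
            found.update(FORBIDDEN_KEYS.intersection(item))
            stack.extend(item.values())
        elif isinstance(item, list):
            stack.extend(item)
    return found
```

A key-name screen like this is only a first line of defense; it cannot detect a secret pasted into an otherwise allowed field.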

## Complete engine map

```mermaid
flowchart TD
    UI[API-backed Flow Memory UI] --> Credentials[Credential-ref lifecycle]
    UI --> Providers[Provider-agent runtime]
    UI --> Market[Order book and reservations]
    UI --> Routing[Quote and route policy]
    UI --> PaymentPreview[Disabled payment challenge preview]
    UI --> Analytics[Usage analytics and export]
    Credentials --> Providers
    Providers --> Catalog[Model catalog and capacity]
    Catalog --> Market
    Market --> Routing
    Routing --> Proxy[SDK-compatible proxy]
    Routing --> Orders[Dry-run order records]
    Orders --> Reservations[Dry-run reservations]
    PaymentPreview --> Audit[Audit record]
    Proxy --> Usage[Usage records]
    Usage --> Analytics
    Analytics --> Agents[AI-consumable docs and agent reports]
```

## API-backed frontend surfaces

The dashboard should be a thin API-backed client over the endpoint contract. It should not invent state, accept secrets, or bypass policy.

| UI surface | Primary endpoints | Required behavior |
|---|---|---|
| Marketplace discovery | `GET /inference/market/list`, `GET /inference/market/models/{model}`, `GET /inference/market/feed`, `GET /inference/buyers/{buyer_id}/profile`, `GET /inference/sellers/{seller_id}/profile`, `GET /inference/providers/agent/manifest` | Show public marketplace listings, model detail, feed records, profile metadata, and provider-agent manifest data. |
| Credential refs | `POST /inference/credential-refs`, `GET /inference/credential-refs`, `PATCH /inference/credential-refs/{credential_ref_id}/rotate`, `POST /inference/credential-refs/{credential_ref_id}/revoke` | Create, list, rotate, and revoke references. Never display secret values. |
| Provider runtime | `POST /inference/providers/enroll`, `POST /inference/providers/{provider_id}/pairing-token`, `POST /inference/providers/{provider_id}/heartbeat`, `POST /inference/providers/{provider_id}/catalog`, `POST /inference/providers/{provider_id}/capacity`, `GET /inference/providers/{provider_id}/health` | Enroll provider agents, pair runtimes, record heartbeats, publish catalog/capacity, and inspect passive health. |
| Market depth | `GET /inference/market/depth` | Show bid/ask depth, spread, last dry-run fill, active reservations, and safety fields. |
| Orders | `POST /inference/market/orders`, `POST /inference/market/orders/{order_id}/cancel` | Create and cancel dry-run bid/ask/marketable records. |
| Reservations | `POST /inference/market/reservations` | Hold available capacity as an expiring dry-run reservation. |
| Routing policy | `POST /inference/quote`, `POST /inference/route` | Preview compatible routes, price ceilings, discounts, fallback rules, and rejection reasons. |
| Payment preview | `POST /inference/market/payment-challenge/preview` | Show the disabled challenge/audit envelope. Must never request sensitive signing material. |
| Analytics | `GET /inference/analytics/usage`, `GET /inference/analytics/export` | Show usage, cost, savings, and NDJSON/JSON exports for agents and auditors. |
| SDK proxy | `GET /v1/models`, `POST /v1/chat/completions`, `POST /v1/responses`, `POST /v1/embeddings` | Drop-in compatible SDK entrypoints with `flow_memory` route, usage, warning, and safety metadata. |

## Credential-ref lifecycle

Credential refs are marketplace records, not secret containers. The API accepts metadata about where a credential is managed; the secret value itself stays outside Flow Memory request payloads and responses.

```mermaid
sequenceDiagram
    participant Operator
    participant FM as Flow Memory API
    participant Runtime as Provider agent
    Operator->>FM: POST /inference/credential-refs
    FM-->>Operator: credential_ref_id, credential_ref, status=active
    Operator->>FM: PATCH /inference/credential-refs/{credential_ref_id}/rotate
    FM-->>Operator: rotation record, secret_value_redacted=true
    Runtime->>FM: POST /inference/providers/enroll with credential_ref_id
    FM-->>Runtime: provider_id and pairing instructions
    Operator->>FM: POST /inference/credential-refs/{credential_ref_id}/revoke
    FM-->>Operator: status=revoked, routes gated
```

Create a credential reference:

```http
POST /inference/credential-refs
content-type: application/json
x-flow-memory-scopes: inference:sell
```

```json
{
  "owner_id": "provider-team-alpha",
  "workspace_id": "workspace-default",
  "credential_ref": "secret://inference/provider-alpha",
  "provider_api": "sdk_compatible",
  "display_name": "Provider Alpha production reference",
  "allowed_models": ["flow-memory-orchestrator"],
  "metadata": {"rotation_policy": "operator-managed"}
}
```

Expected response shape:

```json
{
  "ok": true,
  "credential_ref": {
    "credential_ref_id": "credref_provider_alpha",
    "credential_ref": "secret://inference/provider-alpha",
    "status": "active",
    "secret_value_redacted": true,
    "raw_credentials_accepted": false
  },
  "dry_run_only": true,
  "funds_moved": false,
  "broadcast_allowed": false,
  "private_key_required": false,
  "live_trading_enabled": false,
  "legal_review_required": true,
  "compliance_review_required": true
}
```
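
For reference, the create call can be assembled with the Python standard library. This is a sketch, not the official client: `BASE_URL` assumes the API is served from the same local host as the SDK proxy, and the request is built but deliberately not sent.

```python
import json
from urllib.request import Request

# Assumption: the marketplace API shares the local host used by the SDK proxy.
BASE_URL = "http://127.0.0.1:8765"


def build_credential_ref_request(body: dict) -> Request:
    """Build (but do not send) a POST /inference/credential-refs request."""
    return Request(
        url=f"{BASE_URL}/inference/credential-refs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "x-flow-memory-scopes": "inference:sell",
        },
        method="POST",
    )


# The body carries only the opaque reference, never a secret value.
request = build_credential_ref_request(
    {
        "owner_id": "provider-team-alpha",
        "workspace_id": "workspace-default",
        "credential_ref": "secret://inference/provider-alpha",
        "provider_api": "sdk_compatible",
    }
)
```

Passing `request` to `urllib.request.urlopen` would dispatch it; any Flow Memory auth header the deployment requires is omitted here because the document does not specify it.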

Rotate and revoke only update the reference lifecycle; neither accepts a new secret value. The first body below is a rotate request and the second is a revoke request:

```json
{
  "rotation_reason": "scheduled_rotation",
  "new_credential_ref": "secret://inference/provider-alpha-2026-07",
  "operator_ticket": "SEC-1024"
}
```

```json
{
  "revoke_reason": "provider_agent_retired",
  "disable_provider_routes": true
}
```

## Provider-agent runtime

Provider agents enroll once, pair with a short-lived token, then send heartbeats, catalog snapshots, and capacity snapshots. Flow Memory stores runtime records so the UI and routing engine can reason over availability without direct secret access.

```mermaid
sequenceDiagram
    participant Provider as Provider agent
    participant FM as Flow Memory API
    participant Market as Market engine
    Provider->>FM: POST /inference/providers/enroll
    FM-->>Provider: provider_id, runtime_status=pending_pairing
    Provider->>FM: POST /inference/providers/{provider_id}/pairing-token
    FM-->>Provider: token preview and expiration metadata
    Provider->>FM: POST /inference/providers/{provider_id}/heartbeat
    Provider->>FM: POST /inference/providers/{provider_id}/catalog
    Provider->>FM: POST /inference/providers/{provider_id}/capacity
    FM->>Market: publish runtime health, catalog, and capacity records
    Provider->>FM: GET /inference/providers/{provider_id}/health
    FM-->>Provider: passive health and route-gating state
```

Enrollment request:

```json
{
  "owner_id": "provider-team-alpha",
  "workspace_id": "workspace-default",
  "provider_name": "Provider Alpha",
  "provider_api": "sdk_compatible",
  "base_url": "https://provider-alpha.example/v1",
  "credential_ref_id": "credref_provider_alpha",
  "credential_ref": "secret://inference/provider-alpha",
  "supported_units": ["token", "request"],
  "contact": {"routing_email": "ops@example.invalid"}
}
```

Catalog request:

```json
{
  "catalog_version": "2026-06-17T00:00:00Z",
  "models": [
    {
      "model": "flow-memory-orchestrator",
      "compatible_api": "chat_responses",
      "unit_types": ["token", "request"],
      "context_window": 128000,
      "tool_calling": "metadata_only"
    },
    {
      "model": "flow-memory-message-chat",
      "compatible_api": "messages",
      "unit_types": ["request"],
      "context_window": 200000,
      "tool_calling": "metadata_only"
    }
  ]
}
```

Capacity request:

```json
{
  "capacity_version": "2026-06-17T00:05:00Z",
  "inventory": [
    {
      "model": "flow-memory-orchestrator",
      "unit_type": "token",
      "available_units": 2500000,
      "ask_unit_price": 0.00000075,
      "min_order_units": 1000,
      "quality_score": 0.98,
      "reliability_score": 0.99,
      "expires_at": "2026-12-31T00:00:00Z"
    }
  ]
}
```

Health response shape:

```json
{
  "ok": true,
  "provider_id": "provider_alpha",
  "status": "healthy",
  "credential_ref_status": "active",
  "last_heartbeat_at": "2026-06-17T00:05:00Z",
  "catalog_status": "current",
  "capacity_status": "current",
  "route_gate": "eligible",
  "dry_run_only": true,
  "funds_moved": false,
  "broadcast_allowed": false,
  "private_key_required": false,
  "live_trading_enabled": false,
  "legal_review_required": true,
  "compliance_review_required": true
}
```
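
A consumer of the health response might apply a conservative client-side check before treating a provider as routable. The sketch below assumes the field values shown in the example are the only "good" states; the server-reported `route_gate` remains authoritative, and `is_route_eligible` is a hypothetical helper name.

```python
def is_route_eligible(health: dict) -> bool:
    """Conservative check: every status in the health response must be good."""
    return (
        health.get("ok") is True
        and health.get("status") == "healthy"
        and health.get("credential_ref_status") == "active"
        and health.get("catalog_status") == "current"
        and health.get("capacity_status") == "current"
        and health.get("route_gate") == "eligible"
    )
```

Requiring all fields to agree means a stale catalog or revoked reference blocks routing even if `route_gate` has not yet been updated.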

## Order book mechanics

The order book records bids and asks as dry-run application orders. Orders can be created by buyers, providers, or agents, but no production value transfer occurs.

```mermaid
flowchart LR
    Ask[Provider ask] --> Book[Bid/ask order book]
    Bid[Buyer bid] --> Book
    Book --> Depth[GET /inference/market/depth]
    Book --> Match[Dry-run matching]
    Match --> Reservation[POST /inference/market/reservations]
    Match --> Fill[Simulated fill record]
    Fill --> Usage[Analytics usage]
    Fill --> Export[Analytics export]
```

Create an ask:

```json
{
  "side": "ask",
  "provider_id": "provider_alpha",
  "model": "flow-memory-orchestrator",
  "unit_type": "token",
  "units": 1000000,
  "limit_unit_price": 0.00000075,
  "currency": "USD_CREDIT",
  "min_fill_units": 1000,
  "time_in_force": "gtc",
  "metadata": {"source": "provider_capacity_snapshot"}
}
```

Create a bid:

```json
{
  "side": "bid",
  "buyer_id": "agent-research-1",
  "model": "flow-memory-orchestrator",
  "unit_type": "token",
  "units": 250000,
  "limit_unit_price": 0.0000008,
  "time_in_force": "ioc",
  "routing_policy": {
    "min_discount_bps": 500,
    "allow_fallback": true,
    "dry_run_required": true,
    "raw_credentials_allowed": false,
    "broadcast_enabled": false,
    "sensitive_signing_inputs_allowed": false
  }
}
```

Cancel an order:

```json
{
  "cancel_reason": "buyer_repriced",
  "requested_by": "agent-research-1"
}
```

Depth response shape:

```json
{
  "ok": true,
  "model": "flow-memory-orchestrator",
  "unit_type": "token",
  "bids": [{"unit_price": 0.0000008, "available_units": 250000, "order_count": 1}],
  "asks": [{"unit_price": 0.00000075, "available_units": 1000000, "order_count": 1}],
  "spread": {"best_bid": 0.0000008, "best_ask": 0.00000075, "crossed": true},
  "recent_fills": [],
  "dry_run_only": true,
  "funds_moved": false,
  "broadcast_allowed": false,
  "private_key_required": false,
  "live_trading_enabled": false,
  "legal_review_required": true,
  "compliance_review_required": true
}
```
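
The `spread` object can be derived from the `bids` and `asks` levels. The following sketch assumes the book is crossed when the best bid is at or above the best ask; whether the engine treats equality as crossed is not stated in this document.

```python
def summarize_spread(bids: list[dict], asks: list[dict]) -> dict:
    """Derive best bid, best ask, and the crossed flag from depth levels."""
    best_bid = max((level["unit_price"] for level in bids), default=None)
    best_ask = min((level["unit_price"] for level in asks), default=None)
    # Assumption: a one-sided or empty book is never reported as crossed.
    crossed = best_bid is not None and best_ask is not None and best_bid >= best_ask
    return {"best_bid": best_bid, "best_ask": best_ask, "crossed": crossed}
```

With the example levels, the 0.0000008 bid sits above the 0.00000075 ask, which is why the response reports `crossed: true` and a dry-run match is possible.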

## Reservations

Reservations hold capacity for a bounded interval. They are dry-run holds and do not create payment obligations.

```json
{
  "buyer_id": "agent-research-1",
  "provider_id": "provider_alpha",
  "model": "flow-memory-orchestrator",
  "unit_type": "token",
  "requested_units": 250000,
  "max_unit_price": 0.0000008,
  "hold_seconds": 300,
  "allow_partial": true,
  "idempotency_key": "agent-research-1-task-42"
}
```

A successful response should include `reservation_id`, `held_units`, `expires_at`, `matched_orders`, `dry_run_only=true`, and `funds_moved=false`.
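
The hold semantics can be illustrated with a small local model. This is a sketch under stated assumptions: partial holds take whatever capacity is available when `allow_partial` is true, and `expires_at` is the request time plus `hold_seconds`. The function `hold_capacity` is hypothetical and omits `reservation_id`, `matched_orders`, and idempotency handling.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def hold_capacity(request: dict, available_units: int, now: datetime) -> Optional[dict]:
    """Sketch of a dry-run hold; returns None when nothing can be held."""
    requested = request["requested_units"]
    if available_units >= requested:
        held = requested
    elif request.get("allow_partial") and available_units > 0:
        held = available_units  # partial hold, capped by available capacity
    else:
        return None
    expires_at = now + timedelta(seconds=request["hold_seconds"])
    return {
        "held_units": held,
        "expires_at": expires_at.isoformat(),
        "dry_run_only": True,
        "funds_moved": False,
    }


now = datetime(2026, 6, 17, 0, 5, tzinfo=timezone.utc)
hold = hold_capacity(
    {"requested_units": 250000, "hold_seconds": 300, "allow_partial": True},
    available_units=100000,
    now=now,
)
```

Here `hold` covers 100,000 units and expires five minutes after `now`; with `allow_partial` set to false the same call would return `None`.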

## Routing policy

Use quote and route calls before creating orders or sending SDK traffic. The route policy narrows the candidate routes by model compatibility, price, discount, fallback rules, credential status, and provider health.

```mermaid
sequenceDiagram
    participant Agent
    participant FM as Flow Memory API
    participant Book as Market depth
    participant Proxy as SDK proxy
    Agent->>FM: GET /inference/market/depth
    Agent->>FM: POST /inference/quote with policy
    FM->>Book: evaluate asks, bids, capacity, health
    Book-->>FM: eligible quotes and rejected routes
    Agent->>FM: POST /inference/route
    FM-->>Agent: selected_quote, rejected_routes, warnings
    Agent->>Proxy: SDK-compatible request
    Proxy-->>Agent: response with flow_memory metadata
```

Policy request:

```json
{
  "agent_id": "agent-research-1",
  "workspace_id": "workspace-default",
  "task_id": "task-42",
  "model": "flow-memory-orchestrator",
  "unit_type": "token",
  "estimated_units": 250000,
  "market_policy": {
    "allowed_models": ["flow-memory-orchestrator"],
    "max_unit_price": 0.0000008,
    "min_discount_bps": 500,
    "allow_fallback": true,
    "allow_external_providers": true,
    "require_healthy_provider": true,
    "dry_run_required": true,
    "raw_credentials_allowed": false,
    "broadcast_enabled": false,
    "sensitive_signing_inputs_allowed": false
  }
}
```

Common rejection codes:

| Code | Meaning | Safe next action |
|---|---|---|
| `source_disabled` | Provider or route is disabled. | Pick another route or wait for operator repair. |
| `credential_ref_unresolved` | Reference cannot be resolved by the operator environment. | Do not submit a key; ask operator to fix the reference. |
| `model_disallowed` | Requested model is outside policy. | Update allowed models intentionally or choose another model. |
| `max_unit_price_exceeded` | Ask is above buyer ceiling. | Wait, lower units, or intentionally raise the ceiling. |
| `min_discount_not_met` | Marketplace price does not meet discount policy. | Wait for better depth or relax the discount rule. |
| `provider_unhealthy` | Heartbeat, catalog, or capacity is stale. | Back off and route elsewhere. |
| `no_valid_inference_route` | All candidates were rejected. | Defer, fall back if allowed, or request human approval. |
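
The rejection codes above can be read as the output of a per-candidate policy check. The sketch below is one possible interpretation, not the engine's implementation: the candidate fields (`enabled`, `credential_ref_resolved`, `ask_unit_price`, `discount_bps`, `status`) are hypothetical, and choosing the cheapest eligible candidate is an assumed tie-break rule.

```python
def rejection_codes(candidate: dict, policy: dict) -> list[str]:
    """Return rejection codes for one candidate route; empty means eligible."""
    codes = []
    if not candidate.get("enabled", True):
        codes.append("source_disabled")
    if not candidate.get("credential_ref_resolved", False):
        codes.append("credential_ref_unresolved")
    if candidate["model"] not in policy["allowed_models"]:
        codes.append("model_disallowed")
    if candidate["ask_unit_price"] > policy["max_unit_price"]:
        codes.append("max_unit_price_exceeded")
    if candidate["discount_bps"] < policy["min_discount_bps"]:
        codes.append("min_discount_not_met")
    if policy.get("require_healthy_provider") and candidate.get("status") != "healthy":
        codes.append("provider_unhealthy")
    return codes


def select_route(candidates: list[dict], policy: dict) -> dict:
    """Pick the cheapest eligible candidate, or report no_valid_inference_route."""
    rejected, eligible = [], []
    for candidate in candidates:
        codes = rejection_codes(candidate, policy)
        if codes:
            rejected.append({"provider_id": candidate["provider_id"], "codes": codes})
        else:
            eligible.append(candidate)
    if not eligible:
        return {"selected_quote": None, "rejected_routes": rejected,
                "error": "no_valid_inference_route"}
    best = min(eligible, key=lambda c: c["ask_unit_price"])
    return {"selected_quote": best, "rejected_routes": rejected}
```

Collecting every code per candidate, rather than stopping at the first failure, gives agents enough detail to choose the matching safe next action.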

## Disabled payment challenge preview

`POST /inference/market/payment-challenge/preview` creates a disabled audit preview for a future reviewed release. It must remain disabled and must not ask for signatures, sensitive signing material, unsafe account payloads, or broadcast instructions.

```mermaid
flowchart TD
    Route[Selected dry-run route] --> Preview[Payment challenge preview]
    Preview --> Disabled[execution_enabled=false]
    Disabled --> Audit[Audit envelope]
    Audit --> UI[UI shows blocked state]
    Disabled --> NoSign[private_key_required=false]
    Disabled --> NoMove[funds_moved=false]
```

Request:

```json
{
  "order_id": "ord_demo_001",
  "reservation_id": "res_demo_001",
  "buyer_id": "agent-research-1",
  "provider_id": "provider_alpha",
  "estimated_total_cost": 0.1875,
  "currency": "USD_CREDIT",
  "acknowledge_disabled_execution": true
}
```

Response shape:

```json
{
  "ok": true,
  "payment_challenge_preview": {
    "challenge_id": "payprev_demo_001",
    "status": "disabled_preview_only",
    "execution_enabled": false,
    "requires_signature": false,
    "requires_sensitive_signing_material": false,
    "broadcast_allowed": false,
    "next_safe_actions": ["record audit preview", "continue with dry-run accounting only"]
  },
  "dry_run_only": true,
  "funds_moved": false,
  "broadcast_allowed": false,
  "private_key_required": false,
  "live_trading_enabled": false,
  "legal_review_required": true,
  "compliance_review_required": true
}
```
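
Two client-side checks are useful around the preview. The example `estimated_total_cost` of 0.1875 is consistent with 250,000 units at the 0.00000075 ask price used elsewhere in this document, and the preview object should report every execution-related flag as off. The sketch below assumes prices are handled as decimal strings to avoid float rounding; both helper names are hypothetical.

```python
from decimal import Decimal


def estimated_total_cost(units: int, unit_price: str) -> Decimal:
    """Multiply units by a decimal-string unit price without float rounding."""
    return Decimal(units) * Decimal(unit_price)


def preview_is_disabled(preview: dict) -> bool:
    """True only if every execution-related flag in the preview is off."""
    return (
        preview.get("status") == "disabled_preview_only"
        and preview.get("execution_enabled") is False
        and preview.get("requires_signature") is False
        and preview.get("requires_sensitive_signing_material") is False
        and preview.get("broadcast_allowed") is False
    )
```

A UI can use `preview_is_disabled` to decide whether to render the blocked state, treating any other result as an error rather than as permission to proceed.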

## Analytics and export

Analytics endpoints are designed for dashboards and agents. They expose usage and economics as records, not proof of payment.

| Endpoint | Purpose | Notes |
|---|---|---|
| `GET /inference/analytics/usage` | Query usage records by agent, provider, model, workspace, route, task, time range, and export format. | Returns summary and records. |
| `GET /inference/analytics/export` | Export JSON or NDJSON for agents, auditors, and notebooks. | Includes schema version, filters, safety fields, and redaction metadata. |

Usage records should include `workspace_id`, `agent_id`, `goal_id`, `task_id`, `model`, `provider_id`, `route_id`, `order_id`, `reservation_id`, `unit_type`, estimated and actual units, estimated and actual cost, discount basis points, selected decision, latency, quality, and the standard safety fields.

Savings rule:

```text
If reference_unit_price is known:
  reference_cost = actual_units * reference_unit_price
  savings = max(0, reference_cost - actual_cost)
  savings_bps = discount_bps
If reference_unit_price is unknown:
  export actual_cost and discount_bps only; do not invent savings.
```
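
The rule translates directly into code. This is a minimal sketch that follows the rule as written, including reporting `savings_bps` as the recorded `discount_bps`; the function name and the use of `Decimal` for money values are assumptions.

```python
from decimal import Decimal
from typing import Optional


def savings_record(
    actual_units: int,
    actual_cost: Decimal,
    discount_bps: int,
    reference_unit_price: Optional[Decimal] = None,
) -> dict:
    """Apply the savings rule; omit savings when no reference price is known."""
    record = {"actual_cost": actual_cost, "discount_bps": discount_bps}
    if reference_unit_price is not None:
        reference_cost = actual_units * reference_unit_price
        record["reference_cost"] = reference_cost
        # Savings are floored at zero so an expensive route never shows negative savings.
        record["savings"] = max(Decimal(0), reference_cost - actual_cost)
        record["savings_bps"] = discount_bps
    return record
```

For example, 250,000 units at a hypothetical reference price of 0.000001 give a reference cost of 0.25, so an actual cost of 0.1875 yields savings of 0.0625.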

Export request examples:

```http
GET /inference/analytics/usage?agent_id=agent-research-1&model=flow-memory-orchestrator
GET /inference/analytics/export?format=ndjson&workspace_id=workspace-default
```
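
An agent or notebook can build these URLs and read the NDJSON export with the standard library. In this sketch, `BASE_URL` is an assumed local host, and the export body is assumed to contain one JSON object per line, which is what NDJSON means in general; the record schema is not shown here.

```python
import json
from urllib.parse import urlencode

# Assumption: the marketplace API shares the local host used by the SDK proxy.
BASE_URL = "http://127.0.0.1:8765"


def export_url(**filters: str) -> str:
    """Build a GET /inference/analytics/export URL with encoded query filters."""
    return f"{BASE_URL}/inference/analytics/export?{urlencode(filters)}"


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Using `urlencode` rather than string concatenation keeps filter values such as workspace names safe to embed in the query string.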

## SDK drop-in examples

See `docs/INFERENCE_PROXY.md` and `examples/inference_market_demo.py` for SDK-compatible drop-in patterns. SDK examples use the Flow Memory base URL and Flow Memory auth. Provider credentials are never passed through SDK configuration.

Chat, responses, and embeddings base URL:

```text
http://127.0.0.1:8765/v1
```

## Shared route contract templates

These are the literal public route templates used by the dashboard/API contract. Placeholder names are part of the contract and must remain visible to agents and maintainers.

| Key | Method | Literal route template | Placeholders |
|---|---|---|---|
| `marketList` | `GET` | `/inference/market/list` | none |
| `modelDetail` | `GET` | `/inference/market/models/{model}` | `model` |
| `feed` | `GET` | `/inference/market/feed` | none |
| `buyerProfile` | `GET` | `/inference/buyers/{buyer_id}/profile` | `buyer_id` |
| `sellerProfile` | `GET` | `/inference/sellers/{seller_id}/profile` | `seller_id` |
| `providerAgentManifest` | `GET` | `/inference/providers/agent/manifest` | none |
| `credentialRefs` | `POST`, `GET` | `/inference/credential-refs` | none |
| `credentialRefRotate` | `PATCH` | `/inference/credential-refs/{credential_ref_id}/rotate` | `credential_ref_id` |
| `credentialRefRevoke` | `POST` | `/inference/credential-refs/{credential_ref_id}/revoke` | `credential_ref_id` |
| `providerEnroll` | `POST` | `/inference/providers/enroll` | none |
| `providerPairingToken` | `POST` | `/inference/providers/{provider_id}/pairing-token` | `provider_id` |
| `providerHeartbeat` | `POST` | `/inference/providers/{provider_id}/heartbeat` | `provider_id` |
| `providerCatalog` | `POST` | `/inference/providers/{provider_id}/catalog` | `provider_id` |
| `providerCapacity` | `POST` | `/inference/providers/{provider_id}/capacity` | `provider_id` |
| `providerHealth` | `GET` | `/inference/providers/{provider_id}/health` | `provider_id` |
| `marketOrders` | `POST` | `/inference/market/orders` | none |
| `marketOrderCancel` | `POST` | `/inference/market/orders/{order_id}/cancel` | `order_id` |
| `marketDepth` | `GET` | `/inference/market/depth` | none |
| `marketReservations` | `POST` | `/inference/market/reservations` | none |
| `paymentChallengePreview` | `POST` | `/inference/market/payment-challenge/preview` | none |
| `analyticsUsage` | `GET` | `/inference/analytics/usage` | none |
| `analyticsExport` | `GET` | `/inference/analytics/export` | none |
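
Filling a template means substituting each placeholder with an encoded value. The sketch below is one way a client might do that; percent-encoding each value as a single path segment is an assumption about what the contract expects, and `expand_route` is a hypothetical helper.

```python
import re
from urllib.parse import quote

# Placeholders in the contract are lowercase names with underscores, e.g. {provider_id}.
_PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def expand_route(template: str, **values: str) -> str:
    """Fill {placeholder} segments, percent-encoding each value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing value for placeholder {name!r}")
        # safe="" also encodes "/", so a value cannot add path segments.
        return quote(values[name], safe="")

    return _PLACEHOLDER.sub(substitute, template)
```

Raising on a missing placeholder, instead of leaving `{provider_id}` in the path, prevents a literal template from being dispatched as a real request.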

## Endpoint contract summary

The `Path` column uses the exact literal route templates from the shared dashboard/API contract. Placeholder names such as `{credential_ref_id}`, `{provider_id}`, and `{order_id}` are part of the public contract; runtime clients encode actual values before dispatch.

| Method | Path | Scope | Purpose |
|---|---|---|---|
| `GET` | `/inference/market/list` | `inference:read` | List available market models and summary depth. |
| `GET` | `/inference/market/models/{model}` | `inference:read` | Return one model detail record. |
| `GET` | `/inference/market/feed` | `inference:read` | Return recent dry-run marketplace feed records. |
| `GET` | `/inference/buyers/{buyer_id}/profile` | `inference:read` | Return buyer marketplace profile metadata. |
| `GET` | `/inference/sellers/{seller_id}/profile` | `inference:read` | Return seller marketplace profile metadata. |
| `GET` | `/inference/providers/agent/manifest` | `inference:read` | Return provider-agent public onboarding manifest metadata. |
| `POST` | `/inference/credential-refs` | `inference:sell` or `inference:admin` | Create a credential reference record. |
| `GET` | `/inference/credential-refs` | `inference:read` | List visible credential reference records without secrets. |
| `PATCH` | `/inference/credential-refs/{credential_ref_id}/rotate` | `inference:admin` | Rotate a reference pointer and invalidate stale routes. |
| `POST` | `/inference/credential-refs/{credential_ref_id}/revoke` | `inference:admin` | Revoke a reference and gate provider routes. |
| `POST` | `/inference/providers/enroll` | `inference:sell` | Enroll a provider-agent runtime. |
| `POST` | `/inference/providers/{provider_id}/pairing-token` | `inference:sell` | Create a short-lived runtime pairing token record. |
| `POST` | `/inference/providers/{provider_id}/heartbeat` | `inference:sell` | Record runtime liveness. |
| `POST` | `/inference/providers/{provider_id}/catalog` | `inference:sell` | Publish model/API capability catalog. |
| `POST` | `/inference/providers/{provider_id}/capacity` | `inference:sell` | Publish available units and ask prices. |
| `GET` | `/inference/providers/{provider_id}/health` | `inference:read` | Return passive health and route-gating state. |
| `POST` | `/inference/market/orders` | `inference:buy` or `inference:sell` | Create a dry-run bid or ask order. |
| `POST` | `/inference/market/orders/{order_id}/cancel` | `inference:buy` or `inference:sell` | Cancel an active dry-run order. |
| `GET` | `/inference/market/depth` | `inference:read` | Return bid/ask depth, spread, and recent dry-run records. |
| `POST` | `/inference/market/reservations` | `inference:buy` | Hold capacity without funds movement. |
| `POST` | `/inference/quote` | `inference:plan` | Return candidate quotes and rejected routes. |
| `POST` | `/inference/route` | `inference:plan` | Select a route under policy. |
| `POST` | `/inference/market/payment-challenge/preview` | `inference:buy` | Return disabled payment challenge preview. |
| `GET` | `/inference/analytics/usage` | `inference:read` | Query usage analytics. |
| `GET` | `/inference/analytics/export` | `inference:audit` | Export JSON or NDJSON usage data. |
| `GET` | `/v1/models` | `inference:proxy` | SDK-compatible model list. |
| `POST` | `/v1/chat/completions` | `inference:proxy` | SDK-compatible chat completions proxy. |
| `POST` | `/v1/responses` | `inference:proxy` | SDK-compatible responses proxy. |
| `POST` | `/v1/embeddings` | `inference:proxy` | SDK-compatible embeddings proxy. |

Machine-readable endpoint metadata lives in `docs/INFERENCE_MARKET_ENDPOINTS.json`. The raw AI-consumable index lives in `docs/INFERENCE_MARKET_INDEX.md`.

## Operator checklist

- [ ] Public UI, docs, examples, and API errors use Flow Memory naming only.
- [ ] Every provider credential is represented by `credential_ref` or `credential_ref_id` only.
- [ ] Credential refs can be created, listed, rotated, and revoked without secret values.
- [ ] Provider agents enroll, pair, heartbeat, publish catalog, publish capacity, and expose passive health.
- [ ] Bid/ask depth, order creation/cancel, and reservations are dry-run records.
- [ ] Quote and route policy enforce price, discount, model, health, fallback, and safety gates.
- [ ] Payment challenge preview is disabled and never requests sensitive signing material.
- [ ] Analytics and exports contain usage/savings records with redaction metadata.
- [ ] Production value transfer, funds movement, broadcast, sensitive signing, and live execution remain disabled or outside the product boundary.
