# Flow Memory Inference Proxy

Flow Memory Inference Proxy exposes OpenAI-compatible and Anthropic-compatible SDK entrypoints backed by the Flow Memory Inference Market. SDK clients keep familiar method calls while Flow Memory records route policy, quote selection, usage, warnings, and safety metadata.

The proxy is not a secret tunnel. Do not pass provider API keys, private signing material, payment instructions, live settlement flags, broadcast flags, or unsafe account payloads through SDK headers, JSON bodies, model names, tool parameters, or metadata. External provider access is represented only by `credential_ref` provider records created through the marketplace lifecycle.

## Safety contract

Proxy and marketplace responses must preserve the dry-run safety boundary:

| Field | Required value |
|---|---:|
| `dry_run_only` | `true` |
| `funds_moved` | `false` |
| `broadcast_allowed` | `false` |
| `private_key_required` | `false` |
| `live_trading_enabled` | `false` |
| `legal_review_required` | `true` |
| `compliance_review_required` | `true` |

Settlement, funds movement, broadcast, private signing, and live execution remain disabled or gated outside this interface.
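A client can check the contract before trusting a response. The sketch below assumes the seven fields sit at the top level of the response's `flow_memory` object; `REQUIRED_SAFETY_FIELDS` and `safety_violations` are hypothetical names, not part of any Flow Memory SDK.

```python
# Required values copied from the safety contract table.
REQUIRED_SAFETY_FIELDS = {
    "dry_run_only": True,
    "funds_moved": False,
    "broadcast_allowed": False,
    "private_key_required": False,
    "live_trading_enabled": False,
    "legal_review_required": True,
    "compliance_review_required": True,
}


def safety_violations(metadata: dict) -> list[str]:
    """Describe every contract field that is missing or has the wrong value."""
    problems = []
    for field, required in REQUIRED_SAFETY_FIELDS.items():
        if field not in metadata:
            problems.append(f"{field}: missing")
        elif metadata[field] is not required:
            problems.append(f"{field}: expected {required!r}, got {metadata[field]!r}")
    return problems
```

An empty list means the metadata matches the table. Any other result is a reason to discard the response rather than act on it.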

## Request flow

```mermaid
flowchart TD
    SDK[OpenAI or Anthropic SDK] --> BaseURL[Flow Memory base URL]
    BaseURL --> Auth[Flow Memory API key or JWT]
    Auth --> Guard[Unsafe payload guard]
    Guard --> Route[Marketplace route policy]
    Route --> Depth[Bid/ask depth and provider health]
    Depth --> Quote[Selected dry-run quote]
    Quote --> Provider[Deterministic or gated credential-ref provider path]
    Provider --> Usage[Usage record]
    Usage --> Audit[Audit metadata]
    Audit --> Response[SDK-compatible response with flow_memory]
```

## SDK endpoints

| Endpoint | Scope | Compatibility |
|---|---|---|
| `GET /v1/models` | `inference:proxy` | OpenAI-compatible model list. |
| `POST /v1/chat/completions` | `inference:proxy` | OpenAI-compatible Chat Completions. |
| `POST /v1/responses` | `inference:proxy` | OpenAI-compatible Responses. |
| `POST /v1/embeddings` | `inference:proxy` | OpenAI-compatible Embeddings. |
| `GET /anthropic/v1/models` | `inference:proxy` | Anthropic-compatible model list. |
| `POST /anthropic/v1/messages` | `inference:proxy` | Anthropic-compatible Messages. |

Preflight and analytics endpoints used with SDK traffic:

| Endpoint | Scope | Purpose |
|---|---|---|
| `GET /inference/market/depth` | `inference:read` | Bid/ask depth and spread before SDK calls. |
| `POST /inference/quote` | `inference:plan` | Candidate route quotes and rejected routes. |
| `POST /inference/route` | `inference:plan` | Selected route under policy. |
| `POST /inference/market/reservations` | `inference:buy` | Expiring dry-run capacity hold. |
| `POST /inference/market/payment-challenge/preview` | `inference:buy` | Disabled payment challenge preview. |
| `GET /inference/analytics/usage` | `inference:read` | Usage and savings analytics. |
| `GET /inference/analytics/export` | `inference:audit` | JSON or NDJSON export. |

Provider lifecycle endpoints used by operators and provider agents:

| Endpoint | Purpose |
|---|---|
| `POST /inference/credential-refs` | Create a credential reference record without a secret value. |
| `GET /inference/credential-refs` | List visible credential reference records. |
| `PATCH /inference/credential-refs/{credential_ref_id}/rotate` | Rotate a reference pointer. |
| `POST /inference/credential-refs/{credential_ref_id}/revoke` | Revoke a reference and gate dependent routes. |
| `POST /inference/providers/enroll` | Enroll provider-agent runtime. |
| `POST /inference/providers/{provider_id}/pairing-token` | Create short-lived pairing token metadata. |
| `POST /inference/providers/{provider_id}/heartbeat` | Record liveness. |
| `POST /inference/providers/{provider_id}/catalog` | Publish supported models and compatible APIs. |
| `POST /inference/providers/{provider_id}/capacity` | Publish available units and ask prices. |
| `GET /inference/providers/{provider_id}/health` | Inspect passive health and route gating. |

## OpenAI SDK drop-in

Set the SDK base URL to the Flow Memory OpenAI-compatible prefix and use Flow Memory auth. The SDK key is a Flow Memory API key or local development token, not a provider key.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8765/v1",
    api_key="flow-memory-dev-token",
    default_headers={
        "x-flow-memory-scopes": "inference:proxy,inference:plan",
        "x-flow-memory-agent-id": "agent-research-1",
    },
)

response = client.chat.completions.create(
    model="flow-memory-orchestrator",
    messages=[{"role": "user", "content": "Summarize the market depth."}],
    extra_body={
        "flow_memory": {
            "market_policy": {
                "max_unit_price": 0.0000008,
                "min_discount_bps": 500,
                "allow_fallback": True,
                "dry_run_required": True,
                "raw_credentials_allowed": False,
                "live_settlement_enabled": False,
                "broadcast_enabled": False,
                "private_key_inputs_allowed": False,
            }
        }
    },
)
print(response.choices[0].message.content)
```

A comparable raw HTTP request, with a shorter prompt and a policy that carries only the safety flags:

```bash
curl -s http://127.0.0.1:8765/v1/chat/completions \
  -H "content-type: application/json" \
  -H "x-flow-memory-api-key: flow-memory-dev-token" \
  -H "x-flow-memory-scopes: inference:proxy,inference:plan" \
  -d '{"model":"flow-memory-orchestrator","messages":[{"role":"user","content":"hello"}],"flow_memory":{"market_policy":{"dry_run_required":true,"raw_credentials_allowed":false,"live_settlement_enabled":false,"broadcast_enabled":false,"private_key_inputs_allowed":false}}}'
```

Expected response shape: an OpenAI-compatible chat completion with an added `flow_memory` object:

```json
{
  "id": "chatcmpl_flow_memory_demo",
  "object": "chat.completion",
  "model": "flow-memory-orchestrator",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Flow Memory dry-run response."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 8, "completion_tokens": 5, "total_tokens": 13},
  "flow_memory": {
    "route_decision": {"selected_quote": {"route": {"compatible_api": "openai"}}},
    "usage_record": {"dry_run_only": true, "funds_moved": false},
    "warnings": [],
    "dry_run_only": true,
    "funds_moved": false,
    "broadcast_allowed": false,
    "private_key_required": false,
    "live_trading_enabled": false,
    "legal_review_required": true,
    "compliance_review_required": true
  }
}
```

## OpenAI Responses and Embeddings

```python
# Reuses the OpenAI `client` configured in the drop-in example above.
responses_result = client.responses.create(
    model="flow-memory-orchestrator",
    input="Explain why the route was selected in one paragraph.",
)

embedding_result = client.embeddings.create(
    model="flow-local-embedding",
    input=["market depth", "provider health"],
)
```

Embeddings may use deterministic local vectors and still produce `flow_memory` usage metadata. They must not call an external provider unless the selected route is backed by a configured credential reference and the operator has enabled that provider path.

## Anthropic SDK drop-in

Set the Anthropic SDK base URL to the Flow Memory Anthropic-compatible prefix and use Flow Memory auth headers. Do not put provider keys in the SDK key or extra headers.

```python
from anthropic import Anthropic

client = Anthropic(
    # The Anthropic SDK appends /v1/messages itself, so the base URL stops at /anthropic.
    base_url="http://127.0.0.1:8765/anthropic",
    api_key="flow-memory-dev-token",
    default_headers={
        "x-flow-memory-scopes": "inference:proxy,inference:plan",
        "x-flow-memory-agent-id": "agent-research-1",
    },
)

message = client.messages.create(
    model="flow-memory-anthropic-chat",
    max_tokens=256,
    messages=[{"role": "user", "content": "Which provider route is safest?"}],
    extra_body={
        "flow_memory": {
            "market_policy": {
                "allowed_models": ["flow-memory-anthropic-chat"],
                "max_unit_price": 0.001,
                "allow_fallback": True,
                "dry_run_required": True,
                "raw_credentials_allowed": False,
                "live_settlement_enabled": False,
                "broadcast_enabled": False,
                "private_key_inputs_allowed": False,
            }
        }
    },
)
print(message.content[0].text)
```

A comparable raw HTTP request, with a shorter prompt and a policy that carries only the safety flags:

```bash
curl -s http://127.0.0.1:8765/anthropic/v1/messages \
  -H "content-type: application/json" \
  -H "x-flow-memory-api-key: flow-memory-dev-token" \
  -H "x-flow-memory-scopes: inference:proxy,inference:plan" \
  -d '{"model":"flow-memory-anthropic-chat","max_tokens":256,"messages":[{"role":"user","content":"hello"}],"flow_memory":{"market_policy":{"dry_run_required":true,"raw_credentials_allowed":false,"live_settlement_enabled":false,"broadcast_enabled":false,"private_key_inputs_allowed":false}}}'
```

```mermaid
sequenceDiagram
    participant Client as Anthropic SDK
    participant Proxy as Flow Memory proxy
    participant Market as Inference market
    participant Provider as Selected provider path
    Client->>Proxy: POST /anthropic/v1/messages
    Proxy->>Market: route request with Anthropic-compatible model
    Market-->>Proxy: selected dry-run route or rejection
    Proxy->>Provider: deterministic or gated credential-ref call
    Provider-->>Proxy: response body
    Proxy->>Market: usage and audit records
    Proxy-->>Client: Anthropic-compatible message plus flow_memory metadata
```

## Preflight before SDK calls

For expensive or autonomous work, check market depth and request a quote or route before making a drop-in SDK call. An example preflight request body:

```json
{
  "agent_id": "agent-research-1",
  "workspace_id": "workspace-default",
  "task_id": "task-42",
  "model": "flow-memory-orchestrator",
  "unit_type": "token",
  "estimated_units": 250000,
  "market_policy": {
    "allowed_models": ["flow-memory-orchestrator"],
    "max_unit_price": 0.0000008,
    "min_discount_bps": 500,
    "allow_fallback": true,
    "require_healthy_provider": true,
    "dry_run_required": true,
    "raw_credentials_allowed": false,
    "live_settlement_enabled": false,
    "broadcast_enabled": false,
    "private_key_inputs_allowed": false
  }
}
```
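As a rough sketch, the body above could be wrapped in a request to `POST /inference/quote` using only the standard library. It assumes the quote endpoint accepts this body and the same `x-flow-memory-api-key` and `x-flow-memory-scopes` headers as the curl examples; `build_quote_request` is a hypothetical helper, and the policy is shortened for space.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8765"  # local development address used in this document


def build_quote_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a POST /inference/quote request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/inference/quote",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "content-type": "application/json",
            "x-flow-memory-api-key": api_key,
            "x-flow-memory-scopes": "inference:plan",  # scope listed for the quote endpoint
        },
    )


preflight_body = {
    "agent_id": "agent-research-1",
    "model": "flow-memory-orchestrator",
    "unit_type": "token",
    "estimated_units": 250000,
    "market_policy": {
        "max_unit_price": 0.0000008,
        "dry_run_required": True,
        "raw_credentials_allowed": False,
        "live_settlement_enabled": False,
        "broadcast_enabled": False,
        "private_key_inputs_allowed": False,
    },
}

request = build_quote_request(preflight_body, "flow-memory-dev-token")
# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as reply:
#     quote = json.load(reply)
```

Building the request separately from sending it keeps the example free of network calls, in line with the example files listed later in this document.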

Route metadata to inspect:

| Field | Meaning |
|---|---|
| `flow_memory.route_decision.selected_quote.route.compatible_api` | `openai` or `anthropic` compatibility. |
| `flow_memory.route_decision.selected_quote.estimated_total_cost` | Simulated cost. |
| `flow_memory.route_decision.selected_quote.discount_bps` | Discount from reference price when available. |
| `flow_memory.route_decision.rejected_routes` | Candidates rejected by policy, price, health, or credential-ref status. |
| `flow_memory.usage_record` | Persisted usage accounting record. |
| `flow_memory.warnings` | Compatibility warnings such as streaming or unsupported tool execution. |
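The fields in the table can be pulled out defensively, since a rejected route may leave some of them absent. This is a minimal sketch over a plain `dict`; `route_summary` is a hypothetical helper name.

```python
def route_summary(response: dict) -> dict:
    """Collect the route fields from the table, tolerating missing keys."""
    meta = response.get("flow_memory") or {}
    decision = meta.get("route_decision") or {}
    quote = decision.get("selected_quote") or {}
    route = quote.get("route") or {}
    return {
        "compatible_api": route.get("compatible_api"),
        "estimated_total_cost": quote.get("estimated_total_cost"),
        "discount_bps": quote.get("discount_bps"),
        "rejected_routes": decision.get("rejected_routes") or [],
        "warnings": meta.get("warnings") or [],
    }
```

With an SDK client, one way to obtain such a dict is to convert the response object first, for example with its `model_dump()` method, assuming the SDK retains the unknown top-level `flow_memory` field.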

## Parameter compatibility

| Parameter | Behavior |
|---|---|
| `model` | Used for route selection and SDK response shape. |
| `messages` | Accepted by chat/messages surfaces and recorded in deterministic usage metadata. |
| `input` | Accepted by Responses and Embeddings surfaces. |
| `stream` | Accepted only as client-shape metadata unless a provider adapter explicitly supports streaming; otherwise the proxy returns a warning in `flow_memory.warnings`. |
| `tools`, `tool_choice`, function specs | Compatibility metadata by default; deterministic local providers do not execute tools. |
| `temperature`, `top_p`, `max_tokens` | Accepted as SDK fields; deterministic local responses may ignore randomness. |
| Provider credentials, signing material, payment instructions | Rejected. Use credential-ref lifecycle endpoints instead. |
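A client-side pre-check can catch obvious mistakes before a request reaches the proxy's guard. The sketch below is illustrative only: the key names in `SUSPECT_KEYS` are hypothetical, this document does not list the proxy's actual guard rules, and a pre-check of this kind does not replace them.

```python
# Hypothetical key names for illustration; not the proxy's real rule set.
SUSPECT_KEYS = {"provider_api_key", "private_key", "seed_phrase", "payment_instruction"}


def find_suspect_keys(payload, path="$"):
    """Recursively collect JSON paths whose key name is in SUSPECT_KEYS."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if str(key).lower() in SUSPECT_KEYS:
                hits.append(child)
            hits.extend(find_suspect_keys(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_suspect_keys(item, f"{path}[{index}]"))
    return hits
```

The walk covers nested metadata and tool parameters because those are among the places where this document says unsafe material must not appear.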

## Backoff and health

```mermaid
flowchart TD
    Error[Route warning or rejection] --> Classify{Reason}
    Classify -->|credential_ref_unresolved| Operator[Operator repairs reference]
    Classify -->|provider_unhealthy| Backoff[Agent backs off and routes elsewhere]
    Classify -->|max_unit_price_exceeded| Price[Buyer waits or changes price ceiling]
    Classify -->|min_discount_not_met| Discount[Buyer waits or changes discount rule]
    Classify -->|no_valid_inference_route| Defer[Defer or require human approval]
    Operator --> Health["GET /inference/providers/{provider_id}/health"]
    Backoff --> Health
```

Agents must not weaken safety flags in order to retry. Safe retries change task size, allowed model, price ceiling, discount threshold, fallback policy, or wait time while preserving `dry_run_required=true`, `raw_credentials_allowed=false`, `live_settlement_enabled=false`, `broadcast_enabled=false`, and `private_key_inputs_allowed=false`.
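One way to enforce this on the agent side is to build retry policies through a helper that refuses to touch the safety flags. In the sketch below the reason codes come from the flowchart and the flag names from the policy examples; the action labels, `classify`, and `safe_retry_policy` are hypothetical.

```python
import copy

# Reason codes from the flowchart; the action labels are illustrative.
ACTIONS = {
    "credential_ref_unresolved": "operator_repairs_reference",
    "provider_unhealthy": "back_off_and_reroute",
    "max_unit_price_exceeded": "wait_or_change_price_ceiling",
    "min_discount_not_met": "wait_or_change_discount_rule",
    "no_valid_inference_route": "defer_or_require_human_approval",
}

# Safety flags as they appear in the market_policy examples.
SAFETY_FLAGS = {
    "dry_run_required": True,
    "raw_credentials_allowed": False,
    "live_settlement_enabled": False,
    "broadcast_enabled": False,
    "private_key_inputs_allowed": False,
}


def classify(reason: str) -> str:
    """Map a rejection reason to an action; unknown reasons are deferred."""
    return ACTIONS.get(reason, "defer_or_require_human_approval")


def safe_retry_policy(policy: dict, **changes) -> dict:
    """Return a copy of policy with changes applied and safety flags pinned."""
    weakened = [k for k, v in changes.items() if k in SAFETY_FLAGS and v is not SAFETY_FLAGS[k]]
    if weakened:
        raise ValueError(f"retry must not change safety flags: {sorted(weakened)}")
    updated = copy.deepcopy(policy)
    updated.update(changes)
    updated.update(SAFETY_FLAGS)  # re-assert the flags even if the input omitted them
    return updated
```

For example, `safe_retry_policy(policy, min_discount_bps=250)` relaxes the discount rule, while `safe_retry_policy(policy, dry_run_required=False)` raises `ValueError`.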

## Usage exports after SDK calls

```bash
curl -s "http://127.0.0.1:8765/inference/analytics/usage?agent_id=agent-research-1" \
  -H "x-flow-memory-api-key: flow-memory-dev-token" \
  -H "x-flow-memory-scopes: inference:read"

curl -s "http://127.0.0.1:8765/inference/analytics/export?format=ndjson&agent_id=agent-research-1" \
  -H "x-flow-memory-api-key: flow-memory-dev-token" \
  -H "x-flow-memory-scopes: inference:audit"
```

Exports are accounting and audit records only. They are not settlement records and do not prove funds movement.
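An NDJSON export can be read one JSON object per line. The sketch below assumes each exported line is a usage record carrying `dry_run_only` and `funds_moved`, as the `usage_record` in the response example does; the export schema is not specified here, and both function names are hypothetical.

```python
import json


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def non_dry_run_records(records: list[dict]) -> list[dict]:
    """Return records that do not show dry_run_only=true and funds_moved=false."""
    return [
        record
        for record in records
        if record.get("dry_run_only") is not True or record.get("funds_moved") is not False
    ]
```

A non-empty result from `non_dry_run_records` would indicate records that break the dry-run boundary and should be escalated to an operator.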

## Example files

- `examples/inference_market_demo.py` shows OpenAI-compatible, Anthropic-compatible, depth, route, analytics, and disabled payment-preview request shapes without making network calls by default.
- `examples/inference_provider_onboarding_demo.py` shows credential-ref creation, provider enrollment, pairing, heartbeat, catalog, capacity, and health request shapes without raw credentials.
- `examples/inference_agent_quote_route_buy.py` shows an agent quote, route, order, reservation, disabled payment preview, and export flow without external calls.

## Operator checklist

- [ ] SDK clients point to Flow Memory base URLs, not provider URLs.
- [ ] SDK auth is Flow Memory auth, not provider auth.
- [ ] External provider access uses `credential_ref` provider records only.
- [ ] Proxy responses include route decision, usage record, warnings, and safety metadata.
- [ ] Quote/route preflight is used for autonomous or expensive work.
- [ ] Usage/export endpoints remain dry-run analytics, not settlement evidence.