
REST API

Arbiter exposes a JSON-over-HTTP API under /api/v1. The API surface is split into two categories that authenticate differently.

Authentication

Programmatic (Bearer-only)

These endpoints are intended for scripts, integrations, and downstream consumers. They accept only a personal API key sent as a Bearer token — session cookies are explicitly stripped by the security filter chain so a logged-in admin's browser session cannot be CSRF'd into reaching them. The Bearer-only set is small and stable:

| Method | Path |
| --- | --- |
| POST | /api/v1/ingest |
| GET | /api/v1/search |
| POST | /api/v1/documents/{id}/finalize |
| GET | /api/v1/documents/{id}/audit |

To use them, generate an API key from Personal settings and send it on every request:

Authorization: Bearer <your-api-key>

Arbiter stores only the SHA-512 hash of the key. The plaintext value is shown once at generation and cannot be recovered. Rotate by generating a new key (which replaces the old one) or revoke the existing one. Failed authentication (no header, malformed header, or unknown key) returns 401 Unauthorized.
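As a minimal sketch of the header above, the following Python helper builds (but does not send) a request that carries the key as a Bearer token. The host `arbiter.example.com` and the helper name `bearer_request` are hypothetical, chosen only for illustration.

```python
import urllib.request

BASE_URL = "https://arbiter.example.com"  # hypothetical host, replace with your own


def bearer_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request that carries the API key as a Bearer token."""
    req = urllib.request.Request(BASE_URL + path, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


req = bearer_request("/api/v1/search?q=invoice", "my-secret-key")
```

Passing `req` to `urllib.request.urlopen` would send it; with the standard library, a 401 response surfaces as `urllib.error.HTTPError`.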

Session-allowed (browser-UI shared)

The remaining /api/v1/** endpoints are used by Arbiter's own web UI and accept either a Bearer token or the browser's session cookie. Cross-origin abuse is blocked by the browser's same-origin policy plus the absence of a permissive CORS configuration; Bearer authentication still works for programmatic clients that prefer to use them. Endpoints in this category are called out below per-section. The API key carries the same role and group permissions as the user that owns it; session callers carry whatever role and groups their account has.

Document ingestion

POST /api/v1/ingest

Bearer-only. Submit a plain-text document. Ingestion is asynchronous: the document is persisted in PENDING and placed on the redaction queue. A background worker runs Philter in arrival order; once redaction completes, the document moves to REVIEW_REQUIRED (PII detected with low-confidence spans), AUDIT_REQUIRED (eligible for auto-approval but sampled for review per the batch's audit sampling rate), or AUTO_APPROVED.

Request body:

{
  "batchId":  "string",
  "name":     "string",
  "text":     "string",
  "priority": 2
}

priority is optional. It accepts an integer in 1..3 (1 Low, 2 Normal, 3 High); omitting it or sending null defaults to Normal. The value is stored on the document and surfaced as a chevron icon on the Document Queue. It does not affect ingest ordering — redaction still runs oldest-first.
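The request body and the priority rule can be sketched client-side as follows. This is an illustrative Python helper, not part of Arbiter; the function name `build_ingest_body` is an assumption, and the check simply mirrors the documented 1..3 range so that an invalid value fails locally instead of producing a 400.

```python
import json
from typing import Optional


def build_ingest_body(batch_id: str, name: str, text: str,
                      priority: Optional[int] = None) -> bytes:
    """Serialise an ingest request, rejecting priorities outside the documented range."""
    if priority is not None and priority not in (1, 2, 3):
        raise ValueError("priority must be 1 (Low), 2 (Normal) or 3 (High)")
    body = {"batchId": batch_id, "name": name, "text": text}
    if priority is not None:
        body["priority"] = priority  # when omitted, the server defaults to Normal
    return json.dumps(body).encode("utf-8")
```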

| Status | Meaning |
| --- | --- |
| 202 | Accepted; body {"taskId": "..."}. Redaction runs asynchronously. |
| 400 | batchId does not exist (or required fields missing/invalid, including priority outside 1..3). |
| 403 | Caller does not have access to that batch. |
| 409 | Batch is closed; body includes "closed": true. |

The returned taskId is the document's id. Poll GET /api/v1/documents/{id}/spans or GET /api/v1/queue to track its progress out of PENDING.

A SHA-512 hash of the submitted text (UTF-8 bytes) is recorded on the document at ingest time — see Security · Document content integrity.
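A client that wants to keep its own copy of that digest for later comparison could compute it as sketched below. The hex encoding is an assumption; this page does not state how Arbiter encodes the stored hash.

```python
import hashlib


def text_sha512(text: str) -> str:
    """SHA-512 over the UTF-8 bytes of the text, hex-encoded (the encoding is an assumption)."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()
```

The digest is taken over bytes, not characters, so the same text encoded differently (for example as Latin-1) yields a different hash.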

Triage

GET /api/v1/queue

Session-allowed. List the documents the caller can see, paged and ordered by the chosen sort field.

| Query param | Default | Meaning |
| --- | --- | --- |
| page | 0 | Zero-indexed page (negative values are clamped to 0) |
| size | 10 | Page size, clamped to the range [1, 100] |
| batchId | (none) | Filter to one batch |
| status | (none) | Filter to one status |
| filename | (none) | Substring match on filename, case-insensitive |
| myGroupsOnly | false | Admin opt-in: restrict admins to their own groups |
| sort | riskScore | One of riskScore, status, batchId, filename, priority |
| dir | desc | asc or desc |

size is hard-capped at 100 — values above that are silently lowered, and values below 1 are raised to 1. Page through larger result sets with successive page values rather than a larger size.
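The clamping and the page-by-page walk can be sketched in Python as follows. `fetch_page` is a hypothetical callable supplied by the caller; it is assumed to perform GET /api/v1/queue for the given page and size and return the decoded JSON body.

```python
from typing import Callable, Dict, Iterator, Tuple


def clamp_paging(page: int, size: int) -> Tuple[int, int]:
    """Mirror the documented clamping: page >= 0 and size within [1, 100]."""
    return max(page, 0), min(max(size, 1), 100)


def iter_queue(fetch_page: Callable[[int, int], Dict], size: int = 100) -> Iterator[Dict]:
    """Yield every visible document by walking pages until totalPages is exhausted."""
    _, size = clamp_paging(0, size)
    page = 0
    while True:
        body = fetch_page(page, size)
        yield from body["content"]
        page += 1
        if page >= body["totalPages"]:
            break
```

Clamping locally keeps the client's idea of the page size in step with what the server will actually return, which matters when computing how many pages remain.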

Non-admins are always restricted to their groups; the myGroupsOnly parameter only affects admin callers.

The response follows the Spring Page<Map> shape:

{
  "content": [
    {
      "id": "string",
      "filename": "string",
      "status": "PENDING|REVIEW_REQUIRED|AUDIT_REQUIRED|AUTO_APPROVED|APPROVED|REJECTED|FAILED",
      "riskScore": 0.0,
      "batchId": "string",
      "batchName": "string",
      "autoApproved": false,
      "documentThreshold": 0.25,
      "priority": 2
    }
  ],
  "totalElements": 0,
  "totalPages": 0,
  "number": 0,
  "size": 10
}

autoApproved is the derived display flag: it's true when the document's risk score is at or below documentThreshold and the document is neither in a user-decided terminal state (APPROVED, REJECTED, FAILED) nor in AUDIT_REQUIRED. The stored status field is independent.
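Read literally, the rule above amounts to the following predicate. This is an illustrative restatement in Python, not Arbiter's source; the function and constant names are assumptions.

```python
# Statuses in which the derived flag is always false, per the rule above.
EXCLUDED_STATUSES = {"APPROVED", "REJECTED", "FAILED", "AUDIT_REQUIRED"}


def derive_auto_approved(status: str, risk_score: float, document_threshold: float) -> bool:
    """True when the risk score is at or below the threshold and the status is not excluded."""
    return risk_score <= document_threshold and status not in EXCLUDED_STATUSES
```

For example, a PENDING document whose score equals the threshold yields true, while the same score on an AUDIT_REQUIRED document yields false.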

GET /api/v1/batches

Session-allowed. List batches the caller can target. Honors the same myGroupsOnly query param. Returns a JSON array of {id, name}.

Documents

GET /api/v1/documents/{id}/spans

Session-allowed. Return every Span row in the document. Useful for building a custom review client or for reconciling the redactor's output with downstream systems.

404 if the document doesn't exist or the caller lacks group access.

POST /api/v1/documents/{documentId}/spans

Session-allowed. Manually create a span at an explicit character range. Used by the review UI when a reviewer highlights uncovered PII; the API is also available to clients.

{ "type": "ssn", "start": 42, "end": 53 }

type is validated against the PII types list. The new span is persisted with confidence: 1.0, status: APPROVED, and manuallyCreated: true.

| Status | Meaning |
| --- | --- |
| 200 | Returns the saved Span JSON. |
| 400 | Missing/invalid type, start, or end; range exceeds the text. |
| 404 | Document not found or caller lacks access. |
| 409 | Document is in a terminal state and cannot be edited. |
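A client building its own review tool might derive the character range as sketched below. Two assumptions are made here and are not confirmed by this page: offsets are zero-based, and end is exclusive (consistent with the 42..53 example covering an 11-character value). The helper names are hypothetical.

```python
from typing import Dict, Union


def manual_span_body(text: str, needle: str, pii_type: str) -> Dict[str, Union[str, int]]:
    """Build a create-span body for the first occurrence of `needle` in `text`."""
    start = text.find(needle)
    if not needle or start == -1:
        raise ValueError("needle must be non-empty and present in the text")
    end = start + len(needle)  # exclusive end offset (assumption)
    return {"type": pii_type, "start": start, "end": end}


def range_fits(text_length: int, start: int, end: int) -> bool:
    """True when the range is non-empty and lies inside the text."""
    return 0 <= start < end <= text_length
```

Python counts offsets in code points. If the server counts differently (for example in UTF-16 units), ranges in texts containing characters outside the Basic Multilingual Plane would need adjusting.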

POST /api/v1/documents/{id}/finalize

Bearer-only. Produce the redacted text for a document by sending its approved spans to Philter and applying them. The response is the post-redaction string. On success, the document is transitioned to FINALIZED and the rendered redacted text is persisted on the document so a later download still works even if a finalization policy clears the source text.

{ "finalizedText": "string" }

| Status | Meaning |
| --- | --- |
| 200 | Returns { "finalizedText": "..." } and the document is now FINALIZED. |
| 404 | Document not found, or the caller lacks group access. |
| 409 | Document is not in APPROVED, or its source text is unavailable (e.g. cleared by a finalization policy on a prior pass) and cannot be re-finalized. |

GET /api/v1/documents/{id}/audit

Bearer-only. Return a redaction audit trail — every span on the document with its text, type, confidence, and current status. Useful for after-the-fact review or compliance reporting.

[
  { "text": "...", "type": "ssn", "confidence": 0.92, "status": "APPROVED" }
]

404 if the document doesn't exist or the caller lacks group access.
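For compliance reporting, the returned array can be reduced to counts with a few lines of Python. This is a sketch over the response shape shown above; the 0.8 confidence floor is an arbitrary illustrative value, not an Arbiter setting.

```python
from collections import Counter
from typing import Dict, List


def summarise_audit(entries: List[Dict]) -> Counter:
    """Count spans per (type, status) pair."""
    return Counter((e["type"], e["status"]) for e in entries)


def low_confidence(entries: List[Dict], floor: float = 0.8) -> List[Dict]:
    """Spans whose confidence falls below `floor` (an illustrative cut-off)."""
    return [e for e in entries if e["confidence"] < floor]
```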

GET /api/v1/documents/{id}/history

Session-allowed. Return the full audit history for a document — document-level events and all span events — as a JSON array sorted newest first. Powers the Audit Log popup on the Document Queue.

Restricted to ROLE_ADMIN or ROLE_AUDITOR. Returns 403 for any other caller because the history includes raw PII span text.

Each element:

{
  "timestamp": "2026-05-01T12:00:00Z",
  "actor":     "<mongodb-user-id>",
  "action":    "SPAN_UPDATE",
  "resourceType": "Span",
  "resourceId":   "...",
  "details":   {}
}

The actor field defaults to the MongoDB user ID. Pass ?resolveActors=true to receive the user's email address instead — this parameter requires ROLE_ADMIN or ROLE_AUDITOR and returns 403 otherwise.

GET /api/v1/documents/{id}/history.csv

Session-allowed. Download the document's full audit history (document-level events plus all events on its spans) as a CSV, sorted newest first. Powers the Download button on the Document Queue's Audit Log popup. See Audit log for the column list.

Restricted to ROLE_ADMIN or ROLE_AUDITOR. Returns 403 for any other caller. The CSV deliberately omits PII text — span entries include spanCharacterStart, spanCharacterEnd, and spanPage instead. The actor column contains the actor's email address (the CSV is admin/auditor-only, so email exposure is appropriate).

GET /api/v1/documents/{id}/certificate

Session-allowed. Return the redaction certificate for a finalized document — a JSON object summarising the document hash, finalize timestamp, and span counts. Powers the Certificate popup on the Document Queue. The caller must have group access to the document; 404 is returned if the document doesn't exist or the caller lacks access.

GET /api/v1/documents/{id}/comments

Session-allowed. Return reviewer comments left on the document, oldest first.

[
  { "id": "...", "userEmail": "user@example.com",
    "timestamp": "2026-05-04T13:00:00Z", "text": "..." }
]

POST /api/v1/documents/{id}/comments

Session-allowed. Add a comment to the document. The request body is a JSON object with a single text string (max 4,000 characters; surrounding whitespace is trimmed). Returns the saved comment in the same shape that GET /api/v1/documents/{id}/comments produces.

{ "text": "..." }

Spans

PATCH /api/v1/spans/{id}

Session-allowed. Update a span's status, type, or both.

{
  "status":        "APPROVED|REJECTED|PENDING|NEEDS_SECOND_OPINION",
  "type":          "ssn",
  "reason":        "...",
  "exemptionCode": "..."
}

| Field | Required | Notes |
| --- | --- | --- |
| status | optional | One of the allowed statuses. Sending neither status nor type returns 400. |
| type | optional | New PII type; validated against the PII types list. |
| reason | optional | Required when overturning another reviewer's prior APPROVED decision (changing status away from APPROVED while the prior approval was recorded by a different actor); omitting it in that case returns 409 OVERTURN_REASON_REQUIRED. Recorded in the audit trail. |
| exemptionCode | optional | Free-form string applied only when the new status is APPROVED. Cleared automatically when the span moves out of APPROVED. |

Returns the updated Span object.

409 if the parent document is in a terminal state, or if an overturn is attempted without a reason.
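The field rules for this endpoint can be pre-checked on the client before the PATCH is sent. The Python helper below is a sketch under the rules stated in this section; its name and argument names are assumptions, and it cannot know whether a reason is required, since that depends on who recorded the prior approval.

```python
from typing import Dict, Optional

ALLOWED_STATUSES = {"APPROVED", "REJECTED", "PENDING", "NEEDS_SECOND_OPINION"}


def build_span_patch(status: Optional[str] = None, pii_type: Optional[str] = None,
                     reason: Optional[str] = None,
                     exemption_code: Optional[str] = None) -> Dict[str, str]:
    """Assemble a PATCH body, failing early on combinations the API rejects with 400."""
    if status is None and pii_type is None:
        raise ValueError("send status, type, or both")
    if status is not None and status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status}")
    body: Dict[str, str] = {}
    if status is not None:
        body["status"] = status
    if pii_type is not None:
        body["type"] = pii_type
    if reason is not None:
        body["reason"] = reason
    # exemptionCode only applies when the new status is APPROVED, so drop it otherwise.
    if exemption_code is not None and status == "APPROVED":
        body["exemptionCode"] = exemption_code
    return body
```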

DELETE /api/v1/spans/{id}

Session-allowed. Hard-delete a span. Only manually-created spans can be deleted — for spans the redactor produced, flip status to REJECTED instead.

{ "id": "...", "deleted": true }

400 if the span was redactor-created. 409 if the parent document is terminal.

POST /api/v1/spans/{id}/redact-like

Session-allowed. Find every other occurrence of the source span's text in the parent document and approve each match with the source span's PII type. New Span rows are created where matches don't already have one; existing spans at exact ranges are flipped to APPROVED and aligned to the source type. Overlapping non-exact matches are skipped to avoid duplicate spans.

Requires Content-Type: application/json (the request body itself is ignored; the JSON content type is enforced as a CSRF defence so cross-site form posts can't trigger this endpoint).

Response:

{ "created": 0, "approved": 0 }

created is the number of new spans inserted. approved is the number of existing spans flipped to approved.

400 if the source span has empty text. 404 if the span or its document is missing.
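The matching behaviour described above can be illustrated with a small in-memory model. This is a sketch of the documented rules, not Arbiter's implementation: spans are plain dicts, offsets are assumed zero-based with an exclusive end, and occurrences are scanned without overlapping one another.

```python
from typing import Any, Dict, List, Tuple

Span = Dict[str, Any]  # keys used here: start, end, type, status


def redact_like(text: str, source: Span, spans: List[Span]) -> Tuple[int, int]:
    """Propagate `source` across `text`; returns (created, approved) and mutates `spans`."""
    needle = text[source["start"]:source["end"]]
    if not needle:
        raise ValueError("source span has empty text")  # the API answers 400 here
    created = approved = 0
    pos = text.find(needle)
    while pos != -1:
        start, end = pos, pos + len(needle)
        pos = text.find(needle, end)  # non-overlapping scan (assumption)
        if (start, end) == (source["start"], source["end"]):
            continue  # the source span itself is not a match
        exact = next((s for s in spans if (s["start"], s["end"]) == (start, end)), None)
        if exact is not None:
            # Existing span at the exact range: approve it and align its type.
            exact["status"], exact["type"] = "APPROVED", source["type"]
            approved += 1
        elif any(s["start"] < end and start < s["end"] for s in spans):
            continue  # overlapping but not exact: skipped to avoid duplicate spans
        else:
            spans.append({"start": start, "end": end,
                          "type": source["type"], "status": "APPROVED"})
            created += 1
    return created, approved
```

Whether the real endpoint counts an already-approved exact match in `approved` is not stated on this page; the sketch counts it.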

POST /api/v1/spans/{id}/reset

Session-allowed. Revert a span back to its previous status. The intended use is "I clicked Approve / Reject by mistake" — the endpoint moves a span out of its terminal state and back into review. The optional JSON body {"originalStatus": "PENDING|REVIEW_REQUIRED"} selects the target status; an empty body falls back to the span's prior status as recorded in the audit log.

Requires Content-Type: application/json. Returns the updated Span JSON.

404 if the span doesn't exist or the caller lacks group access. 409 if the parent document is in a terminal state.

GET /api/v1/spans/{id}/history

Session-allowed. Return the audit history for a single span as a JSON array sorted newest first. Accessible to any authenticated user with group access to the span's parent document.

[
  {
    "timestamp":    "2026-05-01T12:00:00Z",
    "actor":        "<mongodb-user-id>",
    "action":       "SPAN_UPDATE",
    "resourceType": "Span",
    "resourceId":   "...",
    "details":      {}
  }
]

The actor field defaults to the MongoDB user ID of the actor to avoid leaking email addresses to other reviewers. Pass ?resolveActors=true to receive email addresses instead — this parameter requires ROLE_ADMIN or ROLE_AUDITOR and returns 403 otherwise.

404 if the span doesn't exist or the caller lacks group access.

GET /api/v1/search

Bearer-only. Full-text search across the OpenSearch index of ingested documents. Each document is indexed at ingest time with its filename, batch, status, and full original text.

| Query param | Default | Meaning |
| --- | --- | --- |
| q | (none) | Match query (required, runs against the text) |
| offset | 0 | First hit to return |
| size | 10 | Max hits per page (capped at 100) |

Response:

{
  "query": "...",
  "offset": 0,
  "size": 10,
  "total": 42,
  "hits": [
    {
      "id": "...",
      "batchId": "...",
      "filename": "...",
      "status": "AUTO_APPROVED",
      "highlights": ["… <em>match</em> snippet …"]
    }
  ]
}

Results are pre-filtered to batches the caller can access. Non-admin callers only see hits from their own group's batches; total reflects that filtered count, not the full index size. Inaccessible hits are silently excluded — they are never returned as placeholder rows.

If OpenSearch is unreachable, a query returns an empty result set rather than failing the request — search is best-effort.
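Because search pages by offset and can return an empty result set when OpenSearch is down, a client loop needs a clear stopping rule. The Python sketch below builds the query URL and computes the next offset from a response body; both helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(base_url: str, q: str, offset: int = 0, size: int = 10) -> str:
    """Build the search URL, applying the documented cap of 100 hits per page."""
    if not q:
        raise ValueError("q is required")
    params = {"q": q, "offset": offset, "size": min(size, 100)}
    return f"{base_url}/api/v1/search?{urlencode(params)}"


def next_offset(body: dict) -> Optional[int]:
    """Offset of the following page, or None once the filtered total is exhausted."""
    nxt = body["offset"] + len(body["hits"])
    return nxt if body["hits"] and nxt < body["total"] else None
```

Stopping on an empty `hits` array also covers the best-effort case in which an unreachable OpenSearch yields no results.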

LLM-as-a-Judge

These endpoints proxy a configured Ollama instance to provide an LLM second opinion on the redactor's output. Configuration lives under Admin → LLM-as-a-Judge.

PII leaves Arbiter on every call

Both explain and second-opinion send the full unredacted document text and all PII span values to Ollama. Each call is recorded in the audit log as DOCUMENT_PII_SENT_TO_LLM before the HTTP request is sent, so the entry exists even if Ollama is unreachable. Ollama must be deployed with request-body logging disabled (OLLAMA_DEBUG=0) to prevent PII from appearing in Ollama's logs.

GET /api/v1/ollama/{instanceId}/models

Session-allowed. List the models installed on a configured Ollama instance.

{ "instanceId": "...", "instanceName": "...", "models": ["llama3", "mistral"] }

404 if the instance id is unknown. 502 if Ollama is unreachable.

POST /api/v1/documents/{documentId}/explain

Session-allowed. Ask the LLM to explain the PII risk in a document.

{ "instanceId": "...", "model": "llama3" }

Response:

{ "instanceName": "...", "model": "llama3", "response": "..." }

POST /api/v1/spans/{spanId}/second-opinion

Session-allowed. Ask the configured Second Opinion default Ollama instance/model whether the named span is genuinely PII or a likely false positive. The instance and model are chosen from the LLM-as-a-Judge defaults — the request body is empty but the Content-Type: application/json header is required (CSRF defence).

{ "instanceName": "...", "model": "...",
  "sourceText": "...", "sourceType": "ssn",
  "response": "..." }

400 if no Second Opinion default is configured.

Policies

These endpoints power the Phileas redaction-policy editor under Admin → Policies. Reads are gated to ROLE_ADMIN or ROLE_AUDITOR; callers with neither role get 403.

GET /api/v1/policies

Session-allowed. List the policies installed on a configured Philter instance, or on the embedded Phileas runtime when no instance is specified.

| Query param | Default | Meaning |
| --- | --- | --- |
| instanceId | embedded | Philter instance id, or embedded for the built-in. |

Response:

{ "instanceId": "embedded", "policies": ["Default", "..."] }

502 when the named Philter instance is unreachable; 404 if instanceId doesn't match any configured instance.

GET /api/v1/policies/content

Session-allowed. Fetch the JSON content of one named policy on the chosen instance. Both query parameters are required.

| Query param | Required | Meaning |
| --- | --- | --- |
| instanceId | yes | Philter instance id, or embedded for the built-in. |
| name | yes | Policy name; restricted to letters, digits, hyphens, and underscores (1–64 chars). |

Response:

{ "name": "Default", "content": "{ ...JSON policy... }" }

content is the raw policy JSON as a string (the embedded runtime stores policies as text; remote Philter responses are returned verbatim).

400 for malformed names (the regex check rejects path-traversal attempts). 404 when the named policy doesn't exist on the instance. 502 when the remote Philter instance is unreachable.
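The name rule can be checked on the client with a regular expression of the same shape. The pattern below is written from the rule as documented (letters, digits, hyphens, underscores, 1 to 64 characters); the server's actual regex is not shown on this page and may differ, for instance in whether non-ASCII letters are accepted.

```python
import re

# ASCII letters, digits, hyphen and underscore; 1 to 64 characters (from the documented rule).
POLICY_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_policy_name(name: str) -> bool:
    """True when the whole name matches the documented character set and length."""
    return POLICY_NAME.fullmatch(name) is not None
```

Using `fullmatch` rather than `match` matters: a name such as `Default/../x` starts with valid characters but must still be rejected.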

Browser-only internal endpoints

The following paths exist under /api/v1/** but are intended only for the in-page JavaScript that drives the review UI. They are session-callable, do nothing harmful in isolation, and are documented here purely so that traffic from Arbiter's own UI doesn't look surprising in HTTP logs:

| Path | Purpose |
| --- | --- |
| POST /api/v1/review/{documentId}/pulse | Sliding-expiry heartbeat sent every ~30 s by the review page so the document's pessimistic edit lock doesn't expire while the reviewer is still on the page. Operates on the caller's own lock only. |
| POST /api/v1/review/{documentId}/release | navigator.sendBeacon'd when the reviewer leaves the page. Releases the caller's own lock. |
| GET /api/v1/review/{documentId}/similar | Backs the "Find similar documents" button on the review page. Returns up to 10 hits filtered to the caller's accessible batches. |

There's no programmatic-API reason to call these; use PATCH /api/v1/spans/{id} and GET /api/v1/search instead.

Errors

Error responses have a JSON body with at least an error field describing the issue. Status codes follow the table per endpoint above.
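A client can turn such a response into a readable message with a small helper. The Python sketch below assumes only what is stated above (a JSON body with an error field) and falls back gracefully when the body is not JSON, as might happen if an intermediate proxy answers instead of Arbiter. The function name is hypothetical.

```python
import json


def error_message(status: int, raw_body: bytes) -> str:
    """Format an error response, tolerating bodies that are not the documented JSON."""
    try:
        payload = json.loads(raw_body.decode("utf-8"))
        detail = payload.get("error") if isinstance(payload, dict) else None
    except (UnicodeDecodeError, json.JSONDecodeError):
        detail = None
    return f"HTTP {status}: {detail or 'no error detail'}"
```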