Workflow
This page walks through how Arbiter is used end-to-end: how an administrator sets it up, how documents flow through it, and how reviewers approve or reject the redactions Arbiter proposes. If you have not yet read Concepts, start there for the vocabulary used below.
The big picture
Arbiter is a human-in-the-loop deidentification reviewer. It does not detect PII on its own — that is Philter's job. Arbiter coordinates the work:
- An admin configures a Philter instance Arbiter can call (or chooses the built-in Embedded Philter).
- An admin authors policies. A policy tells Philter what counts as PII and how to redact each type.
- An admin creates a batch for a group of related documents and points it at a Philter instance and a policy.
- A reviewer uploads documents to the batch. Each document is sent to the batch's Philter instance and redacted using the batch's policy. Detected PII is stored as spans.
- A reviewer works the queue. Documents that need attention surface at the top, sorted by risk score. Each document is opened in the side-by-side review UI.
- The reviewer accepts or rejects each span, then approves or rejects the document as a whole. Decisions are captured in the audit log.
Everything in Arbiter is built around that flow.
1. Configure Philter
A Philter instance is the redaction engine Arbiter calls to find PII. Each batch is associated with exactly one Philter instance.
There are two kinds:
- Embedded Philter — Arbiter ships with a built-in redaction engine (Phileas) that runs as part of the Arbiter process itself. There is nothing to install or configure: it is always available, always reachable, and uses the policies stored in Arbiter's database. It is the right choice for development, demos, and small single-node deployments.
- External Philter — a separately running Philter service that Arbiter reaches over HTTP. Use this for production deployments, for shared instances across multiple Arbiters, when policies are managed centrally on the Philter server, or when the redactor needs to scale independently of the Arbiter web tier.
You can register any number of external Philter instances. A batch picks one of them (or Embedded Philter) at create time, and that choice can be changed later.
Registering an external Philter (admin)
- Sign in as an administrator.
- Go to Admin → Philter.
- Use Add Philter instance and fill in:
  - Name: a unique label (e.g. `philter-prod`).
  - Endpoint: host or IP (e.g. `philter` or `192.168.1.20`).
  - Port: the Philter port (default `8080`).
- Click Add. The instance is saved to MongoDB.
- Click Test on the row to confirm Arbiter can reach it. The test hits the instance's `/api/status` endpoint and reports the HTTP response.
- Optionally set the row as the default so newly created batches start with it pre-selected. Embedded Philter is the default if no other default is set.
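The Test button's check can also be reproduced outside the UI, for example when you are debugging connectivity from the Arbiter host. The following is a minimal sketch that assumes the instance speaks plain HTTP on the configured port. `status_url` and `check_philter` are hypothetical helper names, not part of Arbiter.

```python
import urllib.error
import urllib.request


def status_url(endpoint: str, port: int = 8080) -> str:
    """URL of the instance's status endpoint (plain HTTP is an assumption)."""
    return f"http://{endpoint}:{port}/api/status"


def check_philter(endpoint: str, port: int = 8080, timeout: float = 5.0) -> str:
    """Roughly what the Test button does: GET /api/status and report the outcome."""
    url = status_url(endpoint, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"{url} -> HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return f"{url} -> HTTP {exc.code}"
    except OSError as exc:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return f"{url} -> unreachable ({exc})"
```

For example, `print(check_philter("192.168.1.20"))` reports either the HTTP status or the reason the instance could not be reached.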
Removing an instance is allowed even if batches reference it; affected batches will fail to redact until they are reassigned to another instance or to Embedded Philter.
Default-instance pre-selection
The "Default Philter instance" setting on Admin → Philter controls which option is pre-selected in the Create Batch dialog's Philter dropdown. It does not force all batches to use that instance — each batch makes its own choice.
2. Create PII redaction policies
A policy is a JSON document Philter uses to identify and redact PII. It
declares which PII types to look for and what to do with each
(REDACT, MASK, REPLACE, etc.). For the policy schema and examples, see
the Philter documentation.
Where the policy lives depends on the Philter instance:
- Embedded Philter: policies live in Arbiter's MongoDB `policies` collection. Manage them under Policies in the left-side menu.
- External Philter: policies live on the Philter server. Arbiter can list them (read-only) by calling the Philter instance's `/api/policies` endpoint, but policies must be created, edited, and deleted on the Philter server.
The Policies page presents both modes with the same UI:
- Click Policies in the left-side menu.
- Choose a Philter instance from the dropdown — Embedded Philter is the default option.
- The table lists the policies available on that instance.
- For Embedded Philter:
  - Use Add policy to register a new one. The JSON is validated before saving. Names are unique (case-insensitive).
  - View opens any policy's JSON in a popup.
  - Edit opens the JSON in an editable popup. The policy name is shown as read-only: once a policy exists, it cannot be renamed via the UI or the API.
  - Remove deletes the policy. A policy that is currently referenced by one or more batches cannot be deleted; the error message names the batches that reference it.
- For external Philter instances the table is read-only (View only).
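The two Embedded Philter rules above can be modelled in a few lines: names are unique ignoring case, and a policy that a batch still uses cannot be deleted. This is a minimal sketch, not Arbiter's code. `BatchRef`, its fields, and the `"embedded"` marker are hypothetical, and real policies live in MongoDB rather than in a Python list.

```python
from dataclasses import dataclass


@dataclass
class BatchRef:
    # Hypothetical shape: only the fields this check needs.
    name: str
    philter: str  # "embedded" or the name of an external instance
    policy: str


def name_taken(candidate: str, existing: list[str]) -> bool:
    """Policy names are unique ignoring case, so compare casefolded forms."""
    folded = candidate.casefold()
    return any(name.casefold() == folded for name in existing)


def delete_blockers(policy: str, batches: list[BatchRef]) -> list[str]:
    """Names of batches that still reference this Embedded Philter policy."""
    return [b.name for b in batches if b.philter == "embedded" and b.policy == policy]


def remove_policy(policy: str, policies: list[str], batches: list[BatchRef]) -> None:
    """Refuse the delete, naming the referencing batches, if the policy is in use."""
    blockers = delete_blockers(policy, batches)
    if blockers:
        raise ValueError(f"Policy {policy!r} is used by batches: {', '.join(blockers)}")
    policies.remove(policy)
```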
Arbiter's demo data seeds a policy named Default in the Embedded Philter collection so the system is usable out of the box.
3. Create a batch
A batch is the unit of organisation in Arbiter. Every document belongs to exactly one batch. A batch carries:
| Field | Purpose |
|---|---|
| Name | Human-readable label (unique). |
| Group | Which user group can see and act on this batch (visibility scope). |
| Philter instance | Which Philter instance redacts documents in this batch. |
| Policy | Which policy on that Philter instance to apply. |
| PII Threshold | Per-span confidence floor for auto-accepting detections. |
| Document Threshold | Risk-score ceiling for auto-approving whole documents. |
| PII type weights | Per-type sensitivity used in the risk score. |
| Closed flag | A closed batch refuses new documents. |
Only administrators can create or close a batch. To create one:
- Sign in as an administrator and open Batches.
- Click Create Batch to open the create dialog and fill it out:
  - Name and Group are required.
  - Philter defaults to Embedded Philter (or to the system default if one is configured).
  - Policy populates dynamically based on the chosen Philter instance. Selecting a different Philter reloads the policy list.
  - PII Threshold (default `0.80`): spans whose confidence is at or above this value are auto-accepted. Lower values give the redactor more autonomy; higher values send more spans to the reviewer.
  - Document Threshold (default `0.25`): documents whose risk score is at or below this value are auto-approved without human review.
- Click Create in the dialog footer.
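The dialog's defaults can be summarised as a small data model. The sketch assumes a hypothetical `NewBatch` class and `create_batch` helper with illustrative field names. Only these details come from this page: the default values (`0.80`, `0.25`), the rule that Embedded Philter is used unless a system default exists, and the requirement for Name and Group. Treating Policy as required is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

EMBEDDED = "embedded"  # hypothetical identifier for Embedded Philter


@dataclass
class NewBatch:
    """Hypothetical model of the Create Batch dialog; field names are illustrative."""
    name: str
    group: str
    policy: str                       # assumption: a policy must be chosen
    philter: str = EMBEDDED
    pii_threshold: float = 0.80       # spans with confidence >= this are auto-accepted
    document_threshold: float = 0.25  # documents with risk score <= this are auto-approved
    pii_weights: dict[str, float] = field(default_factory=dict)
    closed: bool = False

    def __post_init__(self) -> None:
        if not self.name.strip() or not self.group.strip():
            raise ValueError("Name and Group are required")


def create_batch(name: str, group: str, policy: str,
                 system_default: Optional[str] = None, **overrides) -> NewBatch:
    """Pre-select the system default Philter, falling back to Embedded Philter."""
    overrides.setdefault("philter", system_default or EMBEDDED)
    return NewBatch(name=name, group=group, policy=policy, **overrides)
```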
After creation, anyone with access to the batch can adjust the group, thresholds, and weights. The Philter instance and policy can be changed together via the Change… link in the Philter / Policy column — choosing a Philter reloads the policy list inside the modal so the two stay in sync. Only admins can close a batch.
For more on batch settings see the Batches admin guide.
4. Upload documents
Documents enter Arbiter through one of two paths.
Through the web UI
- Click Upload in the left-side menu.
- Choose a Batch from the dropdown — only open batches in groups you belong to are listed. The "Create a new batch." link below the dropdown takes you to the Batches page if you need a new one.
- Choose a file with Document. Supported formats are plain text (`.txt`) and searchable PDF (`.pdf`). Scanned PDFs are not OCR'd.
- Click Redact Document.
Arbiter sends the file to the batch's Philter instance using the batch's policy, persists the document, creates a span for each detection, and computes the document's risk score. You then land on a preview page where you can download the redacted text or PDF.
Through the REST API
Programmatic ingestion uses POST /api/v1/ingest with a JSON body
containing the batch ID, a filename, and the text. Authentication is by
session or by API key issued from Personal settings. See the
REST API reference.
Only administrators and members of the batch's group can call this endpoint. It returns HTTP 409 when the batch is closed.
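Here is a minimal client for the ingest endpoint that uses only the Python standard library. Several details are assumptions made for illustration: the JSON field names (`batchId`, `filename`, `text`), the `Bearer` authorization header, and the base URL. Check the REST API reference for the actual request shape and for how API keys are presented.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical Arbiter address


def build_ingest_request(batch_id: str, filename: str, text: str,
                         api_key: str) -> urllib.request.Request:
    # Field names and the auth header scheme are assumptions, not the documented API.
    body = json.dumps({"batchId": batch_id, "filename": filename, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/ingest",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
    )


def ingest(batch_id: str, filename: str, text: str, api_key: str):
    """Send one document; ingestion is asynchronous, so this only enqueues it."""
    req = build_ingest_request(batch_id, filename, text, api_key)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read() or b"null")
    except urllib.error.HTTPError as exc:
        if exc.code == 409:
            raise RuntimeError(f"batch {batch_id} is closed") from exc
        raise
```

Splitting request construction from sending keeps the payload easy to inspect before anything goes over the wire.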
What happens after upload
Ingestion is asynchronous. The upload (or POST /api/v1/ingest) persists
the document in PENDING and returns immediately; a background worker drains
the redaction queue oldest-first. Multiple Arbiter replicas can run side by
side: each document is claimed atomically (PENDING → PROCESSING) so it is
never processed twice. Admins watch the queue at Admin → Ingest Queue.
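The atomic claim is what stops two replicas from processing the same document. Against MongoDB, one way to express it is a single pymongo `find_one_and_update` call. That call filters on `{"status": "PENDING"}`, sorts by a creation timestamp ascending, and uses `$set` to change the status to `PROCESSING`. Whether Arbiter does exactly this is an assumption. The self-contained sketch below simulates the same compare-and-set with an in-memory list and a lock, using hypothetical class and field names.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuedDoc:
    doc_id: str
    created_at: float
    status: str = "PENDING"


class InMemoryQueue:
    """Stand-in for the documents collection; the lock plays the database's atomic update."""

    def __init__(self, docs):
        self._docs = list(docs)
        self._lock = threading.Lock()

    def claim_next(self) -> Optional[QueuedDoc]:
        """Move the oldest PENDING document to PROCESSING and return it, or None."""
        with self._lock:
            pending = [d for d in self._docs if d.status == "PENDING"]
            if not pending:
                return None
            doc = min(pending, key=lambda d: d.created_at)  # oldest first
            doc.status = "PROCESSING"
            return doc
```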
When the worker picks up a document:
- Philter extracts the spans, each scored by confidence.
- Each span's initial status is set from the batch's PII Threshold:
  - `confidence ≥ threshold` → `APPROVED` (auto-accepted).
  - `confidence < threshold` → `PENDING` (needs review).
- The document's risk score is computed using the batch's PII weights and the count of unresolved spans.
- The document's status is set:
  - Any `PENDING` span → `REVIEW_REQUIRED`.
  - Otherwise, the worker rolls the batch's Audit Sampling Rate:
    - sampled in → `AUDIT_REQUIRED` (sent to review for a spot-check).
    - not sampled → `AUTO_APPROVED`.
- The document is indexed in OpenSearch (full original text plus metadata) and appears in the queue.
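The two status rules can be sketched as pure functions. The sketch assumes the Audit Sampling Rate is a fraction between 0 and 1 and that the "roll" is a uniform random draw. The function names are hypothetical. The risk score calculation is left out on purpose (see Risk score).

```python
import random
from typing import Callable, Iterable


def initial_span_status(confidence: float, pii_threshold: float) -> str:
    """At or above the batch's PII Threshold: auto-accepted; below: needs review."""
    return "APPROVED" if confidence >= pii_threshold else "PENDING"


def initial_document_status(span_statuses: Iterable[str], audit_sampling_rate: float,
                            rng: Callable[[], float] = random.random) -> str:
    """Any PENDING span forces review; otherwise roll the audit sampling rate."""
    if "PENDING" in span_statuses:
        return "REVIEW_REQUIRED"
    # rng() returns a float in [0, 1), so a rate of 0.0 never samples and 1.0 always does.
    return "AUDIT_REQUIRED" if rng() < audit_sampling_rate else "AUTO_APPROVED"
```

Injecting `rng` keeps the sampling decision deterministic in tests.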
A document whose risk score is at or below the batch's Document Threshold
and that isn't in AUDIT_REQUIRED is shown with the AUTO_APPROVED label
even before human review; re-tuning the threshold relabels existing rows on
the next page load.
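The relabelling rule depends on the stored status, the risk score, and the current threshold, and it is evaluated at display time. That is why changing the threshold takes effect on the next page load. In this hypothetical `display_label` helper, human decisions and failures (`APPROVED`, `REJECTED`, `FAILED`) are left untouched; that is an assumption the page does not state.

```python
TERMINAL = {"APPROVED", "REJECTED", "FAILED"}


def display_label(status: str, risk_score: float, document_threshold: float) -> str:
    """Label shown in the UI, recomputed on every page load from the current threshold."""
    if status in TERMINAL:
        return status  # assumption: human decisions and failures are shown as stored
    if status != "AUDIT_REQUIRED" and risk_score <= document_threshold:
        return "AUTO_APPROVED"
    return status
```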
If Philter fails for any reason, the document moves to `FAILED` and the
exception is captured both in the Arbiter logs and in a `failureMessage`
field on the document. Admins can view the failure detail from the
View log link in the Recent failures table on Admin → Ingest Queue.
For more detail see Adding documents.
5. Work the queue
The Queue at /queue is the reviewer's home page. It lists every
document the user has access to, ordered by risk score (highest first), so
the riskiest material lands at the top.
A reviewer typically:
- Optionally filters by Batch or Status; filtering on `REVIEW_REQUIRED` narrows the list to documents needing attention.
- Clicks Review on a row to open the side-by-side review UI.
- Works through pending spans (see the next section).
- Approves or rejects the document.
- Returns to the queue and repeats.
Admins see an extra Limit to my groups checkbox that toggles between their group-scoped view and a global view. Pagination is 10 documents per page. For a column-by-column reference see Queue.
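The queue's ordering and paging can be sketched as follows. `QueueRow` and `queue_page` are hypothetical. The sketch assumes rows have already been restricted to the user's groups and uses 1-based page numbers.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 10  # the Queue shows 10 documents per page


@dataclass
class QueueRow:
    doc_id: str
    batch: str
    status: str
    risk_score: float


def queue_page(rows, page: int = 1, batch: Optional[str] = None,
               status: Optional[str] = None) -> list:
    """Filter by Batch/Status, sort riskiest first, and return one 1-based page."""
    selected = [r for r in rows
                if (batch is None or r.batch == batch)
                and (status is None or r.status == status)]
    selected.sort(key=lambda r: r.risk_score, reverse=True)
    start = (page - 1) * PAGE_SIZE
    return selected[start:start + PAGE_SIZE]
```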
6. Review a document
The review page (/review/{id}) has three panes:
- Original: the source text with each detected span highlighted.
- Redacted: the same text with each accepted span replaced by a `<<TYPE>>` marker.
- PII Navigator: one entry per span, sorted by position.
The Original and Redacted panes scroll together. Clicking a span anywhere focuses the matching navigator entry and vice versa.
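Building the Redacted pane comes down to splicing markers into the original text at the accepted spans' offsets. In this sketch, the following are assumptions: the `Span` fields, the `"REFUSED"` status name, and the marker spelling (the type name inserted verbatim). Overlapping spans are skipped for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int      # character offset, inclusive
    end: int        # character offset, exclusive
    pii_type: str
    status: str     # "APPROVED" (accepted), "PENDING", or hypothetical "REFUSED"


def render_redacted(text: str, spans) -> str:
    """Replace each accepted span with a <<TYPE>> marker; other spans keep the original text."""
    out, cursor = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.status != "APPROVED" or span.start < cursor:
            continue  # skip non-accepted spans and any overlap with an earlier span
        out.append(text[cursor:span.start])
        out.append(f"<<{span.pii_type}>>")
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)
```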
Per-span actions
Each navigator entry exposes:
- A type pill that doubles as a dropdown to change the detected type (e.g. flip a span from `ssn` to `phone-number`).
- The matched text and the redactor's confidence.
- A status pill — Accepted or Refused — that toggles on click. A refused span is removed from the Redacted pane and struck through in the navigator.
- Redact All Like This — finds every other case-sensitive occurrence of the exact text in the document and updates them to match. Useful for unambiguous strings (project names, email addresses); use with care on short or substring-prone text.
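Redact All Like This relies on an exact, case-sensitive substring search, which is why short strings need care: a match can fall inside a longer word. The hypothetical sketch below covers only the search step. Applying the type and status change to each hit is left out.

```python
def find_exact_occurrences(text: str, needle: str) -> list[tuple[int, int]]:
    """(start, end) offsets of every case-sensitive, non-overlapping match of needle."""
    if not needle:
        return []
    hits, pos = [], text.find(needle)
    while pos != -1:
        hits.append((pos, pos + len(needle)))
        pos = text.find(needle, pos + len(needle))
    return hits


def redact_all_like_this(text: str, source_start: int, source_end: int) -> list[tuple[int, int]]:
    """Offsets of the *other* occurrences of the source span's exact text."""
    needle = text[source_start:source_end]
    return [hit for hit in find_exact_occurrences(text, needle)
            if hit != (source_start, source_end)]
```

For example, `find_exact_occurrences("Ann and Annual", "Ann")` also matches the start of "Annual", which is the substring hazard described above.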
Document-level actions
The header has Approve, Reject, and (for already-approved documents) Unapprove:
- Approve moves the document to `APPROVED` and returns to the queue.
- Reject moves the document to `REJECTED` and returns to the queue.
- Unapprove flips an approved document back to `REVIEW_REQUIRED`.
Approve / Reject are hidden once the document is in a terminal state
(APPROVED, REJECTED, FAILED). Every action is captured in the audit
log with the actor, timestamp, and the change made.
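The header's button rules and transitions form a small state machine. The sketch below uses hypothetical names (`available_actions`, `apply_action`) and a plain list as a stand-in for the audit log.

```python
from datetime import datetime, timezone

TERMINAL = {"APPROVED", "REJECTED", "FAILED"}
TARGET = {"approve": "APPROVED", "reject": "REJECTED", "unapprove": "REVIEW_REQUIRED"}


def available_actions(status: str) -> set[str]:
    """Approve/Reject are hidden in terminal states; Unapprove only applies to APPROVED."""
    actions = set()
    if status not in TERMINAL:
        actions |= {"approve", "reject"}
    if status == "APPROVED":
        actions.add("unapprove")
    return actions


def apply_action(status: str, action: str, actor: str, audit_log: list) -> str:
    """Validate the action, record it with actor and timestamp, and return the new status."""
    if action not in available_actions(status):
        raise ValueError(f"{action!r} is not allowed in status {status}")
    new_status = TARGET[action]
    audit_log.append({"actor": actor, "at": datetime.now(timezone.utc),
                      "change": (status, new_status)})
    return new_status
```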
For more detail see Reviewing a document.
A typical end-to-end run
Putting all the pieces together:
- Admin stands up Arbiter and MongoDB. Embedded Philter is available immediately.
- Admin opens Admin → Philter, optionally registers an external Philter for production use, and clicks Test to confirm reachability.
- Admin opens Policies, picks Embedded Philter (or the new external instance), and reviews the Default policy. They edit it or add a new one tuned to their data.
- Admin creates a user group under Admin → Groups and adds the relevant reviewers to it.
- Admin opens Batches, creates `Q3-onboarding-forms`, picks the group, the Philter instance, and a policy, and accepts the default thresholds.
- Reviewer signs in, lands on the Dashboard, and sees the new batch under "Manage batches" and the documents-needing-review widget.
- Reviewer uses Upload to send a document into the batch. Once the worker has processed it, it appears in the queue with status `REVIEW_REQUIRED` and a non-zero risk score.
- Reviewer opens the queue, clicks Review, accepts the spans that are correct, refuses or retypes the ones that are wrong, and clicks Approve.
- The document moves to `APPROVED`. The risk score, span decisions, and approval are now in the audit log.
- When `Q3-onboarding-forms` is finished, Admin clicks Close on the batch to lock it against new uploads. Existing documents in the batch remain reviewable indefinitely.
Where to go next
- Concepts — the underlying vocabulary.
- Adding documents, Queue, Reviewing a document — page-by-page reference for reviewers.
- Batches, Users and groups, Audit log — admin reference.
- Risk score, PII types, REST API — reference material.