# Concepts
## Users and roles
A user signs in with an email address and a password. Each user has exactly one role:
- `USER`: the default. Can view and review the batches and documents in groups they belong to.
- `ADMIN`: has full visibility (with an opt-in "Limit to my groups" filter), can create and close batches in any group, and has exclusive access to managing users / groups / settings.
- `AUDITOR`: read-only counterpart to `ADMIN`. Sees the same cross-group data an admin sees (queue, search, audit log, reports, batches), but cannot mutate any state. Useful for compliance, legal, or analyst roles that need to inspect activity without being able to change it.

See Roles and permissions for the full feature matrix.
Each user can also generate a personal API key for programmatic access. API keys carry the same permissions as the owning user account.
## Groups
A group is a named collection of users. Every batch must be assigned to exactly one group, and that assignment is what scopes visibility:
- A `USER` only sees batches whose group they belong to (and only the documents inside those batches).
- An `ADMIN` sees everything by default. The "Limit to my groups" checkbox on the queue and batches pages switches an admin to the same scoped view a regular user would see.
- An `AUDITOR` sees everything an admin sees and is not bound to a group. The role is intentionally cross-cutting because its whole purpose is a global read.
A group must have at least one member. Admins manage groups under Admin → Groups.
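Here is a minimal sketch of the visibility rules above. It assumes a hypothetical `User` record with a `role` and a set of group ids. `can_view_batch` is illustrative, not the product's API. It also assumes the "Limit to my groups" checkbox has no effect on auditors, because the text only describes it for admins.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    USER = "USER"
    ADMIN = "ADMIN"
    AUDITOR = "AUDITOR"


@dataclass
class User:
    email: str
    role: Role
    group_ids: set = field(default_factory=set)  # groups the user is a member of


def can_view_batch(user: User, batch_group_id: str, limit_to_my_groups: bool = False) -> bool:
    """Hypothetical check: may `user` see a batch assigned to `batch_group_id`?"""
    if user.role is Role.AUDITOR:
        # Cross-cutting global read; assumed to ignore the checkbox.
        return True
    if user.role is Role.ADMIN and not limit_to_my_groups:
        # Admins see everything unless they opt into the scoped view.
        return True
    # USERs, and admins with "Limit to my groups" ticked, are scoped by membership.
    return batch_group_id in user.group_ids
```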
Within a group, an admin can additionally designate one or more members as
team leads. A team lead is a regular USER who, for that one group,
gains the operational authority to create batches, close them, and change
their settings (Philter instance, domain, weights, thresholds). Team
leadership is per-group: a user can lead group A and remain a regular member
of group B. This lets day-to-day batch operations be delegated without
granting site-wide admin authority. See
Team leads for the full description.
## Batches
A batch is a container for documents. It has:
| Field | Meaning |
|---|---|
| Name | Human-readable label |
| Group | Which user group can see and act on it |
| PII Threshold | Per-span confidence floor for auto-accepting PII detections (default 0.8) |
| Document Threshold | Risk-score ceiling for auto-approving documents (default 0.25) |
| Audit Sampling Rate | Fraction of would-be auto-approved documents pulled into review for audit (default 0.10) |
| PII type weights | Per-type sensitivity for the risk score (see PII types) |
| Philter instance | Which configured Philter instance redacts the batch's documents (else embedded Phileas) |
| Policy | Name of the Phileas policy used during redaction |
| Domain | Optional grouping tag used for reporting |
| Closed | A closed batch refuses new documents (existing ones remain reviewable) |
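The table maps naturally onto a record that carries the documented defaults. This is a hypothetical model: the `Batch` class and its snake_case field names do not come from the product. The `[0, 1]` range check on the three numeric knobs is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Batch:
    """Hypothetical model of the batch fields in the table above."""
    name: str
    group_id: str                              # exactly one group per batch
    pii_threshold: float = 0.8                 # per-span confidence floor for auto-accepting
    document_threshold: float = 0.25           # risk-score ceiling for auto-approval
    audit_sampling_rate: float = 0.10          # share of would-be auto-approvals sent to audit
    pii_type_weights: Dict[str, float] = field(default_factory=dict)
    philter_instance: Optional[str] = None     # None means "use embedded Phileas"
    policy: Optional[str] = None               # Phileas policy name
    domain: Optional[str] = None               # optional reporting tag
    closed: bool = False

    def __post_init__(self) -> None:
        # Assumption: all three knobs live in [0, 1].
        for name in ("pii_threshold", "document_threshold", "audit_sampling_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")

    def accepts_new_documents(self) -> bool:
        # A closed batch refuses new documents; existing ones stay reviewable.
        return not self.closed
```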
Admins can create or close any batch, and team leads can create or close batches in groups they lead. Most batch settings (Philter instance, domain, thresholds, weights) can be changed by an admin or by a team lead of the batch's group; reassigning a batch to a different group is admin-only.
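The split between admins, team leads, and everyone else can be sketched as a single authorization check. `can_manage_batch` and `BatchAction` are hypothetical names. For `CREATE`, `batch_group_id` is the group the new batch would be assigned to.

```python
from enum import Enum


class Role(Enum):
    USER = "USER"
    ADMIN = "ADMIN"
    AUDITOR = "AUDITOR"


class BatchAction(Enum):
    CREATE = "create"
    CLOSE = "close"
    EDIT_SETTINGS = "edit_settings"    # Philter instance, domain, thresholds, weights
    REASSIGN_GROUP = "reassign_group"  # moving the batch to a different group


def can_manage_batch(role: Role, led_group_ids: set, batch_group_id: str, action: BatchAction) -> bool:
    """Hypothetical authorization check for batch mutations."""
    if role is Role.ADMIN:
        return True   # admins may act on any batch
    if role is Role.AUDITOR:
        return False  # auditors are read-only
    if action is BatchAction.REASSIGN_GROUP:
        return False  # group reassignment is admin-only
    # A USER may act only as a team lead of this batch's group.
    return batch_group_id in led_group_ids
```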
## Documents
A document belongs to a single batch. It carries the original text, a
filename, a status (see below), a numeric risk score between 0 and 1
that reflects how much PII is in the document weighted by sensitivity, and a
priority flag (1 Low, 2 Normal, 3 High; defaults to Normal) that the
queue surfaces as a small chevron icon next to the filename.
Document statuses:
| Status | Meaning |
|---|---|
| `PENDING` | Awaiting redaction in the ingest queue |
| `PROCESSING` | Currently claimed by a redaction worker (transient) |
| `REVIEW_REQUIRED` | Has at least one span the reviewer must accept or reject |
| `AUDIT_REQUIRED` | Eligible for auto-approval but pulled into review by audit sampling |
| `AUTO_APPROVED` | All spans auto-accepted, no human review needed |
| `APPROVED` | A reviewer explicitly approved the document |
| `REJECTED` | A reviewer explicitly rejected the document |
| `FINALIZED` | The document was finalized after approval: the redacted text has been rendered and the Certificate of Redaction issued. Terminal: a finalized document cannot be reopened or re-finalized. |
| `FAILED` | Redaction failed and the document was stored without spans |
| `SKIPPED` | Placeholder row written when an OpenSearch / Elasticsearch import detected a duplicate `(sourceIndex, sourceDocId)` and chose not to re-enqueue. Carries source attribution but no content. |
The queue also surfaces an `AUTO_APPROVED` display label for any non-terminal
document whose risk score is at or below the batch's Document Threshold. The
label overlays the underlying status, so retuning the threshold relabels existing
rows. Documents in `AUDIT_REQUIRED` are deliberately excluded from this
relabeling so the audit sample stays visible.
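Here is a sketch of how the queue's display label could be derived without touching the stored status. The `TERMINAL_STATUSES` set is an assumption, because the text does not list which statuses are terminal. `queue_display_label` is a hypothetical helper.

```python
from typing import Optional

# Assumption: the text does not enumerate which statuses are "terminal";
# this set is an illustrative guess.
TERMINAL_STATUSES = frozenset({"APPROVED", "REJECTED", "FINALIZED", "FAILED", "SKIPPED"})


def queue_display_label(status: str, risk_score: Optional[float], document_threshold: float,
                        terminal_statuses=TERMINAL_STATUSES) -> str:
    """Label shown in the queue; the stored status is never modified."""
    if status in terminal_statuses or status == "AUDIT_REQUIRED":
        # Terminal rows keep their status; audit samples must stay visible.
        return status
    if risk_score is not None and risk_score <= document_threshold:
        # Recomputed on every render, so retuning the threshold relabels rows.
        return "AUTO_APPROVED"
    return status
```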
## Ingest queue
Newly received documents (whether uploaded through the UI or submitted via the
REST API) are persisted as PENDING and placed on a
shared, MongoDB-backed redaction queue. A background worker drains the queue
oldest-first, claiming each document atomically (PENDING → PROCESSING) so
multiple replicas can run safely. Once Philter completes, the worker writes
the spans, computes the risk score, and transitions the document to its final
post-ingest status (REVIEW_REQUIRED, AUDIT_REQUIRED, or AUTO_APPROVED).
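The oldest-first atomic claim can be sketched in memory. `InMemoryRedactionQueue` is hypothetical, not the real implementation. Here a `threading.Lock` plays the role that an atomic conditional update would play against MongoDB. With pymongo, for example, that would be a single `find_one_and_update` that filters on status `PENDING`, sorts oldest-first, and sets `PROCESSING`.

```python
import threading
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional

_arrival = count()  # monotonic sequence standing in for a createdAt timestamp


@dataclass
class QueuedDoc:
    doc_id: str
    status: str = "PENDING"
    arrival: int = field(default_factory=lambda: next(_arrival))


class InMemoryRedactionQueue:
    """Hypothetical stand-in for the shared MongoDB-backed queue."""

    def __init__(self) -> None:
        self._docs: Dict[str, QueuedDoc] = {}
        self._lock = threading.Lock()

    def enqueue(self, doc_id: str) -> None:
        with self._lock:
            self._docs[doc_id] = QueuedDoc(doc_id)

    def claim_next(self) -> Optional[QueuedDoc]:
        """Atomically flip the oldest PENDING document to PROCESSING and return it."""
        # Check and set happen under one lock, so no two workers get the same document.
        with self._lock:
            pending = [d for d in self._docs.values() if d.status == "PENDING"]
            if not pending:
                return None
            oldest = min(pending, key=lambda d: d.arrival)
            oldest.status = "PROCESSING"
            return oldest
```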
Admins monitor the queue at Admin → Ingest Queue, where they can also remove a still-pending document. The page also surfaces six summary widgets:
- Pending: documents currently awaiting redaction.
- Processing: documents claimed by a worker.
- Failed: documents whose redaction failed (clearable via Clear failures).
- Skipped: placeholder rows from re-runs of OpenSearch / Elasticsearch imports that detected duplicates by `(sourceIndex, sourceDocId)`.
- Last 24 hours: documents created in the last 24 hours, regardless of current status.
- Throughput: documents/hour averaged over the last 24 hours, computed from `statusChangedAt` for documents that have left `PENDING`/`PROCESSING`.
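Here is the Throughput widget's arithmetic, sketched over `(status, statusChangedAt)` pairs. `throughput_per_hour` is hypothetical. Counting every document whose status is outside `PENDING`/`PROCESSING` and whose status changed inside the window is one reading of the description above.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

IN_FLIGHT = frozenset({"PENDING", "PROCESSING"})


def throughput_per_hour(docs: Iterable[Tuple[str, datetime]], now: datetime,
                        window: timedelta = timedelta(hours=24)) -> float:
    """Documents per hour over `window`, from (status, statusChangedAt) pairs."""
    since = now - window
    finished = sum(
        1 for status, status_changed_at in docs
        if status not in IN_FLIGHT and since <= status_changed_at <= now
    )
    # Average over the whole window, not just the hours that saw activity.
    return finished / (window.total_seconds() / 3600)
```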
## Background jobs
Long-running ingests (today: pulling from OpenSearch or Elasticsearch
data sources) run as background jobs rather
than blocking the request thread. Each job is a row in the background_jobs
collection with status PENDING → RUNNING → COMPLETED / FAILED. A
dispatcher polls every couple of seconds and atomically promotes one
PENDING job per batch to RUNNING, so admins can queue many imports
against a batch without worrying about race conditions:
- Per-batch serialisation. A partial unique index on `background_jobs` guarantees at most one `RUNNING` data-import job per batch, even across replicas. Subsequent jobs sit at `PENDING` until the running one finishes.
- Global ceiling. Admin → General → Max concurrent data imports caps how many jobs run system-wide (`1`–`10`, default `1`).
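One dispatcher tick can be sketched in memory. `Job` and `dispatch_once` are hypothetical, and oldest-first promotion is an assumption. In the real system, the partial unique index is what makes the per-batch rule hold across replicas. This single-process sketch only mirrors the resulting behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: str
    batch_id: str
    queued_at: int            # ordering key; stands in for a creation timestamp
    status: str = "PENDING"   # PENDING -> RUNNING -> COMPLETED / FAILED


def dispatch_once(jobs: List[Job], max_concurrent: int = 1) -> List[Job]:
    """One dispatcher tick: promote PENDING jobs to RUNNING, honouring both limits."""
    if not 1 <= max_concurrent <= 10:
        raise ValueError("max concurrent data imports must be between 1 and 10")
    busy_batches = {j.batch_id for j in jobs if j.status == "RUNNING"}
    free_slots = max_concurrent - sum(1 for j in jobs if j.status == "RUNNING")
    promoted: List[Job] = []
    for job in sorted((j for j in jobs if j.status == "PENDING"), key=lambda j: j.queued_at):
        if free_slots <= 0:
            break  # global ceiling reached
        if job.batch_id in busy_batches:
            continue  # per-batch serialisation: this batch already has a RUNNING job
        job.status = "RUNNING"
        busy_batches.add(job.batch_id)
        promoted.append(job)
        free_slots -= 1
    return promoted
```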
Reviewers watch progress on the Background Jobs page; the user who started a job receives an inbox notification when it ends.
## Spans
A span is a single PII detection inside a document. Each span has:
- A type (e.g., `ssn`, `phone-number`, `email-address`); see PII types.
- A confidence between 0 and 1 from the redactor.
- A status: `PENDING` (needs review), `APPROVED` (will be redacted), or `REJECTED` (will be left as-is).
- A character `start`/`end` and PDF coordinates if applicable.
When a document is created, each span's initial status is set automatically based on the batch's PII Threshold:

- confidence ≥ PII Threshold → `APPROVED`
- confidence < PII Threshold → `PENDING`
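The rule reduces to a single comparison. Below is a sketch with a hypothetical function name that uses the batch's default threshold of 0.8. The range check on `confidence` is an added assumption.

```python
def initial_span_status(confidence: float, pii_threshold: float = 0.8) -> str:
    """Initial review status for a new span (0.8 is the batch default PII Threshold)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # At or above the floor: auto-accepted. Below it: left for a reviewer.
    return "APPROVED" if confidence >= pii_threshold else "PENDING"
```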
Reviewers can later flip a span's status, change its type, or use Redact All Like This to apply the same decision to every other occurrence of the exact text within the document.
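Here is a sketch of Redact All Like This, assuming a hypothetical `Span` record that holds character offsets. It only updates spans that already exist. The text does not say whether the product also creates spans for undetected occurrences, so that case is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    start: int      # character offset, inclusive
    end: int        # character offset, exclusive
    pii_type: str
    status: str     # PENDING / APPROVED / REJECTED


def redact_all_like_this(text: str, spans: List[Span], chosen: Span, new_status: str) -> int:
    """Apply `new_status` to every span whose covered text exactly matches `chosen`'s.

    Returns how many spans actually changed (the chosen span included).
    """
    target = text[chosen.start:chosen.end]
    changed = 0
    for span in spans:
        if text[span.start:span.end] == target and span.status != new_status:
            span.status = new_status
            changed += 1
    return changed
```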
## Risk score
Each document's risk score is computed from its spans, the batch's per-PII-type
weights, the number of unresolved spans (those still PENDING), and the
document's word count. The exact formula and an example are on the
Risk score reference page.
## Approval rules and dual approval
A batch may optionally require two approvals from two different reviewers
before a document moves to APPROVED. Admins control this through
rule sets under Admin → Approval Rules.
### Rule sets, batches, and the AND/OR model
A batch can have zero or more rule sets attached to it. Each rule set is a collection of conditions, and each condition is one of the rules listed below.
- Within a single rule set, conditions are AND-ed. A rule set "fires" only when every condition in it is satisfied at approve time.
- Across rule sets on the same batch, results are OR-ed. A document requires dual approval if any of the batch's rule sets fires.
- A batch with no rule sets requires only a single approval.
This lets you express "trigger dual approval when (A AND B) OR (C AND D)" without complex per-rule expressions: each AND group is its own rule set, and adding more rule sets simply adds more independent triggers.
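The AND/OR model maps directly onto `all()` inside a rule set and `any()` across rule sets. The condition signature is hypothetical. Treating a rule set with no conditions as never firing is an assumption, because plain `all()` over an empty sequence returns `True`.

```python
from typing import Any, Callable, Mapping, Sequence

# A condition inspects a document snapshot and returns True when satisfied.
Condition = Callable[[Mapping[str, Any]], bool]


def rule_set_fires(conditions: Sequence[Condition], doc: Mapping[str, Any]) -> bool:
    # AND within a rule set. Assumption: an empty rule set never fires.
    return bool(conditions) and all(cond(doc) for cond in conditions)


def requires_dual_approval(rule_sets: Sequence[Sequence[Condition]], doc: Mapping[str, Any]) -> bool:
    # OR across rule sets; no rule sets at all means single approval.
    return any(rule_set_fires(rs, doc) for rs in rule_sets)
```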
### Available rules
| Rule | What it checks | Configurable cutoff |
|---|---|---|
| Document contains an SSN, credit card, or passport number | Span types `SSN`, `CREDIT_CARD`, `PASSPORT_NUMBER` | no |
| Document risk score above a threshold | The document-level risk score | yes (0–1, default 0.9) |
| A rejected span has confidence above a threshold | Any rejected span's confidence | yes (0–1, default 0.95) |
| Approving reviewer has performed fewer reviews than a threshold | Reviewer's prior approve count | yes (≥0, default 100) |
| Reviewer manually added more than X redactions | Count of `manuallyCreated` spans on the document | yes (integer ≥0, default 5) |
| Document contains classified keywords | Case-insensitive substring match against an admin list (e.g. Classified, Proprietary, Secret) | yes (comma-separated list) |
| Dual-approval sampling rate | Per-document random roll persisted at ingest, compared to a configured rate | yes (0.0–1.0, default 0.02 = 2%) |
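The predicates below illustrate each rule, using the default cutoffs from the table. The `SpanInfo`/`DocInfo` records and the function names are hypothetical. Comparison directions follow the table's wording ("above", "fewer than", "more than"). Treating the sampling rule as firing when the persisted roll is below the rate is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpanInfo:
    pii_type: str
    confidence: float
    status: str                     # PENDING / APPROVED / REJECTED
    manually_created: bool = False


@dataclass
class DocInfo:
    text: str
    risk_score: float
    dual_approval_roll: float       # random value persisted at ingest
    spans: List[SpanInfo] = field(default_factory=list)


SENSITIVE_TYPES = frozenset({"SSN", "CREDIT_CARD", "PASSPORT_NUMBER"})


def has_sensitive_identifier(doc: DocInfo) -> bool:
    return any(s.pii_type in SENSITIVE_TYPES for s in doc.spans)


def risk_above(doc: DocInfo, cutoff: float = 0.9) -> bool:
    return doc.risk_score > cutoff


def rejected_span_confidence_above(doc: DocInfo, cutoff: float = 0.95) -> bool:
    return any(s.status == "REJECTED" and s.confidence > cutoff for s in doc.spans)


def reviewer_below_experience(prior_approvals: int, cutoff: int = 100) -> bool:
    return prior_approvals < cutoff


def manual_redactions_above(doc: DocInfo, cutoff: int = 5) -> bool:
    return sum(1 for s in doc.spans if s.manually_created) > cutoff


def contains_classified_keyword(doc: DocInfo, keyword_list: str) -> bool:
    # keyword_list is the admin's comma-separated list; case-insensitive substring match.
    text = doc.text.casefold()
    keywords = [k.strip().casefold() for k in keyword_list.split(",") if k.strip()]
    return any(k in text for k in keywords)


def sampled_for_dual_approval(doc: DocInfo, rate: float = 0.02) -> bool:
    # Assumption: the rule fires when the persisted roll falls below the rate.
    return doc.dual_approval_roll < rate
```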
The reviewer-experience rule short-circuits at queue display time so it can flag documents up front; document-side rules are evaluated against the live document state.
### Behavior when a rule set fires
When dual approval is required, the document tracks an ordered list of
approver emails (approvedBy) and stays in REVIEW_REQUIRED until enough
approvals have been collected. The same reviewer can never approve a document
twice. The Document Queue's Approvals column shows progress (e.g. 1 of 2).
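Here is a sketch of the approval bookkeeping. It assumes a hypothetical `ReviewDoc` with an ordered `approved_by` list (mirroring `approvedBy`), and a caller that passes 2 when a rule set fired and 1 otherwise. The exact-match email comparison is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewDoc:
    status: str = "REVIEW_REQUIRED"
    approved_by: List[str] = field(default_factory=list)  # ordered approver emails


def approve(doc: ReviewDoc, reviewer_email: str, required_approvals: int) -> str:
    """Record one approval and return the progress label shown in the Approvals column."""
    if reviewer_email in doc.approved_by:
        raise ValueError(f"{reviewer_email} has already approved this document")
    doc.approved_by.append(reviewer_email)
    if len(doc.approved_by) >= required_approvals:
        doc.status = "APPROVED"  # enough distinct approvers collected
    return f"{len(doc.approved_by)} of {required_approvals}"
```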
For worked examples and recommended configurations, see the admin guide: Approval rule sets.
## Inbox
Each user has a per-user Inbox for system-delivered messages
(inbox_messages collection: id, userId, message, createdAt, read). The
sidebar Inbox link displays an unread-count badge that's available on every
page via a global controller advice. Users mark messages read individually;
admins can deliver messages programmatically via the inbox service.
## Search
Full text search is the optional capability that indexes every ingested
document in OpenSearch so reviewers can search by content. When enabled,
every document goes into the configured index at ingest time with its
filename, batch, status, and full original text. The Search page in the
sidebar and the GET /api/v1/search endpoint run a full-text match query
against that index. Results outside the caller's group visibility are returned with
`restricted: true` and their content fields nulled, so the caller knows a result
exists without seeing what it is.
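The sketch below covers both halves described above: the query body and the per-hit visibility shaping. The `match` query is standard OpenSearch query DSL. The indexed field names (`text`, `filename`, `batchId`), the choice of which fields count as "content", and the helper names are all assumptions.

```python
from typing import Any, Dict, Set

# Hypothetical field names; the text only says "content fields" are nulled.
CONTENT_FIELDS = ("filename", "text")


def match_query(q: str, field: str = "text") -> Dict[str, Any]:
    """OpenSearch query DSL body for a full-text `match` query on one field."""
    return {"query": {"match": {field: q}}}


def shape_hit(source: Dict[str, Any], visible_batch_ids: Set[str]) -> Dict[str, Any]:
    """Return a copy of an indexed document, redacted if the caller may not see its batch."""
    result = dict(source)  # never mutate the raw hit
    restricted = source.get("batchId") not in visible_batch_ids
    result["restricted"] = restricted
    if restricted:
        for name in CONTENT_FIELDS:
            result[name] = None
    return result
```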
The feature is enabled by default but its connection details (endpoint, optional basic-auth credentials, index name) are configured at runtime under Admin → Settings → Full text search. The full configuration, mapping bootstrap, and what changes when the feature is turned off are documented on the Full text search admin page.
## Audit log
Every state-changing action — login, logout, batch changes, span and document
updates, settings changes — is written to the audit_log collection in
MongoDB with the actor's email, the resource touched, a timestamp, and
context-specific details. Admins can browse and export the log under
Admin → Audit log.