Reports

The Reports page lives at /reporting and is open to administrators and auditors. It is a read-only summary of activity across every batch the caller can see, scoped to a date range you choose.

The page is broken into the sections below, in render order.

Filters

Date range

A row at the top of the page sets the time window. Both Start date and End date are required. The default range is the last 30 days; the Last 30 days button resets to that default.

The date range filters documents by their created-at timestamp. The window is inclusive of both endpoints: documents created any time on the end date are included. Documents with no timestamp (legacy data created before the field existed) are kept in totals so they aren't silently dropped.
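The inclusion rule above can be sketched as a small predicate. This is an illustration only: the function name and parameters are hypothetical, and it assumes timestamps are compared by calendar date with no time zone conversion, which the page does not specify.

```python
from datetime import date, datetime
from typing import Optional


def in_report_window(created_at: Optional[datetime], start: date, end: date) -> bool:
    """Return True when a document counts toward the report for [start, end]."""
    if created_at is None:
        # Legacy documents with no created-at timestamp are kept in totals.
        return True
    # Comparing calendar dates makes both endpoints inclusive, so a document
    # created at any time on the end date is still in range.
    return start <= created_at.date() <= end
```

With this rule, a document created at 23:59 on the end date is counted, and one created at 00:00 the following day is not.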

A status line on the right of the filter bar echoes the active range ("Showing documents created from … to …").

Domain

The page also accepts an optional domain query parameter — repeat it to pick more than one domain (e.g. ?domain=Healthcare&domain=Legal). There is no UI control for it; set it on the URL when you want to compare a single domain or a small subset against the rest of the deployment. With no parameter the page covers every domain.
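Because there is no UI control, it can help to build the URL programmatically. The sketch below repeats the `domain` parameter once per value; the helper name and the base URL are placeholders, not part of the product.

```python
from typing import Sequence
from urllib.parse import urlencode


def reports_url(base_url: str, domains: Sequence[str] = ()) -> str:
    """Build a Reports page URL, repeating `domain` once per selected domain."""
    path = base_url.rstrip("/") + "/reporting"
    if not domains:
        # No parameter: the page covers every domain.
        return path
    # A list of (key, value) pairs yields one `domain=` entry per value
    # and URL-encodes any special characters in the domain names.
    query = urlencode([("domain", d) for d in domains])
    return f"{path}?{query}"


# Hypothetical deployment URL, for illustration only.
print(reports_url("https://arbiter.example.com", ["Healthcare", "Legal"]))
```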

Visibility scope

Administrators see every batch, document, span, and audit entry across the whole deployment. Auditors see the same global scope read-only — Roles and permissions covers the boundary.

Reviewers (the USER role) cannot reach the Reports page; the link does not appear in their sidebar and the URL is rejected at the security filter.

Top KPI cards

A row of four high-level counters across the top of the page:

Tile	What it counts
Batches	Total visible batches in the date range. Sub-line splits open vs. closed.
Documents	Total documents across those batches.
Needs review	Documents in `REVIEW_REQUIRED` or `AUDIT_REQUIRED` status (yellow highlight).
Auto-approved	Documents whose risk score is at or below their batch's Document Threshold and that haven't been explicitly approved or rejected (emerald highlight).

Span KPI cards

A second row of four counters focused on spans (the individual PII detections inside documents):

Tile	What it counts
Spans accepted	PII detections approved by a reviewer.
Spans rejected	False positives or incorrect detections rejected by a reviewer.
Manually created	Spans the redactor missed and a reviewer added by hand.
Edit rate	`manual ÷ (accepted + rejected + manual)`. The fraction of decisions that required a manual addition. Higher numbers mean the redactor is missing PII the reviewers have to backfill.
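The edit rate formula in the table can be written out as follows. Returning 0.0 when there are no span decisions at all is an assumption made for this sketch; the page does not say what the tile shows in that case.

```python
def edit_rate(accepted: int, rejected: int, manual: int) -> float:
    """manual / (accepted + rejected + manual), as a fraction between 0 and 1."""
    total = accepted + rejected + manual
    if total == 0:
        # Assumption: no decisions yet is reported as a zero edit rate.
        return 0.0
    return manual / total
```

For example, 80 accepted, 10 rejected, and 10 manually created spans give an edit rate of 10 / 100 = 0.10.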

Documents by status

Two cards side by side:

- Documents by status: a count for every known status (PENDING, REVIEW_REQUIRED, AUDIT_REQUIRED, AUTO_APPROVED, APPROVED, REJECTED, FAILED, FINALIZED) along with a pie chart. Status pills use the same colors as the rest of the UI: green for approved, red for rejected/failed, yellow/amber for documents that need attention, blue for PENDING.
- Average risk score: the mean risk score across every document in the current scope. Range is 0.000 (no risk) to 1.000 (clamped maximum). See Risk score for the formula.

Per domain

Aggregated by the Domain field on each batch (one row per domain value, plus a (none) row for batches with no domain set):

Column	Meaning
Domain	The domain string.
Batches	Number of batches in this domain in scope.
Docs	Total documents in those batches.
Accepted	Spans accepted by reviewers across those batches.
Rejected	Spans rejected.
Manual	Spans manually added by reviewers.
Edit rate	`manual ÷ (accepted + rejected + manual)` for the domain. Edit rate is colored amber at 10% and red at 20% to flag domains where the redactor leans on reviewers more than expected.

Sorted by edit rate descending so problem domains float to the top.
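A minimal sketch of the color thresholds and sort order described above. It assumes the thresholds are inclusive (10% exactly is amber, 20% exactly is red), which the page does not state; the row data and the `"none"` label for uncolored cells are made up for illustration.

```python
def edit_rate_color(rate: float) -> str:
    """Map an edit rate (as a fraction) to its highlight color."""
    if rate >= 0.20:
        return "red"
    if rate >= 0.10:
        return "amber"
    return "none"  # hypothetical label for an unhighlighted cell


def sort_by_edit_rate(rows):
    """Order (domain, edit_rate) rows so the highest edit rate comes first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative rows: (domain, edit rate as a fraction).
rows = [("Legal", 0.04), ("Healthcare", 0.22), ("(none)", 0.12)]
for domain, rate in sort_by_edit_rate(rows):
    print(f"{domain}: {rate:.0%} {edit_rate_color(rate)}")
```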

Reviews per user

Approvals and rejections each user recorded in the selected date range, derived from the audit log:

Column	Meaning
User	The user's email (or `(unknown)` for legacy entries with no actor).
Approvals	`DOCUMENT_APPROVAL` events the user produced.
Rejections	`DOCUMENT_STATUS_CHANGE` events the user produced where the new status was `REJECTED`.
Total reviews	Approvals + Rejections.

Sorted by total reviews, highest first. Users who took no action in the range are not listed.
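The derivation from the audit log can be sketched like this. The event dictionaries and their keys (`type`, `actor`, `new_status`) are hypothetical stand-ins for whatever the audit log actually stores, and the alphabetical tie-break between users with equal totals is an assumption.

```python
from collections import Counter


def reviews_per_user(events):
    """Return (user, approvals, rejections, total) rows, highest total first."""
    approvals = Counter()
    rejections = Counter()
    for event in events:
        # Legacy entries with no actor are grouped under "(unknown)".
        user = event.get("actor") or "(unknown)"
        if event["type"] == "DOCUMENT_APPROVAL":
            approvals[user] += 1
        elif (event["type"] == "DOCUMENT_STATUS_CHANGE"
              and event.get("new_status") == "REJECTED"):
            rejections[user] += 1
    # Only users with at least one counted event appear in the table.
    users = set(approvals) | set(rejections)
    rows = [(u, approvals[u], rejections[u], approvals[u] + rejections[u])
            for u in users]
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows
```

Status changes to anything other than `REJECTED` are ignored, so they never inflate a user's total.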

Per Philter / Policy

Aggregated across all batches sharing a Philter instance and policy combination. The same shape as the per-domain table:

Column	Meaning
Philter	Friendly name of the Philter instance (or `Embedded Philter` when no external Philter is configured).
Policy	Phileas policy name on that Philter, or `(no policy)`.
Batches	Batches in scope using this combination.
Docs	Documents in those batches.
Accepted	Spans accepted by reviewers.
Rejected	Spans rejected.
Manual	Spans manually added.
Edit rate	Same as per-domain. Same color thresholds.

Sorted by edit rate descending. Because the edit rate counts manual additions, a high value flags policies that are letting too much PII through, so reviewers are adding spans by hand. Such a combination is a candidate for retuning the policy or the per-PII-type weights on the batch.

Documents by batch and priority

Document counts grouped by batch and priority (High / Normal / Low), broken out by status. Sorted by batch name, then by priority highest-first.

Column	Meaning
Batch	Batch name.
Priority	`High` (red), `Normal` (gray), or `Low` (blue) — see Priority.
Total	Total documents in this (batch, priority) cell.
One column per status	Count for each of `PENDING`, `REVIEW_REQUIRED`, `AUDIT_REQUIRED`, `AUTO_APPROVED`, `APPROVED`, `REJECTED`, `FAILED`, `FINALIZED`. Zero counts are dimmed.
Not yet approved	`PENDING` + `REVIEW_REQUIRED` + `AUDIT_REQUIRED` — documents that still need a human decision. Highlighted amber when non-zero.

Use the Not yet approved column to see, for example, how many High-priority documents in a batch still need a reviewer.
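The grouping and the Not yet approved column can be sketched as below. The input shape, a list of `(batch, priority, status)` tuples, is an assumption for illustration; the status names and sort order follow the table above.

```python
from collections import Counter, defaultdict

STATUSES = ["PENDING", "REVIEW_REQUIRED", "AUDIT_REQUIRED", "AUTO_APPROVED",
            "APPROVED", "REJECTED", "FAILED", "FINALIZED"]
NOT_YET_APPROVED = ("PENDING", "REVIEW_REQUIRED", "AUDIT_REQUIRED")
PRIORITY_ORDER = {"High": 0, "Normal": 1, "Low": 2}


def batch_priority_rows(documents):
    """Group (batch, priority, status) tuples into one row per (batch, priority)."""
    cells = defaultdict(Counter)
    for batch, priority, status in documents:
        cells[(batch, priority)][status] += 1
    rows = []
    for (batch, priority), counts in cells.items():
        rows.append({
            "batch": batch,
            "priority": priority,
            "total": sum(counts.values()),
            "by_status": {s: counts[s] for s in STATUSES},
            # Documents that still need a human decision.
            "not_yet_approved": sum(counts[s] for s in NOT_YET_APPROVED),
        })
    # Batch name first, then priority highest-first.
    rows.sort(key=lambda r: (r["batch"], PRIORITY_ORDER[r["priority"]]))
    return rows
```

Filtering the result for rows with `priority == "High"` and a non-zero `not_yet_approved` gives the High-priority backlog per batch.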

Per-batch breakdown

A comprehensive table near the bottom of the page, with one row per batch in scope:

Column	Meaning
Batch	Batch name. A `Closed` pill appears when the batch has been closed.
Philter / Policy	The Philter instance and policy this batch uses.
Docs	Document count.
Accepted	Spans accepted by reviewers.
Rejected	Spans rejected.
Manual	Spans manually added.
Edit rate	`manual ÷ (accepted + rejected + manual)`. Color thresholds as above.
Auto-approved	Documents auto-approved (at or below the document threshold and never user-decided).
Avg risk	Mean risk score across this batch's documents.

Sorted by document count descending, with a name tie-break.

Inter-Annotator Agreement (IAA)

The last card on the page, with one row for every batch that has Blind Double Review enabled. Batches that do not have the feature turned on are not listed.

For each such batch the report shows:

Column	Meaning
Batch	Batch name.
Documents	Number of documents in the batch that have completed both first and second reviews.
Tokens	Total tokens compared across those documents.
Cohen's Kappa	Pooled token-level Cohen's Kappa between the two reviewers (see method below).

If a batch has no doubly-reviewed documents yet, the Cohen's Kappa column reads — not enough data —. As soon as at least one document has been reviewed by both reviewers, a numeric kappa is computed.

How the score is computed

The score is token-level Cohen's Kappa:

1. For every double-reviewed document with snapshots from both reviewers, the original document text is split on whitespace into tokens (one token = one whitespace-delimited word).
2. Each token is given a binary label per reviewer:
   - PII: the token's character range overlaps any APPROVED span the reviewer left at the moment they completed their review.
   - O: otherwise.

   Partial overlap counts as PII. This is conservative: a token that touches a PII span at all is treated as PII for the agreement calculation.
3. The per-token decisions are pooled across every double-reviewed document in the batch into a single 2x2 confusion matrix (both PII / first-only / second-only / both O).
4. Cohen's Kappa is computed once per batch from that pooled matrix.
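The tokenization, labeling, and pooling steps above can be sketched as follows. This is an illustration under stated assumptions rather than the product's code: spans are taken to be `(start, end)` character offsets with an exclusive end, and each document is passed in as a hypothetical `(text, first_reviewer_spans, second_reviewer_spans)` tuple.

```python
import re
from typing import Iterable, List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive (assumption)


def token_labels(text: str, approved_spans: Sequence[Span]) -> List[bool]:
    """Label each whitespace-delimited token True (PII) or False (O)."""
    labels = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        # Any overlap with an APPROVED span, even partial, counts as PII.
        labels.append(any(s < tok_end and tok_start < e for s, e in approved_spans))
    return labels


def pooled_matrix(documents: Iterable[Tuple[str, Sequence[Span], Sequence[Span]]]):
    """Pool token decisions into (both_pii, first_only, second_only, both_o)."""
    both_pii = first_only = second_only = both_o = 0
    for text, first_spans, second_spans in documents:
        first = token_labels(text, first_spans)
        second = token_labels(text, second_spans)
        for a, b in zip(first, second):
            if a and b:
                both_pii += 1
            elif a:
                first_only += 1
            elif b:
                second_only += 1
            else:
                both_o += 1
    return both_pii, first_only, second_only, both_o
```

For the text `"Call John Smith today"`, a first reviewer who approved a span over `John Smith` and a second reviewer whose span covers only `Joh` produce one both-PII token (`John`), one first-only token (`Smith`), and two both-O tokens.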

Pooling at the batch level (rather than averaging per-document kappas) gives a single defensible score even when individual documents are short or lopsided in their PII vs. O distribution.
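For reference, the standard Cohen's Kappa formula applied to a pooled 2x2 matrix is shown below. The symbols are introduced here for illustration: a is the both-PII count, b first-only, c second-only, d both-O, and n = a + b + c + d.

```latex
\[
p_o = \frac{a + d}{n}, \qquad
p_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{n^2}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e}
\]
```

Here p_o is the observed agreement and p_e the agreement expected by chance from each reviewer's marginal label rates. As a worked example, a = 1, b = 1, c = 0, d = 2 gives p_o = 0.75, p_e = 0.5, and a kappa of 0.5.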

How to read the score

The score is colored:

Score range	Color	Reading
≥ 0.80	Green	Substantial-to-near-perfect agreement. Reviewers are calibrated; the redaction policy is being applied consistently.
0.60–0.79	Amber	Moderate agreement. Worth spot-checking the disagreements; usually points at edge cases in the policy.
< 0.60	Red	Low agreement. Indicates a real training gap or an ambiguous rule in the redaction policy that needs to be tightened.

A kappa of 0.9 means the Blind Double Review process is acting as a sanity check on a healthy pipeline; the admin can sleep soundly. A kappa of 0.5 is a signal that the process has surfaced a meaningful issue — either reviewers need additional guidance, or the redaction policy itself has ambiguous cases the team needs to resolve.
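The color bands in the table can be expressed as a simple lookup. The sketch assumes the raw score is compared against the 0.80 and 0.60 cut-offs directly, so a value such as 0.795 falls in the amber band; the page does not say whether the score is rounded first.

```python
def kappa_band(kappa: float) -> str:
    """Map a Cohen's Kappa score to the color band used on the report."""
    if kappa >= 0.80:
        return "green"
    if kappa >= 0.60:
        return "amber"
    return "red"
```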

Edge cases

- Both reviewers labeled every token as O. Cohen's Kappa is degenerate in this case (the standard formula evaluates to 0/0). Arbiter reports 1.000 by convention, reflecting unanimous agreement.
- Only one reviewer reviewed. The document is excluded from the pooled matrix until the second reviewer completes their review.
- Document text is empty. The document is excluded.
- A reviewer left no APPROVED spans. Their labels are all O for that document, which is valid input to the kappa calculation.
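A minimal sketch of a kappa calculation that handles these edge cases, taking the four pooled counts as input. Two choices here are assumptions: returning `None` when no tokens were compared (standing in for the "not enough data" state), and applying the same 1.0 convention when both reviewers labeled every token as PII, which is the other way the formula reaches 0/0.

```python
from typing import Optional


def cohens_kappa(both_pii: int, first_only: int,
                 second_only: int, both_o: int) -> Optional[float]:
    """Cohen's Kappa from a pooled 2x2 matrix, or None when there is no data."""
    n = both_pii + first_only + second_only + both_o
    if n == 0:
        # No double-reviewed tokens yet (assumption: reported as no score).
        return None
    # Chance agreement numerator, kept as an integer for an exact comparison.
    expected_num = ((both_pii + first_only) * (both_pii + second_only)
                    + (second_only + both_o) * (first_only + both_o))
    if expected_num == n * n:
        # Degenerate 0/0 case: every token got the same label from both
        # reviewers. Reported as 1.0 by convention.
        return 1.0
    p_o = (both_pii + both_o) / n
    p_e = expected_num / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

A reviewer who left no APPROVED spans needs no special handling: their tokens simply land in the second-only or both-O cells (or first-only and both-O, depending on which reviewer it was).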