Running PhilterScope
PhilterScope is a standalone CLI tool for PII redaction auditing and policy optimization. This document explains the available commands and flags.
Note that PhilterScope is intended to be run locally rather than over a network. If you do run it over a network, use SSL/TLS connections to both MongoDB and the PhilterScope UI.
1. Installation
PhilterScope is written in Go. You can build the binary for your platform using the provided Makefile:
make build
This will create philterscope-audit and philterscope-serve binaries in the project root.
2. Commands
PhilterScope provides two primary commands: philterscope-audit for performing audits and philterscope-serve for viewing results.
philterscope-audit
The philterscope-audit command compares raw text files against a "golden dataset" to evaluate redaction quality.
Usage:
PHILTERSCOPE_MONGODB_CONNECTION_STRING=mongodb://localhost:27017/philterscope ./philterscope-audit [flags]
Or without MongoDB:
./philterscope-audit [flags]
Commonly Used Flags:
| Flag | Default | Description |
|---|---|---|
| --url | http://localhost:8080 | The Philter API URL to use for redacting the raw text files. |
| --token | (none) | The Philter API Token, if required by your Philter server. |
| --policy | default | The name of the Philter policy to use for redaction. |
| --input | ./raw | The directory containing the raw text files or Philter explain JSON files. |
| --golden | golden.json | The path to the golden dataset file or directory. |
| --output | . | The directory where the report.html and report.json will be saved. |
| --threshold | 0.5 | The default recall threshold for policy suggestions (0.0 to 1.0). |
| --thresholds | (none) | Per-entity recall thresholds (e.g., NAME=0.9,SSN=1.0). |
| --group | default | Assign a group name to the audit for history tracking. |
| --ai | false | Enable AI-driven policy recommendations (requires Ollama). |
Example:
./philterscope-audit --input ./examples/raw --golden ./examples/golden --output ./examples/ --threshold 0.75 --ai
Thresholds can also be set individually for each entity type:
./philterscope-audit --golden ./examples/golden/ --input ./examples/raw/ --output ./examples/ --threshold 0.75 --thresholds "NAME=0.9,SSN=1.0"
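The relationship between --threshold and --thresholds can be illustrated with a short Python sketch. It is not PhilterScope's own parser (the tool is written in Go). The helper names parse_thresholds and threshold_for are hypothetical, and the sketch assumes that any entity type missing from --thresholds falls back to the --threshold value.

```python
from typing import Dict


def parse_thresholds(spec: str) -> Dict[str, float]:
    """Parse a string such as 'NAME=0.9,SSN=1.0' into a dict (hypothetical helper)."""
    thresholds: Dict[str, float] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        entity, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected ENTITY=VALUE, got {item!r}")
        number = float(value)
        if not 0.0 <= number <= 1.0:
            raise ValueError(f"threshold for {entity!r} must be in [0.0, 1.0]")
        thresholds[entity.strip()] = number
    return thresholds


def threshold_for(entity: str, per_entity: Dict[str, float], default: float) -> float:
    """Assumption: entity types without their own entry use the default threshold."""
    return per_entity.get(entity, default)


per_entity = parse_thresholds("NAME=0.9,SSN=1.0")
name_threshold = threshold_for("NAME", per_entity, default=0.75)    # explicit entry
email_threshold = threshold_for("EMAIL", per_entity, default=0.75)  # falls back
```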
philterscope-serve
The philterscope-serve command launches the Evaluation UI, allowing you to view and interact with the results of a previous audit.
Usage:
./philterscope-serve [flags]
Flags:
| Flag | Default | Description |
|---|---|---|
| --report | report.json | The JSON report file generated by the audit command. |
| --port | 5000 | The port on which the UI will be served. |
| --privacy | false | Enable privacy mode (obfuscates PII in UI). |
| --id | (none) | The ID of a specific audit result to view from history. |
Example:
PHILTERSCOPE_MONGODB_CONNECTION_STRING=mongodb://localhost:27017/philterscope ./philterscope-serve --privacy
Or without MongoDB:
./philterscope-serve --report ./examples/report.json --port 5000 --privacy
History and Audit Management
PhilterScope maintains a history of your audit runs. When using philterscope-serve, you can browse previous audits and their results.
- Local Storage: By default, audits are stored in the .philterscope directory in your home folder.
- Shared Storage: Use MongoDB for a centralized audit repository across your team.
- Privacy Mode: When --privacy is enabled, all PII found in the audit results (both expected and actual) is replaced with a cryptographic hash in the UI.
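To give a feel for what replacing PII with a cryptographic hash means, here is a minimal Python sketch. The choice of SHA-256, the truncation to 12 hex characters, and the helper name mask_pii are all assumptions made for illustration. This document does not state which algorithm or display format PhilterScope uses.

```python
import hashlib


def mask_pii(value: str, length: int = 12) -> str:
    """Return a short hex token derived from the value (hypothetical helper).

    Assumptions: SHA-256 and a truncated digest; PhilterScope's actual
    algorithm and output format may differ.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:length]


expected_token = mask_pii("John Doe")  # token shown instead of the expected PII
actual_token = mask_pii("John Doe")    # same input, so the same token
other_token = mask_pii("Jane Doe")     # different input, different token
```

Because the hash is deterministic, identical values map to the same token, so expected and actual redactions can still be compared without displaying the underlying text.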
3. Data Formats
PhilterScope is designed to be flexible with your data. It supports multiple formats for both input files and golden datasets.
Input Files
The input directory (--input) can contain:
- Raw Text Files: Simple .txt files that will be sent to the Philter API for redaction.
- Philter Explain JSON: If you have already redacted text using Philter's explain API, you can provide the JSON response directly. This allows you to audit pre-redacted data without calling the Philter API again.
Golden Datasets
The golden dataset (--golden) defines the expected redactions. PhilterScope accepts golden data in the following formats:
- Tagged Text: Wrap PII in your raw text files with tags like <NAME>John Doe</NAME>. PhilterScope can parse these directly.
- JSON Spans: A JSON file that defines the text and the character offsets for each PII entity. This is the recommended format for large datasets.
Example JSON Span format:
{
  "text": "My name is John Doe and I live at 123 Main St.",
  "labels": [
    {
      "text": "John Doe",
      "start": 11,
      "end": 19,
      "label": "NAME"
    }
  ]
}
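The two golden formats carry the same information, which the following Python sketch illustrates by converting tagged text into the JSON span structure shown above. It is an illustration, not PhilterScope's parser. The function names are hypothetical, tag names are assumed to be uppercase letters, digits, and underscores, nested tags are not handled, and "end" is treated as exclusive, which matches the example where characters 11 through 18 spell "John Doe".

```python
import re

# Assumption: tags look like <NAME>...</NAME>, with an uppercase tag name.
TAG_PATTERN = re.compile(r"<([A-Z][A-Z0-9_]*)>(.*?)</\1>", re.DOTALL)


def tagged_to_spans(tagged: str) -> dict:
    """Strip tags and record each entity's offsets in the untagged text."""
    parts = []
    labels = []
    cursor = 0  # position in the tagged input
    offset = 0  # length of the untagged text produced so far
    for match in TAG_PATTERN.finditer(tagged):
        before = tagged[cursor:match.start()]
        parts.append(before)
        offset += len(before)
        value = match.group(2)
        labels.append({
            "text": value,
            "start": offset,
            "end": offset + len(value),  # exclusive end offset
            "label": match.group(1),
        })
        parts.append(value)
        offset += len(value)
        cursor = match.end()
    parts.append(tagged[cursor:])
    return {"text": "".join(parts), "labels": labels}


def mismatched_labels(record: dict) -> list:
    """Return labels whose offsets do not select their own 'text' value."""
    text = record["text"]
    return [l for l in record["labels"] if text[l["start"]:l["end"]] != l["text"]]


record = tagged_to_spans("My name is <NAME>John Doe</NAME> and I live at 123 Main St.")
problems = mismatched_labels(record)  # empty when every offset is consistent
```

A check like mismatched_labels can be useful before an audit, since an off-by-one offset in a hand-edited golden file would otherwise show up as a spurious redaction miss.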
Matching Logic
When you run philterscope-audit, PhilterScope searches for the golden data in this order:
- The path provided by the --golden flag (if it's a file).
- Matching filenames in the directory provided by --golden (if it's a directory).
- <filename>.golden in the input directory.
- A golden/ subdirectory within or next to your input directory.
- Inline tags within the input file itself.
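The search order above can be sketched in Python as a chain of fallbacks. This is an illustration of the documented order only, with several assumptions: find_golden is a hypothetical helper, a "matching filename" is taken to mean the same file name as the input file, <filename>.golden is taken to mean the full input file name with .golden appended, and "within or next to" is read as a golden/ directory inside the input directory or beside it.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_golden(input_file: Path, golden: Optional[Path]) -> Tuple[str, Path]:
    """Return (source, path) for the first golden data location that exists."""
    # 1. --golden points at a single file.
    if golden is not None and golden.is_file():
        return ("golden file", golden)

    # 2. --golden points at a directory holding a file with a matching name.
    if golden is not None and golden.is_dir():
        candidate = golden / input_file.name
        if candidate.is_file():
            return ("golden directory", candidate)

    # 3. <filename>.golden next to the input file.
    candidate = input_file.with_name(input_file.name + ".golden")
    if candidate.is_file():
        return ("sidecar file", candidate)

    # 4. A golden/ subdirectory within, or next to, the input directory.
    input_dir = input_file.parent
    for directory in (input_dir / "golden", input_dir.parent / "golden"):
        candidate = directory / input_file.name
        if candidate.is_file():
            return ("golden subdirectory", candidate)

    # 5. Fall back to inline tags in the input file itself.
    return ("inline tags", input_file)
```

Calling find_golden(Path("raw/note.txt"), Path("golden.json")) with these assumptions returns the golden.json path when that file exists, and otherwise walks down the remaining steps.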