CLI
go-phileas includes a command-line tool, phileas, that redacts sensitive information from text files using a JSON or YAML policy.
Installation
Build the binary from the repository root:
Or directly with go build:
Or install it directly with go install:
Usage
```shell
phileas --policy <policy.json> --input <input.txt> [--context <context>]
phileas --policy <policy.json> --input <input.txt> --evaluate --spans <spans.json> [--context <context>]
phileas --policy <policy.json|policy.yaml> --validate
```
Flags
| Flag | Required | Description |
|---|---|---|
| --policy | Yes | Path to the JSON or YAML policy file. |
| --input | Yes (unless --validate is set) | Path to the input text file to redact. |
| --context | No | Context name to associate with the filter operation. If omitted, context checks are skipped. |
| --evaluate | No | Enable evaluation mode. Compares the spans identified by Phileas against a set of ground-truth spans and prints precision, recall, and F1. |
| --spans | When --evaluate is set | Path to a JSON file containing ground-truth spans. Required when --evaluate is set. |
| --validate | No | Validate the policy file and exit. Requires --policy; --input is not needed. |
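The requirements in the table boil down to three checks. The sketch below expresses them with Go's standard flag package. The options struct and the checkFlags function are hypothetical and are not taken from the phileas source; whether the real CLI uses the flag package at all is an assumption.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// options mirrors the documented flags. Hypothetical: not the CLI's own type.
type options struct {
	policy   string
	input    string
	context  string
	spans    string
	evaluate bool
	validate bool
}

// checkFlags parses args and enforces the documented rules:
// --policy is always required, --input is required unless --validate
// is set, and --spans is required when --evaluate is set.
func checkFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("phileas", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors off stderr in this sketch
	fs.StringVar(&o.policy, "policy", "", "path to the JSON or YAML policy file")
	fs.StringVar(&o.input, "input", "", "path to the input text file")
	fs.StringVar(&o.context, "context", "", "optional context name")
	fs.StringVar(&o.spans, "spans", "", "path to ground-truth spans (evaluation mode)")
	fs.BoolVar(&o.evaluate, "evaluate", false, "enable evaluation mode")
	fs.BoolVar(&o.validate, "validate", false, "validate the policy and exit")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	switch {
	case o.policy == "":
		return o, errors.New("--policy is required")
	case o.input == "" && !o.validate:
		return o, errors.New("--input is required unless --validate is set")
	case o.evaluate && o.spans == "":
		return o, errors.New("--spans is required when --evaluate is set")
	}
	return o, nil
}

func main() {
	_, err := checkFlags([]string{"--policy", "policy.json", "--validate"})
	fmt.Println(err) // <nil>
	_, err = checkFlags([]string{"--policy", "policy.json", "--input", "input.txt", "--evaluate"})
	fmt.Println(err) // --spans is required when --evaluate is set
}
```

Go's flag package accepts both -flag and --flag, so the double-dash spelling used throughout this page parses as written.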
In standard mode the redacted text is written to standard output. In evaluation mode the precision, recall, and F1 metrics are written to standard output instead. Errors are written to standard error and the process exits with a non-zero status code.
Example
Create a policy file policy.json:
```json
{
  "identifiers": {
    "ssn": {
      "ssnFilterStrategies": [{"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]
    },
    "emailAddress": {
      "emailAddressFilterStrategies": [{"strategy": "STATIC_REPLACE", "staticReplacement": "[EMAIL]"}]
    }
  }
}
```
Create an input file input.txt:
Run the CLI:
Output:
Using a context name
Passing --context associates the filter operation with a named context. This is useful when the same pieces of PII should be treated consistently across multiple calls within the same logical group (e.g., all records in a case file).
When --context is omitted, the CLI passes an empty string, effectively skipping any context-based grouping.
Redirecting output
Because the redacted text goes to stdout, you can pipe it to other tools or redirect it to a file:
```shell
# Save redacted output to a file
phileas --policy policy.json --input input.txt > redacted.txt

# Pipe into another command
phileas --policy policy.json --input input.txt | wc -c
```
Policy format
The --policy flag accepts any valid go-phileas JSON or YAML policy. Files with a .yaml or .yml extension are parsed as YAML; all other files are parsed as JSON. See Policies for the full schema and all available identifier options.
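The extension rule above is a switch on the file extension. Here is a minimal Go sketch of it. policyFormat is a hypothetical helper, not part of go-phileas. Matching extensions case-sensitively, so that .YAML falls through to JSON, is an assumption; the documentation does not say how the CLI treats upper-case extensions.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// policyFormat returns "yaml" for .yaml/.yml files and "json" for every
// other path, including paths with no extension at all.
// Assumption: extensions are compared case-sensitively.
func policyFormat(path string) string {
	switch filepath.Ext(path) {
	case ".yaml", ".yml":
		return "yaml"
	default:
		return "json"
	}
}

func main() {
	for _, p := range []string{"policy.yaml", "policy.yml", "policy.json", "policy"} {
		fmt.Printf("%s -> %s\n", p, policyFormat(p))
	}
}
```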
Validating a policy
Use the --validate flag to check whether a policy file is well-formed without redacting any text. The format (JSON or YAML) is detected automatically from the file extension.
If the policy is valid, the CLI prints "Policy is valid." and exits with status 0. If the policy contains formatting errors, the CLI prints the error to standard error and exits with a non-zero status code.
```console
# Valid policy
$ phileas --policy policy.json --validate
Policy is valid.

# Invalid policy
$ phileas --policy bad-policy.json --validate
policy is not valid: invalid character 'b' looking for beginning of object key string
```
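The error in the example above uses the same wording as the syntax errors from Go's encoding/json package. The sketch below is a syntax-only check that produces that message for JSON input. validatePolicyJSON is hypothetical and does not handle YAML. The real --validate may also decode into the go-phileas policy type and reject problems that plain JSON parsing would accept.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validatePolicyJSON reports whether data is a syntactically valid JSON
// object. Any decoder error is wrapped with the "policy is not valid:"
// prefix shown in the example above. This checks syntax only.
func validatePolicyJSON(data []byte) error {
	var v map[string]any
	if err := json.Unmarshal(data, &v); err != nil {
		return fmt.Errorf("policy is not valid: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(validatePolicyJSON([]byte(`{"identifiers": {}}`))) // <nil>
	fmt.Println(validatePolicyJSON([]byte(`{bad}`)))
	// policy is not valid: invalid character 'b' looking for beginning of object key string
}
```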
Evaluating filter performance
The --evaluate flag switches the CLI into evaluation mode. Instead of printing redacted text, Phileas compares the spans it finds against a set of human-labeled (ground-truth) spans and prints precision, recall, and F1.
This is useful for:
- Benchmarking how well a policy detects a specific type of sensitive information.
- Tuning policy parameters and measuring the impact on detection quality.
- Regression testing after policy changes.
Ground-truth spans file format
The --spans file must be a JSON array of span objects. Only characterStart and characterEnd are used for matching; all other fields are optional.
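Only characterStart and characterEnd matter, so a ground-truth file can be decoded into a struct with just those two fields. encoding/json skips any other keys. The Span type and parseSpans function below are hypothetical, not the CLI's own code. The filterType key in the sample is an illustrative extra field, not part of the documented format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Span keeps only the two fields used for matching. encoding/json
// ignores any other keys present in each object.
type Span struct {
	CharacterStart int `json:"characterStart"`
	CharacterEnd   int `json:"characterEnd"`
}

// parseSpans decodes a JSON array of span objects.
func parseSpans(data []byte) ([]Span, error) {
	var spans []Span
	if err := json.Unmarshal(data, &spans); err != nil {
		return nil, fmt.Errorf("reading spans: %w", err)
	}
	return spans, nil
}

func main() {
	// "filterType" is an illustrative extra field; it is decoded away.
	data := []byte(`[
		{"characterStart": 10, "characterEnd": 21, "filterType": "ssn"},
		{"characterStart": 38, "characterEnd": 54}
	]`)
	spans, err := parseSpans(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(spans) // [{10 21} {38 54}]
}
```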
A span in the predicted output is counted as a true positive when a ground-truth span with the same characterStart and characterEnd exists. Predicted spans with no matching ground-truth span are false positives; ground-truth spans that were not predicted are false negatives.
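The matching rule above reduces to counting true positives (TP), false positives (FP), and false negatives (FN). The scores then follow from precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). The sketch below illustrates those definitions and is not the CLI's evaluator. Treating duplicate spans as one span and reporting 0 when a denominator is zero are assumptions of this sketch.

```go
package main

import "fmt"

// Span identifies a region by its character offsets.
type Span struct {
	CharacterStart int
	CharacterEnd   int
}

// Metrics holds the three scores reported in evaluation mode.
type Metrics struct {
	Precision, Recall, F1 float64
}

// evaluate counts a predicted span as a true positive only when a
// ground-truth span has identical start and end offsets.
func evaluate(predicted, truth []Span) Metrics {
	truthSet := make(map[Span]bool, len(truth))
	for _, s := range truth {
		truthSet[s] = true
	}
	predSet := make(map[Span]bool, len(predicted))
	for _, s := range predicted {
		predSet[s] = true
	}
	tp := 0
	for s := range predSet {
		if truthSet[s] {
			tp++
		}
	}
	fp := len(predSet) - tp  // predicted but not in ground truth
	fn := len(truthSet) - tp // in ground truth but not predicted
	var m Metrics
	if tp+fp > 0 {
		m.Precision = float64(tp) / float64(tp+fp)
	}
	if tp+fn > 0 {
		m.Recall = float64(tp) / float64(tp+fn)
	}
	if m.Precision+m.Recall > 0 {
		m.F1 = 2 * m.Precision * m.Recall / (m.Precision + m.Recall)
	}
	return m
}

func main() {
	predicted := []Span{{10, 21}, {38, 54}, {60, 65}}
	truth := []Span{{10, 21}, {38, 54}, {70, 80}}
	m := evaluate(predicted, truth)
	fmt.Printf("precision=%.3f recall=%.3f f1=%.3f\n", m.Precision, m.Recall, m.F1)
	// precision=0.667 recall=0.667 f1=0.667
}
```

In this usage example, two spans match exactly, giving TP = 2. The prediction at 60 to 65 is a false positive, and the ground-truth span at 70 to 80 is a false negative. All three scores therefore come out to 2/3.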
Evaluation example
Given input.txt:
And spans.json (the SSN at positions 10–21 and the email at 38–54):
And policy.json:
Run:
Output: