Policies
A policy is the configuration object that tells phileas-python what to detect and what to do with each match. Policies are expressed as Python dicts, JSON strings, or YAML strings and are loaded into a Policy object before being passed to FilterService.
Policy structure
name: my-policy
identifiers:
  emailAddress:
    enabled: true
    emailAddressFilterStrategies:
      - strategy: REDACT
        redactionFormat: "{{{REDACTED-%t}}}"
    ignored:
      - value-to-skip
ignored:
  - global-term-to-skip
ignoredPatterns:
  - "\\d{3}-555-\\d{4}"
| Field | Type | Description |
|---|---|---|
| name | string | A human-readable name for the policy |
| identifiers | object | Map of filter keys to their configuration |
| ignored | array of strings | Terms that are never replaced, regardless of the filter that matched them |
| ignoredPatterns | array of regex strings | Regex patterns whose full matches are never replaced |
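To make the field layout concrete, the sketch below writes the example policy as a plain Python dict and round-trips it through the standard json module. It uses only the standard library and does not call phileas-python; the variable names are illustrative.

```python
import json

# The example policy from this section, written as a plain Python dict.
policy_dict = {
    "name": "my-policy",
    "identifiers": {
        "emailAddress": {
            "enabled": True,
            "emailAddressFilterStrategies": [
                {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
            ],
            "ignored": ["value-to-skip"],  # filter-level ignore list
        }
    },
    "ignored": ["global-term-to-skip"],  # policy-level ignore list
    "ignoredPatterns": [r"\d{3}-555-\d{4}"],
}

# Serialise to a JSON string and parse it back; the structure is unchanged.
policy_json = json.dumps(policy_dict, indent=2)
round_tripped = json.loads(policy_json)

print(sorted(round_tripped))
```

A string like policy_json has the same shape as the JSON form of a policy described in this section.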
Loading a policy
from phileas.policy.policy import Policy
# From a Python dict
policy = Policy.from_dict({...})
# From a JSON string
policy = Policy.from_json('{"name": "p", "identifiers": {...}}')
# From a YAML string
policy = Policy.from_yaml("name: p\nidentifiers:\n ...")
# Serialise back
json_str = policy.to_json()
yaml_str = policy.to_yaml()
d = policy.to_dict()
Enabling and disabling filters
Every filter is disabled by default. To enable a filter, include its key in identifiers. To explicitly disable a filter that is listed in identifiers, set "enabled": false (False in a Python dict):
policy = Policy.from_dict({
    "name": "selective",
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
        },
        "url": {"enabled": False}  # explicitly disabled
    }
})
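The enable/disable rule can be sketched as a small helper that inspects a policy dict directly. enabled_filter_keys is a hypothetical function written for this illustration, not part of phileas-python, and it assumes that a missing "enabled" key counts as enabled.

```python
def enabled_filter_keys(policy_dict):
    """Return the identifier keys that are present and not switched off."""
    enabled = []
    for key, config in policy_dict.get("identifiers", {}).items():
        # List-valued identifiers (such as phEye or dictionaries) hold
        # several entries; treat the key as enabled if any entry is.
        entries = config if isinstance(config, list) else [config]
        if any(entry.get("enabled", True) for entry in entries):
            enabled.append(key)
    return enabled


selective = {
    "name": "selective",
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
        },
        "url": {"enabled": False},
    },
}

print(enabled_filter_keys(selective))  # ['emailAddress']
```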
Filter strategies
Each enabled filter requires at least one strategy entry in its *FilterStrategies array. For each match, the first strategy whose condition is satisfied (or that has no condition) is applied.
Available strategies
| Strategy | Description | Example output |
|---|---|---|
| REDACT | Replace with a redaction tag | {{{REDACTED-email-address}}} |
| MASK | Replace every character with maskCharacter (default *) | ***@*******.*** |
| STATIC_REPLACE | Replace with a fixed string | [REMOVED] |
| HASH_SHA256_REPLACE | Replace with the SHA-256 hex digest of the matched value | a665a4592... |
| LAST_4 | Mask all but the last 4 characters | ****6789 |
| SAME | Leave the value unchanged (identify-only mode) | 123-45-6789 |
| TRUNCATE | Keep only the first 4 characters | john*** |
| ABBREVIATE | Replace with the initials of each word | J. S. |
| RANDOM_REPLACE | Replace with a randomly generated value of the same type | jane@domain.org |
| SHIFT_DATE | Shift a detected date by a configurable number of years/months/days | 01/20/1995 |
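As a rough illustration of what three of these strategies do to a matched value, the sketch below reimplements them with the standard library. apply_strategy is a hypothetical helper, not the library's code, and the hashing details (UTF-8 input, no salt) are assumptions.

```python
import hashlib


def apply_strategy(strategy, token, static_replacement="[REMOVED]"):
    """Illustrative replacement logic for three of the strategies above."""
    if strategy == "STATIC_REPLACE":
        return static_replacement
    if strategy == "HASH_SHA256_REPLACE":
        # Assumption: the value is hashed as UTF-8 with no salt.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()
    if strategy == "SAME":
        return token
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


for name in ("STATIC_REPLACE", "HASH_SHA256_REPLACE", "SAME"):
    print(name, "->", apply_strategy(name, "123-45-6789"))
```

The hash is deterministic, so the same input value always maps to the same 64-character digest, which keeps replaced values linkable across documents.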
Strategy options
strategy: REDACT
redactionFormat: "{{{REDACTED-%t}}}"
staticReplacement: "[REMOVED]"
maskCharacter: "*"
maskLength: SAME
condition: ""
shiftYears: 0
shiftMonths: 0
shiftDays: 0
- redactionFormat: used by REDACT. The placeholder %t is replaced with the filter type name (e.g. email-address).
- staticReplacement: used by STATIC_REPLACE.
- maskCharacter: character used by MASK (default: *).
- shiftYears / shiftMonths / shiftDays: offsets used by SHIFT_DATE.
- condition: optional expression that must evaluate to true for this strategy to be applied. See Conditions below.
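The %t substitution in redactionFormat amounts to a plain string replacement. The helper below is a minimal sketch with a hypothetical name (render_redaction), not the library's implementation.

```python
def render_redaction(redaction_format, filter_type):
    """Substitute the %t placeholder with the filter type name."""
    return redaction_format.replace("%t", filter_type)


print(render_redaction("{{{REDACTED-%t}}}", "email-address"))
# {{{REDACTED-email-address}}}
print(render_redaction("[PII-%t]", "email-address"))
# [PII-email-address]
```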
Examples
# Redact with a custom format
{"strategy": "REDACT", "redactionFormat": "[PII-%t]"}
# Mask with a custom character
{"strategy": "MASK", "maskCharacter": "X"}
# Replace with a fixed string
{"strategy": "STATIC_REPLACE", "staticReplacement": "[REMOVED]"}
# Shift a date forward by 2 years and 3 days
{"strategy": "SHIFT_DATE", "shiftYears": 2, "shiftDays": 3}
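To show what the shiftYears, shiftMonths, and shiftDays offsets mean for a single date, here is a standard-library sketch. shift_date is a hypothetical helper. Applying years and months before days, and clamping to the last day of a shorter month, are both assumptions; the library may behave differently.

```python
import calendar
from datetime import date, timedelta


def shift_date(value, years=0, months=0, days=0):
    """Shift a date by whole years and months, then by days."""
    # Work out the target year and month from a zero-based month count.
    month_index = value.year * 12 + (value.month - 1) + years * 12 + months
    year, month = divmod(month_index, 12)
    month += 1
    # Clamp the day so that e.g. 29 February maps onto 28 February.
    last_day = calendar.monthrange(year, month)[1]
    shifted = date(year, month, min(value.day, last_day))
    return shifted + timedelta(days=days)


# Shift forward by 2 years and 3 days, as in the example above.
print(shift_date(date(1993, 1, 17), years=2, days=3))  # 1995-01-20
```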
Conditions
A condition expression is an optional string attached to a strategy that gates its application. The strategy is only applied when the condition evaluates to true. When multiple strategies are listed, the first one whose condition is satisfied is used.
Multiple sub-expressions may be combined with and:
{"strategy": "REDACT", "condition": 'token startswith "4" and confidence >= 0.9'}
Supported condition expressions
| Expression | Description |
|---|---|
| token == "value" | Matched text equals value (case-sensitive) |
| token != "value" | Matched text does not equal value |
| token startswith "prefix" | Matched text starts with prefix |
| token endswith "suffix" | Matched text ends with suffix |
| token contains "substring" | Matched text contains substring |
| context == "value" | Current context equals value |
| context != "value" | Current context does not equal value |
| confidence <op> 0.9 | Match confidence compared to a threshold (>, <, >=, <=, ==, !=) |
| population <op> 20000 | ZIP code population compared to a threshold; see Population condition |
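The semantics in the table can be sketched as a tiny evaluator for the token, context, and confidence expressions, plus a first-match strategy selector. This is an illustration only, not the parser phileas-python uses: evaluate and select_strategy are hypothetical names, population is left out, an empty condition is assumed to be always true, and splitting on " and " would break on quoted values that contain that word.

```python
import operator
import re

COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
STRING_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "startswith": str.startswith,
    "endswith": str.endswith,
    "contains": operator.contains,  # contains(a, b) means "b in a"
}
# One clause: <field> <operator> <quoted string or number>
CLAUSE = re.compile(
    r'^\s*(token|context|confidence)\s*'
    r'(==|!=|>=|<=|>|<|startswith|endswith|contains)\s*'
    r'(?:"([^"]*)"|(\d+(?:\.\d+)?))\s*$'
)


def evaluate(condition, token, context, confidence):
    """Return True when every 'and'-joined clause holds."""
    if not condition.strip():
        return True  # assumption: no condition means "always apply"
    for clause in condition.split(" and "):
        match = CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        field, op, text, number = match.groups()
        if field == "confidence":
            if number is None or op not in COMPARATORS:
                raise ValueError(f"bad confidence clause: {clause!r}")
            if not COMPARATORS[op](confidence, float(number)):
                return False
        else:
            if text is None or op not in STRING_OPS:
                raise ValueError(f"bad string clause: {clause!r}")
            actual = token if field == "token" else context
            if not STRING_OPS[op](actual, text):
                return False
    return True


def select_strategy(strategies, token, context, confidence):
    """Pick the first strategy whose condition is satisfied, or None."""
    for strategy in strategies:
        if evaluate(strategy.get("condition", ""), token, context, confidence):
            return strategy
    return None


strategies = [
    {"strategy": "REDACT", "condition": 'token startswith "4" and confidence >= 0.9'},
    {"strategy": "MASK"},
]
print(select_strategy(strategies, "4111111111111111", "app", 0.95)["strategy"])  # REDACT
print(select_strategy(strategies, "5500000000000004", "app", 0.95)["strategy"])  # MASK
```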
Population condition
The population condition is specific to the zipCode filter. It evaluates to true when the 2020 US Census population of the matched ZIP code satisfies the given comparison. ZIP codes not found in the dataset evaluate to false.
Supported operators: <, >, <=, >=, ==, !=.
# Only redact ZIP codes with a population below 20,000
{
    "zipCode": {
        "zipCodeFilterStrategies": [
            {"strategy": "REDACT", "condition": "population < 20000"}
        ]
    }
}
# Redact small ZIP codes; leave large ones unchanged (identify-only)
s_small = {"strategy": "REDACT", "condition": "population < 20000"}
s_large = {"strategy": "SAME", "condition": "population >= 20000"}
{
    "zipCode": {
        "zipCodeFilterStrategies": [s_small, s_large]
    }
}
The condition can also be combined with other expressions using and:
{"strategy": "REDACT", "condition": 'population < 20000 and context == "medical"'}
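The lookup behaviour described in this section, including the rule that unknown ZIP codes evaluate to false, can be sketched as follows. population_condition is a hypothetical helper, and the population figures are made up for illustration; they are not census data.

```python
import operator

OPERATORS = {
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

# Made-up figures for illustration only; not real census data.
EXAMPLE_POPULATIONS = {"00001": 1500, "00002": 48000}


def population_condition(zip_code, op, threshold, populations=EXAMPLE_POPULATIONS):
    """Evaluate 'population <op> threshold' for one matched ZIP code."""
    population = populations.get(zip_code)
    if population is None:
        # ZIP codes missing from the dataset never satisfy the condition.
        return False
    return OPERATORS[op](population, threshold)


print(population_condition("00001", "<", 20000))   # True
print(population_condition("00002", "<", 20000))   # False
print(population_condition("99999", "!=", 20000))  # False: ZIP code not in the table
```

One consequence worth noting: for a ZIP code missing from the dataset, neither "population < 20000" nor "population >= 20000" holds.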
Ignored terms
Use ignored on an individual filter to skip specific values:
{
    "emailAddress": {
        "emailAddressFilterStrategies": [{"strategy": "REDACT"}],
        "ignored": ["noreply@internal.com", "admin@internal.com"]
    }
}
Use the top-level ignored list to skip terms regardless of which filter matched them, and ignoredPatterns for regex-based exclusions:
policy = Policy.from_dict({
    "name": "allow-list",
    "identifiers": {
        "phoneNumber": {
            "phoneNumberFilterStrategies": [{"strategy": "REDACT"}]
        }
    },
    "ignored": ["555-000-0000"],
    "ignoredPatterns": ["\\d{3}-555-\\d{4}"]  # ignore numbers of the form NNN-555-NNNN
})
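The difference between an ignored term and an ignored pattern can be sketched with the standard re module. is_ignored is a hypothetical helper. Exact, case-sensitive comparison for terms is an assumption, and re.fullmatch reflects the "full matches" wording in the policy structure table.

```python
import re


def is_ignored(token, ignored=(), ignored_patterns=()):
    """Return True if a matched token should be left untouched."""
    if token in ignored:
        return True
    # fullmatch: the pattern must cover the whole token, not just part of it.
    return any(re.fullmatch(pattern, token) for pattern in ignored_patterns)


ignored = ["555-000-0000"]
patterns = [r"\d{3}-555-\d{4}"]

print(is_ignored("555-000-0000", ignored, patterns))  # True (exact term)
print(is_ignored("212-555-0199", ignored, patterns))  # True (pattern)
print(is_ignored("212-556-0199", ignored, patterns))  # False
```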
ph-eye integration
ph-eye is a standalone NER service that phileas-python can call to detect named entities such as person names. Alternatively, phileas-python can perform local inference using GLiNER if modelPath and vocabPath are provided.
Remote Inference (HTTP)
To use a remote ph-eye service, provide the endpoint URL:
policy = Policy.from_dict({
    "name": "ner-policy",
    "identifiers": {
        "phEye": [
            {
                "endpoint": "http://localhost:8080",
                "bearerToken": "secret",
                "labels": ["PERSON", "LOCATION"],
                "thresholds": {"PERSON": 0.8},
                "phEyeFilterStrategies": [{"strategy": "REDACT"}]
            }
        ]
    }
})
Local Inference (GLiNER)
To use local inference, provide the modelPath and vocabPath. If the modelPath ends with .onnx, the ONNX Runtime will be used.
policy = Policy.from_dict({
    "name": "local-ner-policy",
    "identifiers": {
        "phEye": [
            {
                "modelPath": "/path/to/gliner_model.bin",
                "vocabPath": "/path/to/vocab.txt",
                "labels": ["PERSON"],
                "phEyeFilterStrategies": [{"strategy": "REDACT"}]
            }
        ]
    }
})
| Option | Type | Default | Description |
|---|---|---|---|
| endpoint | string | "" | Base URL of the ph-eye service (for remote inference) |
| bearerToken | string | "" | Optional Bearer token for authentication (for remote inference) |
| modelPath | string | "" | Path to the local GLiNER model (e.g. gliner_model.bin or gliner_model.onnx) |
| vocabPath | string | "" | Path to the vocabulary file required by GLiNER |
| timeout | int | 30 | Request timeout in seconds (for remote inference) |
| labels | list of strings | ["PERSON"] | NER label types to process |
| thresholds | object | {} | Minimum confidence per label, e.g. {"PERSON": 0.9} |
| removePunctuation | bool | false | Strip punctuation from entity text before replacement |
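How labels and thresholds interact can be sketched as a filter over detected entities. The entity dict shape (text, label, score) and the keep_entities helper are hypothetical, since the ph-eye response format is not described here. Treating a label with no threshold as accepting any score is also an assumption.

```python
def keep_entities(entities, labels, thresholds):
    """Keep entities with a requested label and sufficient confidence."""
    kept = []
    for entity in entities:
        if entity["label"] not in labels:
            continue
        # Assumption: labels without an explicit threshold accept any score.
        if entity["score"] >= thresholds.get(entity["label"], 0.0):
            kept.append(entity)
    return kept


# Hypothetical detections; the real ph-eye response format may differ.
detections = [
    {"text": "Jane Smith", "label": "PERSON", "score": 0.93},
    {"text": "Jay", "label": "PERSON", "score": 0.55},
    {"text": "Boston", "label": "LOCATION", "score": 0.40},
    {"text": "Acme", "label": "ORGANIZATION", "score": 0.99},
]

kept = keep_entities(detections, ["PERSON", "LOCATION"], {"PERSON": 0.8})
print([entity["text"] for entity in kept])  # ['Jane Smith', 'Boston']
```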
Dictionary filter
The dictionaries filter matches terms from a user-supplied list anywhere in the text. It is useful for redacting known names, keywords, or any other fixed vocabulary.
from phileas.policy.policy import Policy
from phileas.services.filter_service import FilterService

policy = Policy.from_dict({
    "name": "dictionary-policy",
    "identifiers": {
        "dictionaries": [
            {
                "terms": ["John", "Jane Smith", "classified"],
                "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
            }
        ]
    }
})
service = FilterService()
result = service.filter(
    policy, "app", "doc-1",
    "John called Jane Smith about the classified project."
)
print(result.filtered_text)
# {{{REDACTED-dictionary}}} called {{{REDACTED-dictionary}}} about the {{{REDACTED-dictionary}}} project.
Like phEye, dictionaries is a list, so you can include multiple independent dictionaries in a single policy:
from phileas.policy.policy import Policy
from phileas.services.filter_service import FilterService

policy = Policy.from_dict({
    "name": "multi-dict-policy",
    "identifiers": {
        "dictionaries": [
            {
                "terms": ["Alice", "Bob"],
                "dictionaryFilterStrategies": [
                    {"strategy": "STATIC_REPLACE", "staticReplacement": "[PERSON]"}
                ]
            },
            {
                "terms": ["secret", "classified"],
                "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
            }
        ]
    }
})
service = FilterService()
result = service.filter(
    policy, "app", "doc-2",
    "Alice told Bob about the secret project marked classified."
)
print(result.filtered_text)
# [PERSON] told [PERSON] about the {{{REDACTED-dictionary}}} project marked {{{REDACTED-dictionary}}}.
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Whether this dictionary is active |
| terms | array of strings | [] | The list of terms to detect (case-insensitive, whole-word) |
| dictionaryFilterStrategies | array | [{"strategy": "REDACT"}] | Replacement strategies |
| ignored | array of strings | [] | Terms to skip even if present in terms |
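The "case-insensitive, whole-word" matching noted for terms can be approximated with a regular expression. The sketch below is not the library's matcher: redact_terms is a hypothetical helper, and using \b word boundaries with longest-term-first ordering is an assumption about how overlaps are resolved.

```python
import re


def redact_terms(text, terms, replacement="{{{REDACTED-dictionary}}}"):
    """Replace case-insensitive, whole-word occurrences of each term."""
    if not terms:
        return text
    # Try longer terms first so "Jane Smith" wins over a shorter overlap.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    # A function replacement avoids backslash handling in the replacement text.
    return pattern.sub(lambda match: replacement, text)


print(redact_terms(
    "John called Jane Smith about the classified project.",
    ["John", "Jane Smith", "classified"],
))
# {{{REDACTED-dictionary}}} called {{{REDACTED-dictionary}}} about the {{{REDACTED-dictionary}}} project.
```

Because of the word boundaries, a term such as John does not match inside a longer word like Johnson.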