Policies

A policy is the configuration object that tells phileas-python what to detect and what to do with each match. Policies are expressed as Python dicts, JSON strings, or YAML strings and are loaded into a Policy object before being passed to FilterService.

Policy structure

name: my-policy
identifiers:
  emailAddress:
    enabled: true
    emailAddressFilterStrategies:
      - strategy: REDACT
        redactionFormat: "{{{REDACTED-%t}}}"
    ignored:
      - value-to-skip
ignored:
  - global-term-to-skip
ignoredPatterns:
  - "\\d{3}-555-\\d{4}"

Field	Type	Description
`name`	string	A human-readable name for the policy
`identifiers`	object	Map of filter keys to their configuration
`ignored`	array of strings	Terms that are never replaced, regardless of the filter that matched them
`ignoredPatterns`	array of regex strings	Regex patterns whose full matches are never replaced
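For reference, here is the YAML policy above as the equivalent Python dict, the form accepted by Policy.from_dict. YAML mappings become dicts, sequences become lists, and `true` becomes `True`. The double-quoted YAML string "\\d{3}-555-\\d{4}" decodes to the regex \d{3}-555-\d{4}, which a Python raw string writes directly.

```python
# The YAML policy above written as a plain Python dict (for Policy.from_dict).
policy_dict = {
    "name": "my-policy",
    "identifiers": {
        "emailAddress": {
            "enabled": True,
            "emailAddressFilterStrategies": [
                {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
            ],
            # Values skipped by this filter only.
            "ignored": ["value-to-skip"],
        }
    },
    # Terms skipped no matter which filter matched them.
    "ignored": ["global-term-to-skip"],
    # Raw string: same regex as the YAML "\\d{3}-555-\\d{4}".
    "ignoredPatterns": [r"\d{3}-555-\d{4}"],
}
```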

Loading a policy

from phileas.policy.policy import Policy

# From a Python dict
policy = Policy.from_dict({...})

# From a JSON string
policy = Policy.from_json('{"name": "p", "identifiers": {...}}')

# From a YAML string
policy = Policy.from_yaml("name: p\nidentifiers:\n  ...")

# Serialise back
json_str = policy.to_json()
yaml_str = policy.to_yaml()
d = policy.to_dict()
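Writing JSON by hand is error-prone, especially the doubled backslashes in regex patterns. A safer approach is to build a dict and let the standard-library json module produce the string you pass to Policy.from_json. This sketch uses only json; the policy contents are illustrative.

```python
import json

# Build the policy as a dict first; the contents here are only an example.
policy_dict = {
    "name": "p",
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
        }
    },
    "ignoredPatterns": [r"\d{3}-555-\d{4}"],
}

# json.dumps handles quoting and escapes each regex backslash as \\,
# producing text suitable for Policy.from_json(json_str).
json_str = json.dumps(policy_dict, indent=2)
print(json_str)
```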

Enabling and disabling filters

Filters are disabled by default: a filter runs only if its key is present in identifiers. Once listed, a filter is enabled unless you set "enabled": false (False in a Python dict), which lets you keep its configuration in the policy while switching it off:

policy = Policy.from_dict({
    "name": "selective",
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
        },
        "url": {"enabled": False}   # explicitly disabled
    }
})
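The rule above can be expressed as a small check over the identifiers map. enabled_filters is a hypothetical helper written for this page, not part of phileas-python, and it assumes each entry of a list-valued identifier such as phEye or dictionaries carries its own enabled flag (the dictionaries options table lists one).

```python
from __future__ import annotations


def enabled_filters(identifiers: dict) -> list[str]:
    """Hypothetical helper: list the filter keys that would run.

    A key counts as enabled when it is present and "enabled" is not False.
    List-valued identifiers (phEye, dictionaries) count if any entry is enabled.
    """
    active = []
    for key, config in identifiers.items():
        entries = config if isinstance(config, list) else [config]
        if any(entry.get("enabled", True) for entry in entries):
            active.append(key)
    return active


identifiers = {
    "emailAddress": {"emailAddressFilterStrategies": [{"strategy": "REDACT"}]},
    "url": {"enabled": False},  # explicitly disabled
}
print(enabled_filters(identifiers))  # ['emailAddress']
```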

Filter strategies

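Each enabled filter requires at least one strategy entry in its *FilterStrategies array (for example, emailAddressFilterStrategies for emailAddress, or dictionaryFilterStrategies for dictionaries). Strategies are tried in order and the first applicable one is used: a strategy without a condition always applies, while a strategy with a condition applies only when that condition evaluates to true (see Conditions).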

Available strategies

Strategy	Description	Example output
`REDACT`	Replace with a redaction tag	`{{{REDACTED-email-address}}}`
`MASK`	Replace every character with `maskCharacter` (default `*`)	`*@***.*`
`STATIC_REPLACE`	Replace with a fixed string	`[REMOVED]`
`HASH_SHA256_REPLACE`	Replace with the SHA-256 hex digest of the matched value	`a665a4592...`
`LAST_4`	Mask all but the last 4 characters	`****6789`
`SAME`	Leave the value unchanged (identify-only mode)	`123-45-6789`
`TRUNCATE`	Keep only the first 4 characters	`john***`
`ABBREVIATE`	Replace with the initials of each word	`J. S.`
`RANDOM_REPLACE`	Replace with a randomly generated value of the same type	`jane@domain.org`
`SHIFT_DATE`	Shift a detected date by a configurable number of years/months/days	`01/20/1995`

Strategy options

strategy: REDACT
redactionFormat: "{{{REDACTED-%t}}}"
staticReplacement: "[REMOVED]"
maskCharacter: "*"
maskLength: SAME
condition: ""
shiftYears: 0
shiftMonths: 0
shiftDays: 0

redactionFormat — used by REDACT. The placeholder %t is replaced with the filter type name (e.g. email-address).
staticReplacement — used by STATIC_REPLACE.
maskCharacter — character used by MASK (default: *).
shiftYears / shiftMonths / shiftDays — offsets used by SHIFT_DATE.
condition — optional expression that must evaluate to true for this strategy to be applied. See Conditions below.
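To make the option descriptions concrete, the following sketch re-implements a handful of the simpler strategies in plain Python. It is a hypothetical illustration built from the strategy table and option list above, not phileas-python's implementation; in particular, the mask character LAST_4 uses and the UTF-8 encoding before hashing are assumptions. It also shows that the HASH_SHA256_REPLACE example value a665a4592... is the SHA-256 digest of the string "123".

```python
import hashlib


def apply_strategy(value: str, strategy: dict, filter_type: str = "") -> str:
    """Hypothetical re-implementation of a few strategies from the table above."""
    name = strategy["strategy"]
    if name == "REDACT":
        # %t becomes the filter type name, e.g. "email-address".
        fmt = strategy.get("redactionFormat", "{{{REDACTED-%t}}}")
        return fmt.replace("%t", filter_type)
    if name == "STATIC_REPLACE":
        return strategy["staticReplacement"]
    if name == "HASH_SHA256_REPLACE":
        # Hex digest of the matched value (UTF-8 encoding is an assumption).
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if name == "LAST_4":
        # Assumption: one "*" per hidden character.
        return "*" * max(len(value) - 4, 0) + value[-4:]
    if name == "SAME":
        return value  # identify-only: the text is left unchanged
    raise ValueError(f"strategy not covered by this sketch: {name}")


print(apply_strategy("jane@example.com", {"strategy": "REDACT"}, "email-address"))
# {{{REDACTED-email-address}}}
print(apply_strategy("123-45-6789", {"strategy": "LAST_4"}))
# *******6789
print(apply_strategy("123", {"strategy": "HASH_SHA256_REPLACE"})[:9])
# a665a4592
```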

Examples

# Redact with a custom format
{"strategy": "REDACT", "redactionFormat": "[PII-%t]"}

# Mask with a custom character
{"strategy": "MASK", "maskCharacter": "X"}

# Replace with a fixed string
{"strategy": "STATIC_REPLACE", "staticReplacement": "[REMOVED]"}

# Shift a date forward by 2 years and 3 days
{"strategy": "SHIFT_DATE", "shiftYears": 2, "shiftDays": 3}
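The arithmetic behind the last example can be sketched with the standard library. shift_date is a hypothetical helper; applying years and months before days, and clamping a day such as the 31st to the end of a shorter month, are assumptions of this sketch, and phileas-python may resolve those edge cases differently.

```python
import calendar
from datetime import date, timedelta


def shift_date(d: date, years: int = 0, months: int = 0, days: int = 0) -> date:
    """Hypothetical SHIFT_DATE helper applying shiftYears/shiftMonths/shiftDays."""
    # Count months from year 0 so year and month offsets can be added together.
    total_months = d.year * 12 + (d.month - 1) + years * 12 + months
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    # Clamp the day, e.g. Jan 31 + 1 month lands on the last day of February.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day) + timedelta(days=days)


# {"strategy": "SHIFT_DATE", "shiftYears": 2, "shiftDays": 3}
print(shift_date(date(1993, 1, 17), years=2, days=3))  # 1995-01-20
```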

Conditions

A condition expression is an optional string attached to a strategy that gates its application. The strategy is only applied when the condition evaluates to true. When multiple strategies are listed, the first one whose condition is satisfied is used.

Multiple sub-expressions may be combined with and:

{"strategy": "REDACT", "condition": 'token startswith "4" and confidence >= 0.9'}

Supported condition expressions

Expression	Description
`token == "value"`	Matched text equals `value` (case-sensitive)
`token != "value"`	Matched text does not equal `value`
`token startswith "prefix"`	Matched text starts with `prefix`
`token endswith "suffix"`	Matched text ends with `suffix`
`token contains "substring"`	Matched text contains `substring`
`context == "value"`	Current context equals `value`
`context != "value"`	Current context does not equal `value`
`confidence <op> 0.9`	Match confidence compared to a threshold (`>`, `<`, `>=`, `<=`, `==`, `!=`)
`population <op> 20000`	ZIP code population compared to a threshold — see Population condition
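The semantics in the table can be sketched as a small evaluator. This is a hypothetical illustration, not phileas-python's parser: the function name condition_holds, passing match details as keyword arguments, and treating a missing value as a failed clause (mirroring the ZIP code rule in the next section) are all assumptions.

```python
import operator
import re

# Operators for numeric subjects (confidence, population).
NUMERIC_OPS = {
    ">": operator.gt, "<": operator.lt, ">=": operator.ge,
    "<=": operator.le, "==": operator.eq, "!=": operator.ne,
}
# Operators for string subjects (token, context); contains(a, b) means b in a.
STRING_OPS = {
    "==": operator.eq, "!=": operator.ne,
    "startswith": str.startswith, "endswith": str.endswith,
    "contains": operator.contains,
}
# One clause: subject, operator, then a double-quoted string or a number.
CLAUSE = re.compile(
    r'^(token|context|confidence|population)\s+'
    r'(==|!=|>=|<=|>|<|startswith|endswith|contains)\s+'
    r'(?:"([^"]*)"|(-?\d+(?:\.\d+)?))$'
)


def condition_holds(condition: str, **facts) -> bool:
    """Hypothetical evaluator: True when every "and"-joined clause holds.

    facts holds token, context, confidence and, for zipCode, population.
    Splitting on " and " means quoted values containing " and " are unsupported.
    """
    if not condition.strip():
        return True  # no condition: the strategy always applies
    for clause in re.split(r"\s+and\s+", condition.strip()):
        match = CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        subject, op, text, number = match.groups()
        numeric = subject in ("confidence", "population")
        if numeric != (number is not None):
            raise ValueError(f"value type does not fit subject: {clause!r}")
        ops = NUMERIC_OPS if numeric else STRING_OPS
        if op not in ops:
            raise ValueError(f"operator {op!r} not valid for {subject}")
        expected = float(number) if numeric else text
        actual = facts.get(subject)
        # A missing value (e.g. a ZIP code absent from the census data) fails the clause.
        if actual is None or not ops[op](actual, expected):
            return False
    return True


cond = 'token startswith "4" and confidence >= 0.9'
print(condition_holds(cond, token="4111111111111111", confidence=0.95))  # True
print(condition_holds(cond, token="4111111111111111", confidence=0.50))  # False
```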

Population condition

The population condition is specific to the zipCode filter. It evaluates to true when the 2020 US Census population of the matched ZIP code satisfies the given comparison. ZIP codes not found in the dataset evaluate to false.

Supported operators: <, >, <=, >=, ==, !=.

# Only redact ZIP codes with a population below 20,000
{
    "zipCode": {
        "zipCodeFilterStrategies": [
            {"strategy": "REDACT", "condition": "population < 20000"}
        ]
    }
}

# Redact small ZIP codes; leave large ones unchanged (identify-only)
s_small = {"strategy": "REDACT",  "condition": "population < 20000"}
s_large = {"strategy": "SAME",    "condition": "population >= 20000"}

{
    "zipCode": {
        "zipCodeFilterStrategies": [s_small, s_large]
    }
}
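The first-match rule from Conditions, applied to the two-strategy list above, can be sketched as follows. Only population clauses are parsed here. select_zip_strategy and population_holds are hypothetical names, and the population argument stands in for the census figure of the matched ZIP code, or None when the ZIP code is not in the dataset.

```python
import operator
import re

OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
POPULATION = re.compile(r"^\s*population\s*(<=|>=|==|!=|<|>)\s*(\d+)\s*$")


def population_holds(condition: str, population) -> bool:
    """Evaluate a single 'population <op> N' condition (hypothetical sketch)."""
    match = POPULATION.match(condition)
    if match is None:
        raise ValueError(f"not a population condition: {condition!r}")
    if population is None:
        return False  # ZIP codes missing from the dataset evaluate to false
    op, threshold = match.groups()
    return OPS[op](population, int(threshold))


def select_zip_strategy(strategies, population):
    """Return the first strategy whose condition holds, or None if none does."""
    for strategy in strategies:
        condition = strategy.get("condition", "")
        if not condition or population_holds(condition, population):
            return strategy
    return None


s_small = {"strategy": "REDACT", "condition": "population < 20000"}
s_large = {"strategy": "SAME", "condition": "population >= 20000"}
for pop in (8_000, 45_000, None):
    chosen = select_zip_strategy([s_small, s_large], pop)
    print(pop, chosen["strategy"] if chosen else None)
# 8000 REDACT
# 45000 SAME
# None None
```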

The condition can also be combined with other expressions using and:

{"strategy": "REDACT", "condition": 'population < 20000 and context == "medical"'}

Ignored terms

Use ignored on an individual filter to skip specific values:

{
    "emailAddress": {
        "emailAddressFilterStrategies": [{"strategy": "REDACT"}],
        "ignored": ["noreply@internal.com", "admin@internal.com"]
    }
}

Use the top-level ignored list to skip terms regardless of which filter matched them, and ignoredPatterns for regex-based exclusions:

policy = Policy.from_dict({
    "name": "allow-list",
    "identifiers": {
        "phoneNumber": {
            "phoneNumberFilterStrategies": [{"strategy": "REDACT"}]
        }
    },
    "ignored": ["555-000-0000"],
    "ignoredPatterns": ["\\d{3}-555-\\d{4}"]   # ignore numbers with a 555 exchange, e.g. 212-555-0100
})
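The effect of the two allow-lists in this policy can be sketched with the re module. is_ignored is a hypothetical helper, and treating ignored as an exact, case-sensitive comparison and ignoredPatterns as re.fullmatch (following "full matches" in the Policy structure table) are assumptions.

```python
import re

IGNORED = ["555-000-0000"]
IGNORED_PATTERNS = [r"\d{3}-555-\d{4}"]  # numbers with a 555 exchange


def is_ignored(value: str) -> bool:
    """Hypothetical check: should this matched value be left unreplaced?"""
    if value in IGNORED:
        return True
    # A pattern only applies when it matches the entire value.
    return any(re.fullmatch(pattern, value) for pattern in IGNORED_PATTERNS)


for number in ("555-000-0000", "212-555-0100", "212-867-5309"):
    print(number, is_ignored(number))
# 555-000-0000 True
# 212-555-0100 True
# 212-867-5309 False
```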

ph-eye integration

ph-eye is a standalone NER service that phileas-python can call to detect named entities such as person names. Alternatively, phileas-python can perform local inference using GLiNER if modelPath and vocabPath are provided.

Remote Inference (HTTP)

To use a remote ph-eye service, provide the endpoint URL:

policy = Policy.from_dict({
    "name": "ner-policy",
    "identifiers": {
        "phEye": [
            {
                "endpoint": "http://localhost:8080",
                "bearerToken": "secret",
                "labels": ["PERSON", "LOCATION"],
                "thresholds": {"PERSON": 0.8},
                "phEyeFilterStrategies": [{"strategy": "REDACT"}]
            }
        ]
    }
})

Local Inference (GLiNER)

To use local inference, provide the modelPath and vocabPath. If the modelPath ends with .onnx, the ONNX Runtime will be used.

policy = Policy.from_dict({
    "name": "local-ner-policy",
    "identifiers": {
        "phEye": [
            {
                "modelPath": "/path/to/gliner_model.bin",
                "vocabPath": "/path/to/vocab.txt",
                "labels": ["PERSON"],
                "phEyeFilterStrategies": [{"strategy": "REDACT"}]
            }
        ]
    }
})

Option	Type	Default	Description
`endpoint`	string	`""`	Base URL of the ph-eye service (for remote inference)
`bearerToken`	string	`""`	Optional Bearer token for authentication (for remote inference)
`modelPath`	string	`""`	Path to the local GLiNER model (e.g. `gliner_model.bin` or `gliner_model.onnx`)
`vocabPath`	string	`""`	Path to the vocabulary file required by GLiNER
`timeout`	int	`30`	Request timeout in seconds (for remote inference)
`labels`	list of strings	`["PERSON"]`	NER label types to process
`thresholds`	object	`{}`	Minimum confidence per label, e.g. `{"PERSON": 0.9}`
`removePunctuation`	bool	`false`	Strip punctuation from entity text before replacement
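How labels and thresholds narrow the entities found by ph-eye or GLiNER can be sketched as a plain filter over (label, text, score) results. The Entity shape, the function name keep_entities, and the rule that a label without a threshold keeps every entity are assumptions for illustration; the actual ph-eye response format is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical NER result: a label, the matched text and a confidence score."""
    label: str
    text: str
    score: float


def keep_entities(entities, labels=("PERSON",), thresholds=None):
    """Keep entities whose label is requested and whose score meets that label's minimum."""
    thresholds = thresholds or {}
    return [
        e for e in entities
        if e.label in labels and e.score >= thresholds.get(e.label, 0.0)
    ]


found = [
    Entity("PERSON", "Jane Smith", 0.93),
    Entity("PERSON", "Mr. X", 0.61),
    Entity("LOCATION", "Boston", 0.88),
    Entity("ORGANIZATION", "Acme", 0.97),
]
kept = keep_entities(found, labels=["PERSON", "LOCATION"], thresholds={"PERSON": 0.8})
print([e.text for e in kept])  # ['Jane Smith', 'Boston']
```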

Dictionary filter

The dictionaries filter matches terms from a user-supplied list wherever they appear in the text as whole words, ignoring case. It is useful for redacting known names, keywords, or any other fixed vocabulary.

from phileas.policy.policy import Policy
from phileas.services.filter_service import FilterService

policy = Policy.from_dict({
    "name": "dictionary-policy",
    "identifiers": {
        "dictionaries": [
            {
                "terms": ["John", "Jane Smith", "classified"],
                "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
            }
        ]
    }
})

service = FilterService()
result = service.filter(
    policy, "app", "doc-1",
    "John called Jane Smith about the classified project."
)
print(result.filtered_text)
# {{{REDACTED-dictionary}}} called {{{REDACTED-dictionary}}} about the {{{REDACTED-dictionary}}} project.

Like phEye, dictionaries is a list — you can include multiple independent dictionaries in a single policy:

from phileas.policy.policy import Policy
from phileas.services.filter_service import FilterService

policy = Policy.from_dict({
    "name": "multi-dict-policy",
    "identifiers": {
        "dictionaries": [
            {
                "terms": ["Alice", "Bob"],
                "dictionaryFilterStrategies": [
                    {"strategy": "STATIC_REPLACE", "staticReplacement": "[PERSON]"}
                ]
            },
            {
                "terms": ["secret", "classified"],
                "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
            }
        ]
    }
})

service = FilterService()
result = service.filter(
    policy, "app", "doc-2",
    "Alice told Bob about the secret project marked classified."
)
print(result.filtered_text)
# [PERSON] told [PERSON] about the {{{REDACTED-dictionary}}} project marked {{{REDACTED-dictionary}}}.

Option	Type	Default	Description
`enabled`	bool	`true`	Whether this dictionary is active
`terms`	array of strings	`[]`	The list of terms to detect (case-insensitive, whole-word)
`dictionaryFilterStrategies`	array	`[{"strategy": "REDACT"}]`	Replacement strategies
`ignored`	array of strings	`[]`	Terms to skip even if present in `terms`
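The matching rules in the table (case-insensitive, whole-word, with an optional ignored list) can be approximated with a single regular expression. redact_terms is a hypothetical sketch, not phileas-python's matcher: it tries longer terms first so a multi-word term such as "Jane Smith" wins over a shorter overlapping term, and it assumes ignored entries are compared case-insensitively. With the first example's terms it reproduces the output shown earlier.

```python
import re


def redact_terms(text, terms, ignored=(), replacement="{{{REDACTED-dictionary}}}"):
    """Replace whole-word, case-insensitive occurrences of terms (hypothetical sketch)."""
    skip = {t.casefold() for t in ignored}
    active = [t for t in terms if t.casefold() not in skip]
    if not active:
        return text
    # Longest terms first so multi-word terms take precedence over their parts.
    alternation = "|".join(re.escape(t) for t in sorted(active, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    # A function replacement avoids backslash processing in the replacement string.
    return pattern.sub(lambda _match: replacement, text)


print(redact_terms(
    "John called Jane Smith about the classified project.",
    ["John", "Jane Smith", "classified"],
))
# {{{REDACTED-dictionary}}} called {{{REDACTED-dictionary}}} about the {{{REDACTED-dictionary}}} project.
print(redact_terms("JOHN met Johnson.", ["John"]))
# {{{REDACTED-dictionary}}} met Johnson.
```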