API Reference

FilterService

phileas.services.filter_service.FilterService

The main entry point for filtering text. FilterService is stateless; a single instance can be reused across multiple calls.

from phileas.services.filter_service import FilterService
from phileas.policy.policy import Policy

# Create with default in-memory context service
service = FilterService()

# Or provide a custom context service
from phileas.services.context_service import InMemoryContextService
ctx_svc = InMemoryContextService()
service = FilterService(context_service=ctx_svc)

Constructor

FilterService(context_service=None)

Parameters

| Parameter | Type | Default | Description | |---|---|---| | context_service | AbstractContextService or None | None | Context service implementation for managing referential integrity. If None, an InMemoryContextService is created automatically. |

filter(policy, context, document_id, text)

Apply the policy to the given text and return a FilterResult.

from phileas.services.filter_service import FilterService
from phileas.policy.policy import Policy

policy = Policy.from_dict({
    "name": "example",
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
        }
    }
})

service = FilterService()
result = service.filter(
    policy=policy,
    context="my-app",
    document_id="doc-001",
    text="Contact john@example.com.",
)

print(result.filtered_text)
# Contact {{{REDACTED-email-address}}}.

Parameters

Parameter Type Description
policy Policy The policy to apply
context str An arbitrary string identifying the caller or application (e.g. user name, service name). Stored in each returned Span.
document_id str A unique identifier for the document being filtered. Stored in the returned FilterResult.
text str The text to filter

ReturnsFilterResult


Policy

phileas.policy.policy.Policy

Represents a de-identification policy.

from phileas.policy.policy import Policy

Constructors

Policy.from_dict(data)

Create a Policy from a Python dictionary.

policy = Policy.from_dict({
    "name": "my-policy",
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
        }
    }
})

Policy.from_json(json_str)

Create a Policy from a JSON string.

policy = Policy.from_json('{"name": "p", "identifiers": {...}}')

Policy.from_yaml(yaml_str)

Create a Policy from a YAML string.

policy = Policy.from_yaml("name: p\nidentifiers:\n  ...")

Methods

to_dict()

Serialise the policy to a Python dict.

to_json()

Serialise the policy to a JSON string (pretty-printed).

to_yaml()

Serialise the policy to a YAML string.

Attributes

Attribute Type Description
name str Policy name
identifiers Identifiers Filter configuration
ignored list[str] Global ignore list
ignored_patterns list[str] Global ignore patterns (regex)

FilterResult

phileas.models.filter_result.FilterResult

Returned by FilterService.filter(). Contains the filtered text and metadata for every match.

from phileas.models.filter_result import FilterResult

Attributes

Attribute Type Description
filtered_text str The input text with sensitive values replaced
spans list[Span] One Span per detected piece of sensitive information
context str The context value passed to filter()
document_id str The document_id value passed to filter()

Span

phileas.models.span.Span

Describes a single detected piece of sensitive information.

from phileas.models.span import Span

Attributes

Attribute Type Description
character_start int Start index (inclusive) of the match in the original text
character_end int End index (exclusive) of the match in the original text
filter_type str The type of PII detected (e.g. "email-address", "ssn")
text str The original matched text
replacement str The replacement value applied to the text
confidence float Confidence score in the range 0.0–1.0
ignored bool True if the span was matched but not replaced (because it appeared in an ignore list or matched an ignored pattern)
context str The context value from the filter() call

Methods

overlaps(other)

Return True if this span overlaps with other.

Span.drop_overlapping_spans(spans) (static)

Given a list of spans, remove overlapping ones, keeping the span with the highest confidence score. Returns a new list sorted by character_start.


FilterStrategy

phileas.policy.filter_strategy.FilterStrategy

Holds the replacement configuration for a single filter strategy entry.

Constants

Constant Value
REDACT "REDACT"
MASK "MASK"
STATIC_REPLACE "STATIC_REPLACE"
HASH_SHA256_REPLACE "HASH_SHA256_REPLACE"
LAST_4 "LAST_4"
SAME "SAME"
TRUNCATE "TRUNCATE"
ABBREVIATE "ABBREVIATE"
RANDOM_REPLACE "RANDOM_REPLACE"
SHIFT_DATE "SHIFT_DATE"

Constructor

FilterStrategy(
    strategy="REDACT",
    redaction_format="{{{REDACTED-%t}}}",
    static_replacement="",
    mask_character="*",
    mask_length="SAME",
    condition="",
    shift_years=0,
    shift_months=0,
    shift_days=0,
)

Methods

get_replacement(filter_type, token)

Return the replacement string for token based on the configured strategy.

FilterStrategy.from_dict(data) (class method)

Create a FilterStrategy from a dict such as {"strategy": "REDACT", "redactionFormat": "..."}.

to_dict()

Serialise to a dict.


AbstractContextService

phileas.services.context_service.AbstractContextService

Abstract base class for context service implementations. Subclass this to provide a custom backend (e.g., Redis, database, etc.).

from phileas.services.context_service import AbstractContextService
from phileas.services.filter_service import FilterService
from phileas.policy.policy import Policy

class RedisContextService(AbstractContextService):
    """Example custom context service using Redis."""

    def __init__(self, redis_client):
        self.redis = redis_client

    def put(self, context: str, token: str, replacement: str) -> None:
        """Store token -> replacement mapping in Redis."""
        key = f"phileas:{context}:{token}"
        self.redis.set(key, replacement)

    def get(self, context: str, token: str) -> str | None:
        """Retrieve replacement from Redis, or None if not found."""
        key = f"phileas:{context}:{token}"
        value = self.redis.get(key)
        return value.decode('utf-8') if value else None

    def contains(self, context: str, token: str) -> bool:
        """Check if token exists in Redis."""
        key = f"phileas:{context}:{token}"
        return self.redis.exists(key) > 0

# Usage example (requires redis package)
# import redis
# redis_client = redis.Redis(host='localhost', port=6379, db=0)
# ctx_svc = RedisContextService(redis_client)
# service = FilterService(context_service=ctx_svc)

Methods

Method Signature Description
put (context, token, replacement) -> None Store a replacement value for a token under the given context
get (context, token) -> str \| None Return the stored replacement, or None if not found
contains (context, token) -> bool Return True if a replacement exists for the token in the given context

InMemoryContextService

phileas.services.context_service.InMemoryContextService

Default implementation of AbstractContextService backed by a dict[str, dict[str, str]]. Suitable for single-process, in-memory use.

from phileas.services.context_service import InMemoryContextService
from phileas.services.filter_service import FilterService
from phileas.policy.policy import Policy

# Create and pre-populate the context service
ctx_svc = InMemoryContextService()
ctx_svc.put("patient-123", "john@example.com", "EMAIL-001")
ctx_svc.put("patient-123", "555-867-5309", "PHONE-001")

# Use it with FilterService
policy = Policy.from_dict({
    "name": "medical",
    "identifiers": {
        "emailAddress": {
            "emailAddressFilterStrategies": [{"strategy": "REDACT"}]
        },
        "phoneNumber": {
            "phoneNumberFilterStrategies": [{"strategy": "REDACT"}]
        }
    }
})

service = FilterService(context_service=ctx_svc)

# The pre-populated replacements will be used
result1 = service.filter(
    policy, "patient-123", "doc-1",
    "Contact john@example.com or 555-867-5309."
)
print(result1.filtered_text)
# Contact EMAIL-001 or PHONE-001.

# The same replacements persist across documents in the same context
result2 = service.filter(
    policy, "patient-123", "doc-2",
    "Patient called 555-867-5309 from john@example.com."
)
print(result2.filtered_text)
# Patient called PHONE-001 from EMAIL-001.

# Check what's stored
print(ctx_svc.get("patient-123", "john@example.com"))      # EMAIL-001
print(ctx_svc.contains("patient-123", "555-867-5309"))     # True
print(ctx_svc.get("patient-123", "unknown@example.com"))   # None

Methods

Method Signature Description
put (context, token, replacement) -> None Store a replacement for a token
get (context, token) -> str \| None Retrieve a replacement, or None if not found
contains (context, token) -> bool Check if a replacement exists

Policy key reference

The table below maps every JSON/YAML policy key to the Identifiers attribute it populates and the strategies key used in its filter config.

Policy key Identifiers attribute Strategies key
age age ageFilterStrategies
emailAddress email_address emailAddressFilterStrategies
creditCard credit_card creditCardFilterStrategies
ssn ssn ssnFilterStrategies
phoneNumber phone_number phoneNumberFilterStrategies
ipAddress ip_address ipAddressFilterStrategies
url url urlFilterStrategies
zipCode zip_code zipCodeFilterStrategies
vin vin vinFilterStrategies
bitcoinAddress bitcoin_address bitcoinAddressFilterStrategies
bankRoutingNumber bank_routing_number bankRoutingNumberFilterStrategies
date date dateFilterStrategies
macAddress mac_address macAddressFilterStrategies
currency currency currencyFilterStrategies
streetAddress street_address streetAddressFilterStrategies
trackingNumber tracking_number trackingNumberFilterStrategies
driversLicense drivers_license driversLicenseFilterStrategies
ibanCode iban_code ibanCodeFilterStrategies
passportNumber passport_number passportNumberFilterStrategies
phEye ph_eye (list) phEyeFilterStrategies