Identifiers
Identifiers tell go-phileas which types of sensitive information to look for in the input text. Each identifier corresponds to a field in the `policy.Identifiers` struct and has its own slice of filter strategies, plus optional configuration fields.
Age
Detects age expressions such as 45 years old, aged 30, and 61 y/o.
Go
```go
Age: &policy.AgeFilter{
	AgeFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
	},
},
```
JSON key: age
Example matches: 45 years old, aged 30, 61 y/o, a 22 year old
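As a rough illustration of the phrasings listed above, the sketch below finds age expressions with a single regular expression. The pattern and the `findAges` helper are hypothetical and written for this page; they are not the expression go-phileas uses internally.

```go
package main

import "regexp"

// agePattern is a hypothetical pattern covering the example phrasings
// ("45 years old", "aged 30", "61 y/o", "22 year old"); it is not the
// expression go-phileas uses. (?i) makes it case-insensitive.
var agePattern = regexp.MustCompile(`(?i)\b(?:aged\s+\d{1,3}|\d{1,3}[\s-]*(?:years?[\s-]*old|y/o))\b`)

// findAges returns every age expression in text, in order of appearance.
func findAges(text string) []string {
	return agePattern.FindAllString(text, -1)
}
```

Calling `findAges("He is 45 years old; she was aged 30 and he is 61 y/o.")` returns the three age phrases and nothing else.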
Bank Routing Number
Detects US ABA bank routing numbers (9-digit numbers starting with 0–3).
Go
```go
BankRoutingNumber: &policy.BankRoutingNumberFilter{
	BankRoutingNumberFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: bankRoutingNumber
Example matches: 021000021, 111000038
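The description above covers only the format (nine digits, first digit 0–3). ABA routing numbers also carry a standard checksum, 3(d1+d4+d7) + 7(d2+d5+d8) + (d3+d6+d9) ≡ 0 (mod 10); this page does not say whether go-phileas applies it. The hypothetical `abaChecksumValid` helper below is a standalone sketch of both checks.

```go
package main

// abaChecksumValid is a hypothetical helper (not part of go-phileas).
// It checks the format described above (9 digits, first digit 0-3) and
// the standard ABA checksum: 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9)
// must be divisible by 10.
func abaChecksumValid(rtn string) bool {
	if len(rtn) != 9 || rtn[0] < '0' || rtn[0] > '3' {
		return false
	}
	weights := [3]int{3, 7, 1} // weights repeat every three digits
	sum := 0
	for i := 0; i < 9; i++ {
		c := rtn[i]
		if c < '0' || c > '9' {
			return false
		}
		sum += int(c-'0') * weights[i%3]
	}
	return sum%10 == 0
}
```

Both example matches pass this check: 021000021 sums to 30 and 111000038 to 40.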
Bitcoin Address
Detects Bitcoin wallet addresses (Base58Check P2PKH and P2SH formats).
Go
```go
BitcoinAddress: &policy.BitcoinAddressFilter{
	BitcoinAddressFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyMask},
	},
},
```
JSON key: bitcoinAddress
Example matches: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa, 3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy
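P2PKH addresses start with `1` (version byte 0x00) and P2SH addresses with `3` (version byte 0x05). Both are Base58Check-encoded, so the last four decoded bytes are a double-SHA-256 checksum of the rest. The sketch below decodes an address and verifies that checksum using only the Go standard library. `validBase58Check` is a hypothetical helper; this page does not say whether go-phileas verifies the checksum or only matches the address shape.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"math/big"
	"strings"
)

// base58Alphabet is the Bitcoin Base58 alphabet (no 0, O, I or l).
const base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// base58Decode converts a Base58 string to bytes. Each leading '1' stands
// for one leading zero byte, which big.Int would otherwise drop.
func base58Decode(s string) ([]byte, bool) {
	n := new(big.Int)
	radix := big.NewInt(58)
	for _, r := range s {
		idx := strings.IndexRune(base58Alphabet, r)
		if idx < 0 {
			return nil, false
		}
		n.Mul(n, radix)
		n.Add(n, big.NewInt(int64(idx)))
	}
	zeros := 0
	for zeros < len(s) && s[zeros] == '1' {
		zeros++
	}
	return append(make([]byte, zeros), n.Bytes()...), true
}

// validBase58Check (hypothetical) reports whether addr decodes to 25 bytes
// (version, 20-byte hash, 4-byte checksum) with a P2PKH or P2SH version
// byte and a checksum equal to the first 4 bytes of SHA-256(SHA-256(payload)).
func validBase58Check(addr string) bool {
	raw, ok := base58Decode(addr)
	if !ok || len(raw) != 25 {
		return false
	}
	payload, checksum := raw[:21], raw[21:]
	if payload[0] != 0x00 && payload[0] != 0x05 {
		return false
	}
	first := sha256.Sum256(payload)
	second := sha256.Sum256(first[:])
	return bytes.Equal(second[:4], checksum)
}
```

Both example matches above pass; changing a single character (for example the final `a` of the first address to `b`) breaks the checksum.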
Credit Card
Detects credit card numbers for Visa, MasterCard, American Express, Diners Club, Discover, and JCB.
Additional option: `OnlyValidCreditCardNumbers`. When true, only numbers that pass the Luhn check are redacted; numbers that match the credit card pattern but fail Luhn check-digit validation are left untouched. Defaults to false.
The Luhn algorithm is a simple checksum formula used to validate identification numbers such as credit card numbers. Enabling this option reduces false positives by ensuring that only structurally valid card numbers are redacted.
Go
```go
CreditCard: &policy.CreditCardFilter{
	OnlyValidCreditCardNumbers: true,
	CreditCardFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyLast4},
	},
},
```
JSON key: creditCard
```json
"creditCard": {
  "onlyValidCreditCardNumbers": true,
  "creditCardFilterStrategies": [{"strategy": "LAST_4"}]
}
```
Example matches: 4111111111111111 (Visa), 5500005555555559 (MasterCard), 378282246310005 (Amex)
| Number | Luhn valid | Redacted when `onlyValidCreditCardNumbers: true` |
|---|---|---|
| 4111111111111111 | ✅ | ✅ |
| 4111111111111112 | ❌ | ❌ |
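A minimal sketch of the Luhn check that `OnlyValidCreditCardNumbers` refers to, using only the standard library. `luhnValid` is a hypothetical helper, not part of the go-phileas API, and it expects a string of digits with spaces or dashes already removed.

```go
package main

// luhnValid (hypothetical) reports whether number passes the Luhn checksum.
// Walking from the rightmost digit, every second digit is doubled (subtracting
// 9 when the result exceeds 9), and the total must be a multiple of 10.
// Callers are expected to strip separators such as spaces or dashes first.
func luhnValid(number string) bool {
	if number == "" {
		return false
	}
	sum := 0
	double := false
	for i := len(number) - 1; i >= 0; i-- {
		c := number[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}
```

Applied to the table above, it reports 4111111111111111 as valid (digit sum 30) and 4111111111111112 as invalid (digit sum 31).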
Date
Detects dates in many formats, including MM/DD/YYYY, YYYY-MM-DD, Month DD YYYY, and DD Month YYYY.
Additional option: `OnlyValidDates`. When true, only calendar-valid dates are matched.
Go
```go
Date: &policy.DateFilter{
	DateFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyStaticReplace, StaticReplacement: "[DATE]"},
	},
},
```
JSON key: date
Example matches: 01/15/2024, 2024-01-15, January 15 2024, 15 Jan 2024
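One way to picture "calendar-valid" is to let Go's `time.Parse` do the work: it rejects out-of-range values such as month 13 or February 29 in a non-leap year. The `calendarValid` helper below is a hypothetical sketch; this page does not describe how `OnlyValidDates` is implemented.

```go
package main

import "time"

// calendarValid (hypothetical) reports whether s parses under layout, written
// in Go's reference-time syntax, as a real calendar date. time.Parse returns
// an error for impossible values such as month 13 or February 30.
func calendarValid(layout, s string) bool {
	_, err := time.Parse(layout, s)
	return err == nil
}
```

For example, `calendarValid("2006-01-02", "2024-02-29")` is true because 2024 is a leap year, while the same call with `"2023-02-29"` is false.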
Driver's License
Detects US driver's license numbers.
Go
```go
DriversLicense: &policy.DriversLicenseFilter{
	DriversLicenseFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: driversLicense
Email Address
Detects email addresses.
Additional option: `OnlyValidTLDs`. When true, only email addresses with recognised top-level domains are matched.
Go
```go
EmailAddress: &policy.EmailAddressFilter{
	EmailAddressFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyStaticReplace, StaticReplacement: "[EMAIL REMOVED]"},
	},
	BaseFilter: policy.BaseFilter{
		Ignored: []string{"noreply@example.com"},
	},
},
```
JSON key: emailAddress
```json
"emailAddress": {
  "emailAddressFilterStrategies": [
    {"strategy": "STATIC_REPLACE", "staticReplacement": "[EMAIL REMOVED]"}
  ],
  "ignored": ["noreply@example.com"]
}
```
Example matches: john.doe@example.com, user+tag@domain.co.uk
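To illustrate what `OnlyValidTLDs` filters on, the sketch below checks the last label of an address's domain against an allow-list. Both the `hasKnownTLD` helper and the five-entry `knownTLDs` set are hypothetical; a real check would use a complete list of top-level domains.

```go
package main

import "strings"

// knownTLDs is a tiny, hypothetical allow-list used only for illustration.
var knownTLDs = map[string]bool{"com": true, "org": true, "net": true, "uk": true, "io": true}

// hasKnownTLD (hypothetical) reports whether the domain part of email ends
// in a top-level domain from knownTLDs, compared case-insensitively.
func hasKnownTLD(email string) bool {
	at := strings.LastIndex(email, "@")
	if at < 0 {
		return false
	}
	domain := email[at+1:]
	dot := strings.LastIndex(domain, ".")
	if dot < 0 || dot == len(domain)-1 {
		return false
	}
	return knownTLDs[strings.ToLower(domain[dot+1:])]
}
```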
IBAN Code
Detects International Bank Account Numbers.
Go
```go
IbanCode: &policy.IbanCodeFilter{
	IbanCodeFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyMask},
	},
},
```
JSON key: ibanCode
Example matches: GB29NWBK60161331926819, DE89370400440532013000
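IBANs carry two check digits validated with the mod-97 scheme defined for IBANs (ISO 13616, using ISO 7064 MOD 97-10): move the first four characters to the end, replace each letter with a two-digit number (A = 10, B = 11, up to Z = 35), and the resulting integer must leave a remainder of 1 when divided by 97. The hypothetical `ibanValid` helper below computes the remainder incrementally so it never needs big integers; this page does not say whether go-phileas applies the checksum.

```go
package main

import "strings"

// ibanValid (hypothetical) applies the IBAN mod-97 check. Spaces are removed
// and letters upper-cased first; the first four characters are moved to the
// end and the remainder is accumulated character by character.
func ibanValid(iban string) bool {
	s := strings.ToUpper(strings.ReplaceAll(iban, " ", ""))
	if len(s) < 15 || len(s) > 34 { // shortest and longest IBAN lengths
		return false
	}
	rearranged := s[4:] + s[:4]
	rem := 0
	for _, r := range rearranged {
		switch {
		case r >= '0' && r <= '9':
			rem = (rem*10 + int(r-'0')) % 97
		case r >= 'A' && r <= 'Z':
			rem = (rem*100 + int(r-'A') + 10) % 97 // letters expand to two digits
		default:
			return false
		}
	}
	return rem == 1
}
```

Both example matches pass; changing the check digits of the first one from 29 to 28 makes it fail.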
IP Address
Detects both IPv4 and IPv6 addresses.
Go
```go
IPAddress: &policy.IPAddressFilter{
	IPAddressFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: ipAddress
Example matches: 192.168.1.1, 2001:0db8:85a3:0000:0000:8a2e:0370:7334
MAC Address
Detects MAC (hardware) addresses.
Go
```go
MACAddress: &policy.MACAddressFilter{
	MACAddressFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: macAddress
Example matches: 00:1A:2B:3C:4D:5E, 00-1A-2B-3C-4D-5E
Passport Number
Detects US passport numbers.
Go
```go
PassportNumber: &policy.PassportNumberFilter{
	PassportNumberFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: passportNumber
Phone Number
Detects US and international phone numbers in a variety of formats.
Go
```go
PhoneNumber: &policy.PhoneNumberFilter{
	PhoneNumberFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyStaticReplace, StaticReplacement: "[PHONE]"},
	},
},
```
JSON key: phoneNumber
```json
"phoneNumber": {
  "phoneNumberFilterStrategies": [
    {"strategy": "STATIC_REPLACE", "staticReplacement": "[PHONE]"}
  ]
}
```
Example matches: 555-867-5309, (555) 867-5309, +1-555-867-5309, +44 20 7946 0958
Social Security Number (SSN)
Detects US Social Security Numbers and Taxpayer Identification Numbers.
Go
```go
SSN: &policy.SSNFilter{
	SSNFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
	},
},
```
JSON key: ssn
Example matches: 123-45-6789, 123 45 6789
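The Social Security Administration never issues SSNs with area number 000 or 666, group 00, or serial 0000, so a detector can discard those as implausible. The hypothetical `ssnPlausible` helper below sketches that check for the separator styles in the examples (dash, space, or none). Area numbers 900–999 are deliberately not rejected, because that range is used for ITINs, a kind of Taxpayer Identification Number. This page does not say which of these rules go-phileas applies.

```go
package main

import "regexp"

// ssnShape matches NNN-NN-NNNN with dashes, spaces, or no separators.
var ssnShape = regexp.MustCompile(`^(\d{3})[- ]?(\d{2})[- ]?(\d{4})$`)

// ssnPlausible (hypothetical) reports whether s has the SSN shape and avoids
// number ranges that are never issued: area 000 or 666, group 00, serial 0000.
// Areas 900-999 are accepted because ITINs use that range.
func ssnPlausible(s string) bool {
	m := ssnShape.FindStringSubmatch(s)
	if m == nil {
		return false
	}
	area, group, serial := m[1], m[2], m[3]
	if area == "000" || area == "666" {
		return false
	}
	return group != "00" && serial != "0000"
}
```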
Tracking Number
Detects package tracking numbers from UPS, FedEx, and USPS.
Go
```go
TrackingNumber: &policy.TrackingNumberFilter{
	TrackingNumberFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: trackingNumber
Example matches: 1Z999AA10123456784 (UPS), 449044304137821 (FedEx), 9400111899223397846246 (USPS)
URL
Detects URLs.
Additional option: `RequireHTTPWWWPrefix`. When true, only URLs that start with `http://`, `https://`, or `www.` are matched.
Go
```go
URL: &policy.URLFilter{
	RequireHTTPWWWPrefix: true,
	URLFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: url
Example matches: https://www.example.com, http://api.example.com/v1/users
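The `RequireHTTPWWWPrefix` rule amounts to a prefix test on each URL candidate. The sketch below is a hypothetical helper, not go-phileas code, and treating the prefix case-insensitively is an assumption.

```go
package main

import "strings"

// hasHTTPWWWPrefix (hypothetical) keeps only candidates that start with
// http://, https:// or www. The case-insensitive comparison is an assumption.
func hasHTTPWWWPrefix(candidate string) bool {
	lower := strings.ToLower(candidate)
	for _, p := range []string{"http://", "https://", "www."} {
		if strings.HasPrefix(lower, p) {
			return true
		}
	}
	return false
}
```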
Vehicle Identification Number (VIN)
Detects 17-character Vehicle Identification Numbers.
Go
```go
VIN: &policy.VINFilter{
	VINFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: vin
Example matches: 1HGBH41JXMN109186, JH4KA7650MC002844
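VINs used in North America carry a check digit in position 9: each character is transliterated to a number (digits keep their value, letters map to 1–9, and I, O and Q are not allowed), multiplied by a positional weight, and the sum modulo 11 gives the check digit, with 10 written as `X`. The sketch below implements that rule; `vinCheckDigitValid` is hypothetical, and this page does not say whether go-phileas checks it.

```go
package main

import "strings"

// vinLetterValues holds the transliteration value of each letter A..Z.
// I, O and Q never appear in a VIN and are marked with -1.
var vinLetterValues = [26]int{
	1, 2, 3, 4, 5, 6, 7, 8, -1, // A-I
	1, 2, 3, 4, 5, -1, 7, -1, 9, // J-R
	2, 3, 4, 5, 6, 7, 8, 9, // S-Z
}

// vinWeights are the positional weights; position 9 (the check digit) has weight 0.
var vinWeights = [17]int{8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2}

// vinCheckDigitValid (hypothetical) reports whether the 9th character of vin
// equals the check digit computed from the weighted sum modulo 11.
func vinCheckDigitValid(vin string) bool {
	vin = strings.ToUpper(vin)
	if len(vin) != 17 {
		return false
	}
	sum := 0
	for i := 0; i < 17; i++ {
		c := vin[i]
		var v int
		switch {
		case c >= '0' && c <= '9':
			v = int(c - '0')
		case c >= 'A' && c <= 'Z':
			v = vinLetterValues[c-'A']
			if v < 0 {
				return false
			}
		default:
			return false
		}
		sum += v * vinWeights[i]
	}
	want := byte('0' + sum%11)
	if sum%11 == 10 {
		want = 'X'
	}
	return vin[8] == want
}
```

Of the two example matches above, 1HGBH41JXMN109186 passes this check and JH4KA7650MC002844 does not, so a 17-character match is not necessarily check-digit valid.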
ZIP Code
Detects US ZIP codes (5-digit and ZIP+4).
Additional option: `RequireDelimiter`. When true, ZIP+4 codes must include the dash delimiter (e.g. 12345-6789).
Go
```go
ZipCode: &policy.ZipCodeFilter{
	RequireDelimiter: false,
	ZipCodeFilterStrategies: []policy.FilterStrategy{
		{Strategy: policy.StrategyRedact},
	},
},
```
JSON key: zipCode
Example matches: 90210, 12345-6789
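The effect of `RequireDelimiter` can be pictured with two regular expressions: one where the ZIP+4 dash is mandatory and one where it is optional. Both patterns and the `findZips` helper are hypothetical illustrations, not the expressions go-phileas uses.

```go
package main

import "regexp"

// Hypothetical patterns: the ZIP+4 dash is mandatory in the first and
// optional in the second. \b keeps matches from starting or ending inside
// a longer run of digits.
var (
	zipDelimited = regexp.MustCompile(`\b\d{5}(?:-\d{4})?\b`)
	zipOptional  = regexp.MustCompile(`\b\d{5}(?:-?\d{4})?\b`)
)

// findZips (hypothetical) returns ZIP and ZIP+4 codes found in text.
func findZips(text string, requireDelimiter bool) []string {
	if requireDelimiter {
		return zipDelimited.FindAllString(text, -1)
	}
	return zipOptional.FindAllString(text, -1)
}
```

With the delimiter required, a run of nine digits such as 123456789 is not reported at all; with it optional, the same run is reported as a ZIP+4 code.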
Custom Dictionary
Detects and redacts words from a user-supplied list. Unlike the regex-based identifiers, `dictionaries` is a list: you can configure multiple dictionary filters in a single policy, each with its own word list and strategy.
Terms are matched at word boundaries and are case-insensitive by default. Terms can be provided inline in the policy or loaded from a file (one word per line).
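Word-boundary, case-insensitive matching of a term list can be sketched with a single regular expression built from the escaped terms. The `dictionaryPattern` helper below is hypothetical and only illustrates the behaviour described above; it is not how go-phileas builds its matcher.

```go
package main

import (
	"regexp"
	"sort"
	"strings"
)

// dictionaryPattern (hypothetical) builds one regular expression that matches
// any of terms at word boundaries. Terms are escaped with regexp.QuoteMeta and
// sorted longest first, so when two terms start at the same position the longer
// one wins; (?i) makes the match case-insensitive unless caseSensitive is true.
func dictionaryPattern(terms []string, caseSensitive bool) *regexp.Regexp {
	quoted := make([]string, len(terms))
	for i, t := range terms {
		quoted[i] = regexp.QuoteMeta(t)
	}
	sort.Slice(quoted, func(i, j int) bool { return len(quoted[i]) > len(quoted[j]) })
	expr := `\b(?:` + strings.Join(quoted, "|") + `)\b`
	if !caseSensitive {
		expr = `(?i)` + expr
	}
	return regexp.MustCompile(expr)
}
```

With the terms `Alice`, `Bob` and `Acme Corp`, the text "bob met bobby at ACME CORP." yields "bob" and "ACME CORP"; "bobby" is skipped because the match would end inside a word.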
Dictionary options
| Field | Type | Default | Description |
|---|---|---|---|
| `dictionaryFilterStrategies` | `[]FilterStrategy` | `REDACT` | How to handle identified words. |
| `terms` | `[]string` | none | Inline list of terms to redact. |
| `files` | `[]string` | none | List of file paths containing words to redact (one word per line). |
| `caseSensitive` | `bool` | `false` | When true, word matching is case-sensitive. |
| `fuzzy` | `string` | none | Enables approximate (fuzzy) matching using Levenshtein distance. Valid values: `low`, `medium`, `high`. See Fuzzy matching below. |
| `ignored` | `[]string` | none | Terms to skip, compared case-insensitively. |
| `enabled` | `bool` | `true` | Set to false to disable this filter instance without removing it from the policy. |
Go (inline word list)
```go
Dictionaries: []policy.DictionaryFilter{
	{
		Terms: []string{"Alice", "Bob", "Acme Corp"},
		DictionaryFilterStrategies: []policy.FilterStrategy{
			{Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
		},
	},
},
```
JSON key: dictionaries
```json
"dictionaries": [
  {
    "terms": ["Alice", "Bob", "Acme Corp"],
    "dictionaryFilterStrategies": [{"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]
  }
]
```
YAML
```yaml
dictionaries:
  - terms:
      - Alice
      - Bob
      - Acme Corp
    dictionaryFilterStrategies:
      - strategy: REDACT
        redactionFormat: "{{{REDACTED-%t}}}"
```
Go (file-based word list)
```go
Dictionaries: []policy.DictionaryFilter{
	{
		Files: []string{"/etc/phileas/sensitive-names.txt"},
		DictionaryFilterStrategies: []policy.FilterStrategy{
			{Strategy: policy.StrategyStaticReplace, StaticReplacement: "[NAME REMOVED]"},
		},
	},
},
```
JSON
```json
"dictionaries": [
  {
    "files": ["/etc/phileas/sensitive-names.txt"],
    "dictionaryFilterStrategies": [{"strategy": "STATIC_REPLACE", "staticReplacement": "[NAME REMOVED]"}]
  }
]
```
Example matches: Any word from the configured list, matched at word boundaries (the term `bob` matches "bob" but not "bobby").
Note: When `NewFilterService` (or `NewFilterServiceWithContext`) is called with a policy that contains file-based dictionary filters, it returns an error if any of the specified files cannot be read. Check the returned error before using the service.
Fuzzy matching
When `fuzzy` is set, the dictionary filter also matches tokens that are close to a dictionary word according to Levenshtein distance (the number of single-character insertions, deletions, or substitutions needed to transform one word into another). Exact matches still produce a confidence of 1.0; fuzzy matches produce a lower confidence to reflect the uncertainty.
| Level | Max Levenshtein distance | Confidence |
|---|---|---|
| `low` | 1 | 0.8 |
| `medium` | 2 | 0.6 |
| `high` | 3 | 0.4 |
Example: with fuzzy: "low" and the word "secret" in the dictionary, the misspelling "secrat" (distance 1) is matched and redacted with confidence 0.8, while "secret" itself is matched at confidence 1.0.
Go
```go
Dictionaries: []policy.DictionaryFilter{
	{
		Terms: []string{"secret"},
		Fuzzy: policy.FuzzyLow,
		DictionaryFilterStrategies: []policy.FilterStrategy{
			{Strategy: policy.StrategyRedact},
		},
	},
},
```
JSON
```json
"dictionaries": [
  {
    "terms": ["secret"],
    "fuzzy": "low",
    "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
  }
]
```
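YAML
```yaml
dictionaries:
  - terms:
      - secret
    fuzzy: low
    dictionaryFilterStrategies:
      - strategy: REDACT
```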
Note: Fuzzy matching tokenizes the input by splitting on whitespace and punctuation. Very short words (1–2 characters) may produce false positives at `medium` or `high` levels; prefer `low` fuzziness for short dictionary terms.
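The sketch below computes Levenshtein distance with the standard two-row dynamic-programming method and maps each fuzzy level to the maximum distance and confidence from the table above. `levenshtein`, `fuzzyConfidence` and the level map are hypothetical helpers for illustration, not go-phileas code; comparing case-insensitively mirrors the dictionary filter's default and is an assumption here.

```go
package main

import "strings"

// levenshtein returns the minimum number of single-character insertions,
// deletions or substitutions that turn a into b (two-row dynamic programming).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// fuzzyLevel pairs the maximum distance and confidence from the table above.
type fuzzyLevel struct {
	maxDistance int
	confidence  float64
}

var fuzzyLevels = map[string]fuzzyLevel{
	"low":    {maxDistance: 1, confidence: 0.8},
	"medium": {maxDistance: 2, confidence: 0.6},
	"high":   {maxDistance: 3, confidence: 0.4},
}

// fuzzyConfidence (hypothetical) compares token with a dictionary term
// case-insensitively: 1.0 for an exact match, the level's confidence for a
// match within its maximum distance, or ok=false otherwise.
func fuzzyConfidence(level, term, token string) (confidence float64, ok bool) {
	t, w := strings.ToLower(term), strings.ToLower(token)
	if t == w {
		return 1.0, true
	}
	l, known := fuzzyLevels[level]
	if !known || levenshtein(t, w) > l.maxDistance {
		return 0, false
	}
	return l.confidence, true
}
```

With level `low` and the term "secret", the token "secrat" (distance 1) scores 0.8, while "sacrat" (distance 2) only matches at `medium`, scoring 0.6.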
Using multiple dictionaries
Because `dictionaries` is a list, you can combine multiple independent word lists in one policy:
Go
```go
Dictionaries: []policy.DictionaryFilter{
	{
		Terms: []string{"Alice", "Bob"},
		DictionaryFilterStrategies: []policy.FilterStrategy{
			{Strategy: policy.StrategyRedact},
		},
	},
	{
		Files: []string{"/etc/phileas/project-names.txt"},
		DictionaryFilterStrategies: []policy.FilterStrategy{
			{Strategy: policy.StrategyStaticReplace, StaticReplacement: "[PROJECT]"},
		},
	},
},
```
JSON
```json
"dictionaries": [
  {
    "terms": ["Alice", "Bob"],
    "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
  },
  {
    "files": ["/etc/phileas/project-names.txt"],
    "dictionaryFilterStrategies": [{"strategy": "STATIC_REPLACE", "staticReplacement": "[PROJECT]"}]
  }
]
```
Person's Names via ph-eye (NER)
Detects person names (and other configurable entity labels) using the ph-eye natural language processing service. ph-eye is a standalone HTTP service that hosts AI/NLP models for named-entity recognition (NER).
Unlike the regex-based identifiers, `pheye` is a list: you can configure multiple ph-eye instances in a single policy (for example, to target different models or endpoints).
ph-eye configuration
The `phEyeConfiguration` object controls how go-phileas connects to a ph-eye service instance:
| Field | Type | Default | Description |
|---|---|---|---|
| `endpoint` | `string` | `http://localhost:18080` | The URL of the ph-eye service. |
| `timeout` | `int` | `600` | HTTP connection timeout in seconds. |
| `labels` | `string` | `Person` | Comma-separated list of entity labels to detect (e.g. `"Person"`, `"Person,Place"`). |
Filter options
| Field | Type | Default | Description |
|---|---|---|---|
| `phEyeFilterStrategies` | `[]FilterStrategy` | `REDACT` | How to handle identified spans. |
| `removePunctuation` | `bool` | `false` | When true, punctuation is stripped from the text before it is sent to ph-eye. |
| `bearerToken` | `string` | none | Optional bearer token sent in the `Authorization` header to authenticate with ph-eye. |
| `windowSize` | `int` | none | Overrides the context window size for this filter. |
| `priority` | `int` | `0` | Tie-breaking priority when two spans are otherwise identical. |
| `ignored` | `[]string` | none | Terms to skip, compared case-insensitively. |
| `enabled` | `bool` | `true` | Set to false to disable this filter instance without removing it from the policy. |
Go
```go
PhEye: []policy.PhEyeFilter{
	{
		PhEyeConfiguration: policy.PhEyeConfiguration{
			Endpoint: "http://localhost:18080",
			Labels:   "Person",
		},
		PhEyeFilterStrategies: []policy.FilterStrategy{
			{Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
		},
	},
},
```
JSON key: pheye
```json
"pheye": [
  {
    "phEyeConfiguration": {
      "endpoint": "http://localhost:18080",
      "labels": "Person"
    },
    "phEyeFilterStrategies": [
      {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
    ]
  }
]
```
YAML
```yaml
pheye:
  - phEyeConfiguration:
      endpoint: http://localhost:18080
      labels: Person
    phEyeFilterStrategies:
      - strategy: REDACT
        redactionFormat: "{{{REDACTED-%t}}}"
```
Example matches: George Washington, Jane Smith
Using multiple ph-eye instances
Because `pheye` is a list, you can point to more than one service at the same time:
```go
PhEye: []policy.PhEyeFilter{
	{
		PhEyeConfiguration: policy.PhEyeConfiguration{
			Endpoint: "http://pheye-en:18080",
			Labels:   "Person",
		},
	},
	{
		PhEyeConfiguration: policy.PhEyeConfiguration{
			Endpoint: "http://pheye-fr:18080",
			Labels:   "Person",
		},
	},
},
```
Using multiple identifiers together
```go
pol := &policy.Policy{
	Name: "comprehensive",
	Identifiers: policy.Identifiers{
		SSN:          &policy.SSNFilter{},
		EmailAddress: &policy.EmailAddressFilter{},
		PhoneNumber:  &policy.PhoneNumberFilter{},
		CreditCard: &policy.CreditCardFilter{
			OnlyValidCreditCardNumbers: true,
			CreditCardFilterStrategies: []policy.FilterStrategy{
				{Strategy: policy.StrategyLast4},
			},
		},
		IPAddress: &policy.IPAddressFilter{},
		Date: &policy.DateFilter{
			DateFilterStrategies: []policy.FilterStrategy{
				{Strategy: policy.StrategyStaticReplace, StaticReplacement: "[DATE]"},
			},
		},
	},
}

// NewFilterService returns an error, for example, when a file-based
// dictionary in the policy cannot be read.
svc, err := services.NewFilterService(pol)
if err != nil {
	panic(err)
}

// inputText holds the text to be filtered.
result, err := svc.Filter(pol, "ctx", inputText)
if err != nil {
	panic(err)
}
```