Identifiers

Identifiers tell go-phileas which types of sensitive information to look for in the input text. Each identifier maps to a filter in the policy.Identifiers struct and has a corresponding filter strategy slice and optional configuration fields.


Age

Detects age expressions such as 45 years old, aged 30, 61 y/o.

Go

Age: &policy.AgeFilter{
    AgeFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
    },
},

JSON key: age

"age": {
  "ageFilterStrategies": [{"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]
}

Example matches: 45 years old, aged 30, 61 y/o, a 22 year old


Bank Routing Number

Detects US ABA bank routing numbers (9-digit numbers starting with 0–3).

Go

BankRoutingNumber: &policy.BankRoutingNumberFilter{
    BankRoutingNumberFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: bankRoutingNumber

"bankRoutingNumber": {}

Example matches: 021000021, 111000038


Bitcoin Address

Detects Bitcoin wallet addresses (Base58Check P2PKH and P2SH formats).

Go

BitcoinAddress: &policy.BitcoinAddressFilter{
    BitcoinAddressFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyMask},
    },
},

JSON key: bitcoinAddress

"bitcoinAddress": {}

Example matches: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa, 3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy


Credit Card

Detects credit card numbers for Visa, MasterCard, American Express, Diners Club, Discover, and JCB.

Additional option: OnlyValidCreditCardNumbers — when true, only numbers that pass the Luhn check are redacted. Numbers that match the credit card regex pattern but fail the Luhn check digit validation are left untouched. Defaults to false.

The Luhn algorithm is a simple checksum formula used to validate identification numbers such as credit card numbers. Enabling this option reduces false positives by ensuring only structurally valid card numbers are redacted.

Go

CreditCard: &policy.CreditCardFilter{
    OnlyValidCreditCardNumbers: true,
    CreditCardFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyLast4},
    },
},

JSON key: creditCard

"creditCard": {
  "onlyValidCreditCardNumbers": true,
  "creditCardFilterStrategies": [{"strategy": "LAST_4"}]
}

Example matches: 4111111111111111 (Visa), 5500005555555559 (MasterCard), 378282246310005 (Amex)

| Number | Luhn valid | Redacted when onlyValidCreditCardNumbers: true |
| --- | --- | --- |
| 4111111111111111 | Yes | Yes |
| 4111111111111112 | No | No |
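
The Luhn check described above can be sketched in a few lines of Go. This is a minimal, standalone illustration and not go-phileas's own implementation; the function name luhnValid is hypothetical.

```go
package main

import "fmt"

// luhnValid reports whether s, a string of ASCII digits, passes the Luhn check.
// Hypothetical helper for illustration; not part of the go-phileas API.
func luhnValid(s string) bool {
	if len(s) == 0 {
		return false
	}
	sum := 0
	double := false // the rightmost digit is never doubled
	for i := len(s) - 1; i >= 0; i-- {
		c := s[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9 // same as adding the two digits of the product
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

func main() {
	for _, n := range []string{"4111111111111111", "4111111111111112"} {
		fmt.Printf("%s luhn-valid=%t\n", n, luhnValid(n))
	}
}
```

Walking from the right and doubling every second digit gives a digit sum of 30 for 4111111111111111 and 31 for 4111111111111112, which is why only the first number passes.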

Date

Detects dates in many formats: MM/DD/YYYY, YYYY-MM-DD, Month DD YYYY, DD Month YYYY, and more.

Additional option: OnlyValidDates — when true, only calendar-valid dates are matched.

Go

Date: &policy.DateFilter{
    DateFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyStaticReplace, StaticReplacement: "[DATE]"},
    },
},

JSON key: date

"date": {
  "dateFilterStrategies": [{"strategy": "STATIC_REPLACE", "staticReplacement": "[DATE]"}]
}

Example matches: 01/15/2024, 2024-01-15, January 15 2024, 15 Jan 2024
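
To illustrate what "calendar-valid" means for OnlyValidDates, the sketch below uses Go's standard time.Parse, which rejects out-of-range days. It handles only the MM/DD/YYYY form, and calendarValid is a hypothetical helper; how go-phileas validates dates internally is not shown in this document.

```go
package main

import (
	"fmt"
	"time"
)

// calendarValid reports whether s is a real calendar date in MM/DD/YYYY form.
// Hypothetical helper for illustration; not part of the go-phileas API.
func calendarValid(s string) bool {
	// time.Parse returns an error when the day does not exist in that month.
	_, err := time.Parse("01/02/2006", s)
	return err == nil
}

func main() {
	for _, d := range []string{"01/15/2024", "02/29/2024", "02/30/2024", "02/29/2023"} {
		fmt.Printf("%s calendar-valid=%t\n", d, calendarValid(d))
	}
}
```

A string such as 02/30/2024 has the shape of a date but is not calendar-valid, so a filter that checks validity would leave it untouched.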


Driver's License

Detects US driver's license numbers.

Go

DriversLicense: &policy.DriversLicenseFilter{
    DriversLicenseFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: driversLicense

"driversLicense": {}

Email Address

Detects email addresses.

Additional option: OnlyValidTLDs — when true, only email addresses with recognised top-level domains are matched.

Go

EmailAddress: &policy.EmailAddressFilter{
    EmailAddressFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyStaticReplace, StaticReplacement: "[EMAIL REMOVED]"},
    },
    BaseFilter: policy.BaseFilter{
        Ignored: []string{"noreply@example.com"},
    },
},

JSON key: emailAddress

"emailAddress": {
  "emailAddressFilterStrategies": [
    {"strategy": "STATIC_REPLACE", "staticReplacement": "[EMAIL REMOVED]"}
  ],
  "ignored": ["noreply@example.com"]
}

Example matches: john.doe@example.com, user+tag@domain.co.uk


IBAN Code

Detects International Bank Account Numbers.

Go

IbanCode: &policy.IbanCodeFilter{
    IbanCodeFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyMask},
    },
},

JSON key: ibanCode

"ibanCode": {}

Example matches: GB29NWBK60161331926819, DE89370400440532013000
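
For background, IBANs carry two check digits that can be verified with the standard mod-97 check. The sketch below shows that check on the example values above. Whether go-phileas applies it is not stated here, so treat this as an assumption-laden illustration; ibanChecksumValid is a hypothetical name.

```go
package main

import "fmt"

// ibanChecksumValid applies the mod-97 check to a compact (no spaces),
// upper-case IBAN. Hypothetical helper; not part of the go-phileas API.
func ibanChecksumValid(iban string) bool {
	if len(iban) < 5 {
		return false
	}
	// Move the country code and check digits to the end.
	rearranged := iban[4:] + iban[:4]
	rem := 0
	for i := 0; i < len(rearranged); i++ {
		c := rearranged[i]
		switch {
		case c >= '0' && c <= '9':
			rem = (rem*10 + int(c-'0')) % 97
		case c >= 'A' && c <= 'Z':
			// Letters expand to two digits: A=10 ... Z=35.
			rem = (rem*100 + int(c-'A') + 10) % 97
		default:
			return false
		}
	}
	return rem == 1
}

func main() {
	for _, s := range []string{"GB29NWBK60161331926819", "DE89370400440532013000", "GB29NWBK60161331926818"} {
		fmt.Printf("%s checksum-valid=%t\n", s, ibanChecksumValid(s))
	}
}
```

Taking the remainder after each character keeps the arithmetic within ordinary int range, so no big-integer type is needed.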


IP Address

Detects both IPv4 and IPv6 addresses.

Go

IPAddress: &policy.IPAddressFilter{
    IPAddressFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: ipAddress

"ipAddress": {}

Example matches: 192.168.1.1, 2001:0db8:85a3:0000:0000:8a2e:0370:7334
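
If you need to tell the two address families apart after detection, Go's standard net/netip package can classify a candidate string. This is a standalone sketch; ipVersion is a hypothetical helper and says nothing about how the go-phileas filter itself matches addresses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ipVersion returns 4 or 6 for a parseable address and 0 otherwise.
// Hypothetical helper for illustration; not part of the go-phileas API.
func ipVersion(s string) int {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return 0
	}
	if addr.Is4() {
		return 4
	}
	return 6
}

func main() {
	for _, s := range []string{"192.168.1.1", "2001:0db8:85a3:0000:0000:8a2e:0370:7334", "999.1.1.1"} {
		fmt.Printf("%s version=%d\n", s, ipVersion(s))
	}
}
```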


MAC Address

Detects MAC (hardware) addresses.

Go

MACAddress: &policy.MACAddressFilter{
    MACAddressFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: macAddress

"macAddress": {}

Example matches: 00:1A:2B:3C:4D:5E, 00-1A-2B-3C-4D-5E


Passport Number

Detects US passport numbers.

Go

PassportNumber: &policy.PassportNumberFilter{
    PassportNumberFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: passportNumber

"passportNumber": {}

Phone Number

Detects US and international phone numbers in a variety of formats.

Go

PhoneNumber: &policy.PhoneNumberFilter{
    PhoneNumberFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyStaticReplace, StaticReplacement: "[PHONE]"},
    },
},

JSON key: phoneNumber

"phoneNumber": {
  "phoneNumberFilterStrategies": [
    {"strategy": "STATIC_REPLACE", "staticReplacement": "[PHONE]"}
  ]
}

Example matches: 555-867-5309, (555) 867-5309, +1-555-867-5309, +44 20 7946 0958


Social Security Number (SSN)

Detects US Social Security Numbers and Taxpayer Identification Numbers.

Go

SSN: &policy.SSNFilter{
    SSNFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
    },
},

JSON key: ssn

"ssn": {
  "ssnFilterStrategies": [{"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]
}

Example matches: 123-45-6789, 123 45 6789


Tracking Number

Detects package tracking numbers from UPS, FedEx, and USPS.

Go

TrackingNumber: &policy.TrackingNumberFilter{
    TrackingNumberFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: trackingNumber

"trackingNumber": {}

Example matches: 1Z999AA10123456784 (UPS), 449044304137821 (FedEx), 9400111899223397846246 (USPS)


URL

Detects URLs.

Additional option: RequireHTTPWWWPrefix. When true, only URLs that begin with http://, https://, or www. are matched.

Go

URL: &policy.URLFilter{
    RequireHTTPWWWPrefix: true,
    URLFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: url

"url": {
  "requireHttpWwwPrefix": true
}

Example matches: https://www.example.com, http://api.example.com/v1/users
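
The effect of RequireHTTPWWWPrefix can be pictured as a prefix test on each candidate URL. The sketch below is an illustration only: hasHTTPWWWPrefix is a hypothetical helper, and the case-insensitive comparison is an assumption rather than documented go-phileas behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// hasHTTPWWWPrefix reports whether candidate starts with http://, https://, or www.
// Hypothetical helper; lower-casing before comparison is an assumption.
func hasHTTPWWWPrefix(candidate string) bool {
	lower := strings.ToLower(candidate)
	for _, p := range []string{"http://", "https://", "www."} {
		if strings.HasPrefix(lower, p) {
			return true
		}
	}
	return false
}

func main() {
	for _, u := range []string{"https://www.example.com", "www.example.com", "example.com/path"} {
		fmt.Printf("%s has-prefix=%t\n", u, hasHTTPWWWPrefix(u))
	}
}
```

With the option enabled, a bare host such as example.com/path would be left in the text, which trades recall for fewer false positives on dotted tokens.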


Vehicle Identification Number (VIN)

Detects 17-character Vehicle Identification Numbers.

Go

VIN: &policy.VINFilter{
    VINFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: vin

"vin": {}

Example matches: 1HGBH41JXMN109186, JH4KA7650MC002844


ZIP Code

Detects US ZIP codes (5-digit and ZIP+4).

Additional option: RequireDelimiter — when true, ZIP+4 codes must include the dash delimiter (e.g. 12345-6789).

Go

ZipCode: &policy.ZipCodeFilter{
    RequireDelimiter: false,
    ZipCodeFilterStrategies: []policy.FilterStrategy{
        {Strategy: policy.StrategyRedact},
    },
},

JSON key: zipCode

"zipCode": {
  "requireDelimiter": false
}

Example matches: 90210, 12345-6789
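
One way to picture RequireDelimiter is as a choice between two patterns, one where the dash in ZIP+4 is mandatory and one where it is optional. The patterns and the findZips helper below are hypothetical; in particular, treating nine consecutive digits as a ZIP+4 when the option is false is an assumption about the filter's behaviour.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Dash required between the 5-digit code and the 4-digit extension.
	zipDashRequired = regexp.MustCompile(`\b\d{5}(?:-\d{4})?\b`)
	// Dash optional (assumed behaviour when requireDelimiter is false).
	zipDashOptional = regexp.MustCompile(`\b\d{5}(?:-?\d{4})?\b`)
)

// findZips returns ZIP-like tokens in text. Hypothetical helper for illustration.
func findZips(text string, requireDelimiter bool) []string {
	if requireDelimiter {
		return zipDashRequired.FindAllString(text, -1)
	}
	return zipDashOptional.FindAllString(text, -1)
}

func main() {
	fmt.Println(findZips("Ship to 90210 or 12345-6789", true))
	fmt.Println(findZips("Account 123456789", true))
	fmt.Println(findZips("Account 123456789", false))
}
```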


Custom Dictionary

Detects and redacts words from a user-supplied list. Unlike the regex-based identifiers, dictionaries is a list — you can configure multiple dictionary filters in a single policy, each with its own word list and strategy.

Terms are matched at word boundaries and are case-insensitive by default. Terms can be provided inline in the policy or loaded from a file (one word per line).

Dictionary options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| dictionaryFilterStrategies | []FilterStrategy | REDACT | How to handle identified words. |
| terms | []string | | Inline list of terms to redact. |
| files | []string | | List of file paths containing words to redact (one word per line). |
| caseSensitive | bool | false | When true, word matching is case-sensitive. |
| fuzzy | string | | Enables approximate (fuzzy) matching using Levenshtein distance. Valid values: low, medium, high. See Fuzzy matching below. |
| ignored | []string | | Terms to skip, compared case-insensitively. |
| enabled | bool | true | Set to false to disable this filter instance without removing it from the policy. |

Go (inline word list)

Dictionaries: []policy.DictionaryFilter{
    {
        Terms: []string{"Alice", "Bob", "Acme Corp"},
        DictionaryFilterStrategies: []policy.FilterStrategy{
            {Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
        },
    },
},

JSON key: dictionaries

"dictionaries": [
  {
    "terms": ["Alice", "Bob", "Acme Corp"],
    "dictionaryFilterStrategies": [{"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}]
  }
]

YAML

dictionaries:
  - terms:
      - Alice
      - Bob
      - Acme Corp
    dictionaryFilterStrategies:
      - strategy: REDACT
        redactionFormat: "{{{REDACTED-%t}}}"

Go (file-based word list)

Dictionaries: []policy.DictionaryFilter{
    {
        Files: []string{"/etc/phileas/sensitive-names.txt"},
        DictionaryFilterStrategies: []policy.FilterStrategy{
            {Strategy: policy.StrategyStaticReplace, StaticReplacement: "[NAME REMOVED]"},
        },
    },
},

JSON

"dictionaries": [
  {
    "files": ["/etc/phileas/sensitive-names.txt"],
    "dictionaryFilterStrategies": [{"strategy": "STATIC_REPLACE", "staticReplacement": "[NAME REMOVED]"}]
  }
]

Example matches: Any word from the configured list, matched at word boundaries (bob matches "bob" but not "bobby").
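
Word-boundary, case-insensitive matching of a single term can be sketched with Go's regexp package. termPattern is a hypothetical helper and not the library's implementation; note that `\b` in Go's regexp treats only ASCII letters, digits, and underscore as word characters.

```go
package main

import (
	"fmt"
	"regexp"
)

// termPattern builds a regexp that matches term only as a whole word.
// Hypothetical helper for illustration; not part of the go-phileas API.
func termPattern(term string, caseSensitive bool) *regexp.Regexp {
	// QuoteMeta escapes any regexp metacharacters in the term.
	expr := `\b` + regexp.QuoteMeta(term) + `\b`
	if !caseSensitive {
		expr = `(?i)` + expr
	}
	return regexp.MustCompile(expr)
}

func main() {
	re := termPattern("bob", false)
	fmt.Println(re.MatchString("Ask Bob today")) // whole word, different case
	fmt.Println(re.MatchString("Ask bobby today"))
}
```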

Note: When NewFilterService (or NewFilterServiceWithContext) is called with a policy that contains file-based dictionary filters, it returns an error if any of the specified files cannot be read. Check the returned error before using the service.

Fuzzy matching

When fuzzy is set, the dictionary filter also matches tokens that are close to a dictionary word according to Levenshtein distance (the number of single-character insertions, deletions, or substitutions needed to transform one word into another). Exact matches still produce a confidence of 1.0; fuzzy matches produce a lower confidence to reflect the uncertainty.

| Level | Max Levenshtein distance | Confidence |
| --- | --- | --- |
| low | 1 | 0.8 |
| medium | 2 | 0.6 |
| high | 3 | 0.4 |

Example: with fuzzy: "low" and the word "secret" in the dictionary, the misspelling "secrat" (distance 1) is matched and redacted with confidence 0.8, while "secret" itself is matched at confidence 1.0.
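
The distance and confidence rules above can be sketched as a small standalone program. The levenshtein and matchConfidence functions and the levels map are hypothetical names used for illustration; only the thresholds and confidence values come from the table in this section.

```go
package main

import "fmt"

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the number of single-character insertions,
// deletions, or substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

type fuzzyLevel struct {
	maxDistance int
	confidence  float64
}

// Thresholds and confidences taken from the table above.
var levels = map[string]fuzzyLevel{
	"low":    {1, 0.8},
	"medium": {2, 0.6},
	"high":   {3, 0.4},
}

// matchConfidence returns the confidence for token against term at the
// given level, and whether the token matched at all.
func matchConfidence(token, term, level string) (float64, bool) {
	if token == term {
		return 1.0, true
	}
	lv, ok := levels[level]
	if ok && levenshtein(token, term) <= lv.maxDistance {
		return lv.confidence, true
	}
	return 0, false
}

func main() {
	for _, tok := range []string{"secret", "secrat", "sacrat"} {
		c, ok := matchConfidence(tok, "secret", "low")
		fmt.Printf("%s matched=%t confidence=%.1f\n", tok, ok, c)
	}
}
```

The sketch compares case-sensitively for brevity; a real filter would also need to honour the caseSensitive setting.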

Go

Dictionaries: []policy.DictionaryFilter{
    {
        Terms: []string{"secret"},
        Fuzzy: policy.FuzzyLow,
        DictionaryFilterStrategies: []policy.FilterStrategy{
            {Strategy: policy.StrategyRedact},
        },
    },
},

JSON

"dictionaries": [
  {
    "terms": ["secret"],
    "fuzzy": "low",
    "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
  }
]

YAML

dictionaries:
  - terms:
      - secret
    fuzzy: low
    dictionaryFilterStrategies:
      - strategy: REDACT

Note: Fuzzy matching tokenizes the input by splitting on whitespace and punctuation. Very short words (1–2 characters) may produce false positives at medium or high levels; prefer low fuzziness for short dictionary terms.
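
Splitting on whitespace and punctuation, as the note describes, can be approximated with strings.FieldsFunc. The tokenize function is a hypothetical sketch; the exact set of characters go-phileas treats as separators is not specified here, so the use of unicode.IsSpace and unicode.IsPunct is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits text on whitespace and punctuation and drops empty tokens.
// Hypothetical helper; the separator set is an assumption.
func tokenize(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return unicode.IsSpace(r) || unicode.IsPunct(r)
	})
}

func main() {
	fmt.Printf("%q\n", tokenize("The secrat, again!"))
}
```

Each resulting token would then be compared against the dictionary terms, which is why one- and two-character tokens are risky at medium or high fuzziness: almost any short token lies within distance 2 or 3 of a short term.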

Using multiple dictionaries

Because dictionaries is a list you can combine multiple independent word lists in one policy:

Dictionaries: []policy.DictionaryFilter{
    {
        Terms: []string{"Alice", "Bob"},
        DictionaryFilterStrategies: []policy.FilterStrategy{
            {Strategy: policy.StrategyRedact},
        },
    },
    {
        Files: []string{"/etc/phileas/project-names.txt"},
        DictionaryFilterStrategies: []policy.FilterStrategy{
            {Strategy: policy.StrategyStaticReplace, StaticReplacement: "[PROJECT]"},
        },
    },
},

JSON

"dictionaries": [
  {
    "terms": ["Alice", "Bob"],
    "dictionaryFilterStrategies": [{"strategy": "REDACT"}]
  },
  {
    "files": ["/etc/phileas/project-names.txt"],
    "dictionaryFilterStrategies": [{"strategy": "STATIC_REPLACE", "staticReplacement": "[PROJECT]"}]
  }
]

Person's Names via ph-eye (NER)

Detects person names (and other configurable entity labels) using the ph-eye natural language processing service. ph-eye is a standalone HTTP service that hosts AI/NLP models for named-entity recognition (NER).

Unlike the regex-based identifiers, pheye is a list — you can configure multiple ph-eye instances in a single policy (for example, to target different models or endpoints).

ph-eye configuration

The phEyeConfiguration object controls how go-phileas connects to a ph-eye service instance:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| endpoint | string | http://localhost:18080 | The URL of the ph-eye service. |
| timeout | int | 600 | HTTP connection timeout in seconds. |
| labels | string | Person | Comma-separated list of entity labels to detect (e.g. "Person", "Person,Place"). |
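
Because labels is a single comma-separated string rather than a list, it can help to see how such a value breaks down into individual labels. The sketch below is illustrative only: parseLabels is a hypothetical helper, and trimming spaces around each label is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels splits a comma-separated label string into individual labels,
// trimming surrounding spaces and dropping empty entries (assumptions).
func parseLabels(labels string) []string {
	var out []string
	for _, l := range strings.Split(labels, ",") {
		if l = strings.TrimSpace(l); l != "" {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", parseLabels("Person"))
	fmt.Printf("%q\n", parseLabels("Person,Place"))
}
```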

Filter options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| phEyeFilterStrategies | []FilterStrategy | REDACT | How to handle identified spans. |
| removePunctuation | bool | false | When true, punctuation is stripped from the text before it is sent to ph-eye. |
| bearerToken | string | | Optional bearer token sent in the Authorization header to authenticate with ph-eye. |
| windowSize | int | | Overrides the context window size for this filter. |
| priority | int | 0 | Tie-breaking priority when two spans are otherwise identical. |
| ignored | []string | | Terms to skip, compared case-insensitively. |
| enabled | bool | true | Set to false to disable this filter instance without removing it from the policy. |

Go

PhEye: []policy.PhEyeFilter{
    {
        PhEyeConfiguration: policy.PhEyeConfiguration{
            Endpoint: "http://localhost:18080",
            Labels:   "Person",
        },
        PhEyeFilterStrategies: []policy.FilterStrategy{
            {Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
        },
    },
},

JSON key: pheye

"pheye": [
  {
    "phEyeConfiguration": {
      "endpoint": "http://localhost:18080",
      "labels": "Person"
    },
    "phEyeFilterStrategies": [
      {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
    ]
  }
]

YAML

pheye:
  - phEyeConfiguration:
      endpoint: http://localhost:18080
      labels: Person
    phEyeFilterStrategies:
      - strategy: REDACT
        redactionFormat: "{{{REDACTED-%t}}}"

Example matches: George Washington, Jane Smith

Using multiple ph-eye instances

Because pheye is a list you can point to more than one service at the same time:

PhEye: []policy.PhEyeFilter{
    {
        PhEyeConfiguration: policy.PhEyeConfiguration{
            Endpoint: "http://pheye-en:18080",
            Labels:   "Person",
        },
    },
    {
        PhEyeConfiguration: policy.PhEyeConfiguration{
            Endpoint: "http://pheye-fr:18080",
            Labels:   "Person",
        },
    },
},

Using multiple identifiers together

pol := &policy.Policy{
    Name: "comprehensive",
    Identifiers: policy.Identifiers{
        SSN:          &policy.SSNFilter{},
        EmailAddress: &policy.EmailAddressFilter{},
        PhoneNumber:  &policy.PhoneNumberFilter{},
        CreditCard: &policy.CreditCardFilter{
            OnlyValidCreditCardNumbers: true,
            CreditCardFilterStrategies: []policy.FilterStrategy{
                {Strategy: policy.StrategyLast4},
            },
        },
        IPAddress: &policy.IPAddressFilter{},
        Date: &policy.DateFilter{
            DateFilterStrategies: []policy.FilterStrategy{
                {Strategy: policy.StrategyStaticReplace, StaticReplacement: "[DATE]"},
            },
        },
    },
}

svc, err := services.NewFilterService(pol)
if err != nil {
    panic(err)
}
result, err := svc.Filter(pol, "ctx", inputText)
if err != nil {
    panic(err)
}
// Use result here.