Supported Identifiers
phileas-dotnet ships with built-in PII identifier types: pattern-based detectors, dictionary-backed name and location detectors, configurable custom dictionaries, custom regex identifiers, section detectors, and an AI-powered PhEye filter. To enable a type, set the corresponding property on the Identifiers object inside a Policy.
Quick Reference
Pattern-based identifiers
| Property Name | JSON Key | Description |
|---|---|---|
| Age | age | Numeric age expressions (e.g. "42 years old") |
| BankRoutingNumber | bankRoutingNumber | US ABA bank routing numbers |
| BitcoinAddress | bitcoinAddress | Bitcoin wallet addresses |
| CreditCard | creditCard | Credit and debit card numbers |
| Currency | currency | Currency amounts (e.g. "$1,234.56") |
| Date | date | Calendar dates in common formats |
| DriversLicense | driversLicense | US driver's license numbers |
| EmailAddress | emailAddress | Email addresses |
| IbanCode | ibanCode | International Bank Account Numbers |
| IpAddress | ipAddress | IPv4 and IPv6 addresses |
| MacAddress | macAddress | Network MAC addresses |
| PassportNumber | passportNumber | Passport numbers |
| PhoneNumber | phoneNumber | US and international phone numbers |
| PhoneNumberExtension | phoneNumberExtension | Phone number extensions (e.g. "ext. 123") |
| Ssn | ssn | US Social Security Numbers |
| StateAbbreviation | stateAbbreviation | Two-letter US state codes |
| StreetAddress | streetAddress | US street addresses |
| TrackingNumber | trackingNumber | Shipping/parcel tracking numbers |
| Url | url | HTTP/HTTPS URLs |
| Vin | vin | Vehicle Identification Numbers |
| ZipCode | zipCode | US ZIP codes (5-digit and ZIP+4) |
Dictionary-backed name & location identifiers
| Property Name | JSON Key | Description |
|---|---|---|
| FirstName | firstName | Common first names |
| Surname | surname | Common surnames |
| City | city | City names |
| County | county | County names |
| State | state | US state names |
| Hospital | hospital | Hospital names |
Custom & AI identifiers
| Property Name | JSON Key | Description |
|---|---|---|
| Dictionaries | dictionary | Named lists of custom terms (legacy dictionary model, level-based fuzzy matching) |
| CustomDictionaries | dictionaries | Custom term lists with classification and sensitivity-based fuzzy matching |
| CustomIdentifiers | identifiers | Custom regex identifiers |
| Sections | sections | Spans of text delimited by a start and end pattern |
| PhEyes | pheye | AI-powered NER via a remote PhEye service |
Common Configuration
Every identifier type inherits from AbstractPolicyFilter:
public abstract class AbstractPolicyFilter
{
    public bool Enabled { get; set; } = true;
    public List<string>? Ignored { get; set; }
    public List<string>? IgnoredFiles { get; set; }
    public List<IgnoredPattern>? IgnoredPatterns { get; set; }
    public int WindowSize { get; set; } // 0 = use the policy/global default
    public int Priority { get; set; }
}
| Property | JSON key | Default | Description |
|---|---|---|---|
| Enabled | enabled | true | Whether the filter is active. |
| Ignored | ignored | null | Exact values that should not be redacted. |
| IgnoredFiles | ignoredFiles | null | Files whose lines provide additional ignored terms. |
| IgnoredPatterns | ignoredPatterns | null | Regex patterns whose matches are not redacted. |
| WindowSize | windowSize | 0 | Context words on each side of a match; 0 uses the default (5). |
| Priority | priority | 0 | Higher-priority filter spans win when spans overlap. |
In addition, each identifier exposes a Strategies list that lets you override the default REDACT behaviour. See Filter Strategies for all available strategies.
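As a rough sketch of how the common options might look in a JSON policy, the fragment below sets several of them on the ssn identifier. It uses only the JSON keys listed in the table above; the ignored value and the numbers are placeholders chosen for illustration, not recommended settings.

```json
{
  "identifiers": {
    "ssn": {
      "enabled": true,
      "ignored": ["123-45-6789"],
      "windowSize": 3,
      "priority": 10
    }
  }
}
```

ignoredPatterns is left out because the shape of an IgnoredPattern entry is not described in this section.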
Identifier Details
Age
Detects age expressions such as "42 years old" or "aged 35".
Identifiers = new Identifiers { Age = new Age() }
"identifiers": { "age": {} }
Bank Routing Number
Detects 9-digit ABA routing numbers.
Identifiers = new Identifiers { BankRoutingNumber = new BankRoutingNumber() }
Bitcoin Address
Detects legacy (P2PKH/P2SH) and SegWit Bitcoin wallet addresses.
Identifiers = new Identifiers { BitcoinAddress = new BitcoinAddress() }
Credit Card
Detects credit and debit card numbers including Visa, Mastercard, Amex, Discover, and others.
Identifiers = new Identifiers { CreditCard = new CreditCard() }
Currency
Detects currency amounts with a symbol or ISO code prefix (e.g. $1,234.56, €99.00).
Identifiers = new Identifiers { Currency = new Currency() }
Date
Detects dates in common written and numeric forms (e.g. January 1, 2024, 01/01/2024, 6.4.2020).
Identifiers = new Identifiers { Date = new Date() }
| Property | JSON key | Default | Description |
|---|---|---|---|
| OnlyValidDates | onlyValidDates | false | When true, numeric dates that are not real calendar dates (e.g. 02-31-2019) are not redacted. Month-name dates are always treated as valid. |

Identifiers = new Identifiers
{
    Date = new Date { OnlyValidDates = true }
}
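The same setting can presumably be expressed in a JSON policy. This fragment is a sketch that assumes OnlyValidDates serializes under the onlyValidDates key from the table above and nests under date in the same way the age example does.

```json
{
  "identifiers": {
    "date": {
      "onlyValidDates": true
    }
  }
}
```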
The date strategies list uses the JSON key dateFilterStrategies. The SHIFT_DATE strategy (see Filter Strategies) is specific to the Date filter.
Dictionary
Detects user-supplied terms in the input text. A policy can contain any number of dictionaries, each with its own name and list of terms. Matching is case-insensitive and whole-word.
Identifiers = new Identifiers
{
    Dictionaries = new List<Dictionary>
    {
        new Dictionary
        {
            Name = "medical-conditions",
            Terms = new List<string> { "diabetes", "hypertension", "asthma" }
        }
    }
}
Multiple dictionaries can be combined in a single policy:
Identifiers = new Identifiers
{
    Dictionaries = new List<Dictionary>
    {
        new Dictionary
        {
            Name = "conditions",
            Terms = new List<string> { "diabetes", "hypertension" }
        },
        new Dictionary
        {
            Name = "medications",
            Terms = new List<string> { "metformin", "lisinopril" }
        }
    }
}
"identifiers": {
  "dictionary": [
    {
      "name": "conditions",
      "terms": ["diabetes", "hypertension"]
    },
    {
      "name": "medications",
      "terms": ["metformin", "lisinopril"]
    }
  ]
}
Fuzzy Matching
The dictionary filter supports fuzzy matching to detect misspelled or near-match terms using Levenshtein distance. Enable fuzzy matching by setting fuzzy: true and optionally specifying a level:
new Dictionary
{
    Name = "medical-conditions",
    Terms = new List<string> { "diabetes", "hypertension" },
    Fuzzy = true,
    Level = "medium" // "low", "medium", or "high"
}
{
  "name": "medical-conditions",
  "terms": ["diabetes", "hypertension"],
  "fuzzy": true,
  "level": "medium"
}
Fuzzy matching levels: a lower level allows more edits (it is more permissive), and the assigned match confidence drops accordingly.
| Level | Max Edit Distance | Confidence |
|---|---|---|
| high | 0 (exact match) | 0.9 |
| medium | 1 | 0.7 |
| low (default) | 2 | 0.5 |
For example, with level: "medium", the term "diabetes" matches a misspelling such as "diabetis" (1 edit: one substitution) but not "diabtis" (2 edits: one deletion and one substitution). With level: "low", both match.
Configuration Options
Each Dictionary entry supports the common AbstractPolicyFilter options (ignored, ignoredPatterns, priority) and an optional dictionaryFilterStrategies list to override the default REDACT behaviour, plus:
| Property | Type | Default | Description |
|---|---|---|---|
| fuzzy | bool | false | Enable fuzzy matching for near-match detection |
| level | string | "low" | Fuzzy matching sensitivity: "low", "medium", or "high" |
There are two dictionary models. The Dictionaries list above (JSON key dictionary) uses the level-based fuzzy matching shown here. A second CustomDictionaries list (JSON key dictionaries) carries a classification, optional term files, and uses the sensitivity scale ("off", "low", "medium", "high", "auto") instead of level; its strategies list key is customFilterStrategies.
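For comparison, a CustomDictionaries entry might look roughly like the fragment below. The dictionaries, classification, and sensitivity keys come from the description above. The terms key is an assumption carried over from the legacy model, and the classification value is a made-up label, so check both against the library's policy schema before relying on them.

```json
{
  "identifiers": {
    "dictionaries": [
      {
        "classification": "medical-condition",
        "terms": ["diabetes", "hypertension"],
        "sensitivity": "medium"
      }
    ]
  }
}
```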
Driver's License
Detects US state driver's license number formats.
Identifiers = new Identifiers { DriversLicense = new DriversLicense() }
Email Address
Detects RFC-compliant email addresses.
Identifiers = new Identifiers { EmailAddress = new EmailAddress() }
// Whitelist a specific address
Identifiers = new Identifiers
{
    EmailAddress = new EmailAddress
    {
        Ignored = new List<string> { "no-reply@example.com" }
    }
}
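A JSON policy can presumably express the same exclusion. This sketch assumes the Ignored list serializes under the ignored key described under Common Configuration.

```json
{
  "identifiers": {
    "emailAddress": {
      "ignored": ["no-reply@example.com"]
    }
  }
}
```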
IBAN Code
Detects International Bank Account Numbers in standard format (e.g. GB29 NWBK 6016 1331 9268 19).
Identifiers = new Identifiers { IbanCode = new IbanCode() }
IP Address
Detects IPv4 addresses (e.g. 192.168.1.1) and IPv6 addresses.
Identifiers = new Identifiers { IpAddress = new IpAddress() }
MAC Address
Detects network hardware MAC addresses in XX:XX:XX:XX:XX:XX or XX-XX-XX-XX-XX-XX format.
Identifiers = new Identifiers { MacAddress = new MacAddress() }
Passport Number
Detects US passport numbers.
Identifiers = new Identifiers { PassportNumber = new PassportNumber() }
PhEye
Detects named entities using AI-powered NLP by connecting to a remote PhEye NER service.
Identifiers = new Identifiers
{
    PhEyes = new List<PhEye>
    {
        new PhEye
        {
            PhEyeConfiguration = new PhEyeConfiguration
            {
                Endpoint = "http://localhost:8080",
                BearerToken = "your-api-token", // Optional
                Timeout = 30,
                Labels = new List<string> { "PERSON", "ORG", "LOC" }
            }
        }
    }
}
JSON configuration:
"identifiers": {
  "pheye": [
    {
      "phEyeConfiguration": {
        "endpoint": "http://localhost:8080",
        "bearerToken": "your-api-token",
        "timeout": 30,
        "labels": ["PERSON", "ORG", "LOC"]
      }
    }
  ]
}
Configuration Options:
| Property | Type | Default | Description |
|---|---|---|---|
| endpoint | string | "http://localhost:8080" | Base URL of the PhEye service |
| bearerToken | string? | null | Bearer token for authentication |
| timeout | int | 30 | Request timeout in seconds |
| labels | string[] | ["Person"] | Entity labels to detect |
Detected Entity Types:
- PERSON / PER → Mapped to FilterType.Person
- LOCATION / LOC → Mapped to FilterType.LocationCity
- ORGANIZATION / ORG → Mapped to FilterType.Other
- MISC → Mapped to FilterType.Other
For detailed documentation, see PhEye Filter Usage.
Phone Number
Detects US and international phone numbers in a variety of formats.
Identifiers = new Identifiers { PhoneNumber = new PhoneNumber() }
Phone Number Extension
Detects phone number extensions (e.g. ext. 1234, x1234).
Identifiers = new Identifiers { PhoneNumberExtension = new PhoneNumberExtension() }
SSN
Detects US Social Security Numbers in NNN-NN-NNNN format. The regex excludes invalid values: area numbers 000, 666, and 900–999; group number 00; and serial number 0000.
Identifiers = new Identifiers { Ssn = new Ssn() }
The JSON key for the filter strategies list is ssnFilterStrategies:
"ssn": {
  "ssnFilterStrategies": [
    { "strategy": "MASK" }
  ]
}
State Abbreviation
Detects two-letter US state abbreviations (e.g. CA, NY, TX).
Identifiers = new Identifiers { StateAbbreviation = new StateAbbreviation() }
Street Address
Detects US street addresses (e.g. 123 Main St, 456 Oak Ave Apt 7).
Identifiers = new Identifiers { StreetAddress = new StreetAddress() }
Tracking Number
Detects parcel tracking numbers from major carriers (UPS, FedEx, USPS, DHL).
Identifiers = new Identifiers { TrackingNumber = new TrackingNumber() }
URL
Detects HTTP and HTTPS URLs.
Identifiers = new Identifiers { Url = new Url() }
VIN
Detects 17-character Vehicle Identification Numbers.
Identifiers = new Identifiers { Vin = new Vin() }
ZIP Code
Detects 5-digit US ZIP codes and ZIP+4 codes (e.g. 12345, 12345-6789).
Identifiers = new Identifiers { ZipCode = new ZipCode() }
| Property | JSON key | Default | Description |
|---|---|---|---|
| RequireDelimiter | requireDelimiter | false | When true, the +4 extension must be dash-separated (12345-6789); an undelimited 9-digit run is not treated as a ZIP+4. |
| Validate | validate | false | When true, ZIP codes not present in the bundled census data are not redacted. |

Identifiers = new Identifiers
{
    ZipCode = new ZipCode { Validate = true, RequireDelimiter = true }
}
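In a JSON policy the two ZIP code options would likely look like the sketch below, assuming they serialize under the validate and requireDelimiter keys from the table above.

```json
{
  "identifiers": {
    "zipCode": {
      "validate": true,
      "requireDelimiter": true
    }
  }
}
```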
The ZIP code strategies list uses the (singular) JSON key zipCodeFilterStrategy. The population filter condition also uses the bundled census data.
Names and Locations (dictionary-backed)
FirstName, Surname, City, County, State, and Hospital detect entries from bundled reference dictionaries. Each supports fuzzy matching tuned by a sensitivity level and a capitalized flag.
Identifiers = new Identifiers
{
    FirstName = new FirstName(),
    Surname = new Surname(),
    City = new City { Fuzzy = true, Sensitivity = "medium" },
    State = new State(),
    Hospital = new Hospital()
}

| Property | JSON key | Default | Description |
|---|---|---|---|
| Fuzzy | fuzzy | false | Enable fuzzy (near-match) detection. |
| Sensitivity | sensitivity | "medium" | Fuzzy sensitivity: "off", "low", "medium", "high", or "auto". |
| Capitalized | capitalized | false | Only match terms that are capitalized in the input. |
The strategies list keys are firstNameFilterStrategies, surnameFilterStrategies, cityFilterStrategies, countyFilterStrategies, stateFilterStrategies, and hospitalFilterStrategies respectively.
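A JSON counterpart to the C# example above might look like this sketch. It assumes each identifier nests under its JSON key from the Quick Reference tables and that Fuzzy and Sensitivity serialize under the fuzzy and sensitivity keys listed above. County is omitted only because the C# example omits it.

```json
{
  "identifiers": {
    "firstName": {},
    "surname": {},
    "city": { "fuzzy": true, "sensitivity": "medium" },
    "state": {},
    "hospital": {}
  }
}
```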
Custom Regex Identifiers
CustomIdentifiers lets you define your own regex-based identifiers. Each match is classified with the configured classification.
Identifiers = new Identifiers
{
    CustomIdentifiers = new List<Identifier>
    {
        new Identifier
        {
            Classification = "employee-id",
            Pattern = @"\bEMP-\d{6}\b",
            CaseSensitive = true,
            GroupNumber = 0
        }
    }
}

| Property | JSON key | Default | Description |
|---|---|---|---|
| Pattern | pattern | \b[A-Z0-9_-]{6,}\b | The regular expression to match. |
| Classification | classification | "custom-identifier" | Label applied to matches. |
| GroupNumber | groupNumber | 0 | Capture group used as the matched text (0 = whole match). |
| CaseSensitive | caseSensitive | true | Whether the regex is case-sensitive. |
The strategies list key is identifierFilterStrategies. The JSON key for the list itself on Identifiers is identifiers.
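The C# example above could be written in JSON roughly as follows, using the keys from the table. This is a sketch rather than a verified policy file. Note that JSON strings require each regex backslash to be doubled, so the C# verbatim string @"\bEMP-\d{6}\b" becomes "\\bEMP-\\d{6}\\b".

```json
{
  "identifiers": {
    "identifiers": [
      {
        "classification": "employee-id",
        "pattern": "\\bEMP-\\d{6}\\b",
        "caseSensitive": true,
        "groupNumber": 0
      }
    ]
  }
}
```

The outer identifiers key is the Identifiers object; the inner one is the CustomIdentifiers list.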
Sections
Sections redact everything between a start pattern and an end pattern (inclusive of the markers).
Identifiers = new Identifiers
{
    Sections = new List<Section>
    {
        new Section { StartPattern = "BEGIN PRIVATE", EndPattern = "END PRIVATE" }
    }
}

| Property | JSON key | Default | Description |
|---|---|---|---|
| StartPattern | startPattern | null | Regex marking the start of the section. |
| EndPattern | endPattern | null | Regex marking the end of the section. |
The strategies list key is sectionFilterStrategies. The JSON key for the list itself on Identifiers is sections.
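As a sketch, the same section could be declared in JSON with the keys from the table above. Both patterns are regular expressions, so any regex metacharacters in real markers would need escaping.

```json
{
  "identifiers": {
    "sections": [
      {
        "startPattern": "BEGIN PRIVATE",
        "endPattern": "END PRIVATE"
      }
    ]
  }
}
```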
Enabling Multiple Identifiers
Any combination of identifiers can be enabled in a single policy:
var policy = new Policy
{
    Name = "comprehensive",
    Identifiers = new Identifiers
    {
        Ssn = new Ssn(),
        CreditCard = new CreditCard(),
        EmailAddress = new EmailAddress(),
        PhoneNumber = new PhoneNumber(),
        IpAddress = new IpAddress(),
        Url = new Url(),
        Date = new Date()
    }
};
When spans from different identifier types overlap, the longest span wins; ties are broken by higher confidence, then higher priority, then earlier position. Set a Priority on an identifier to influence the outcome.
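To illustrate the last point, the hedged fragment below raises the priority of the ssn identifier relative to two others that keep the default of 0. Under the rule above, this should matter only when overlapping spans have the same length and confidence; a longer or more confident span would still win. The value 10 is arbitrary.

```json
{
  "identifiers": {
    "ssn": { "priority": 10 },
    "phoneNumber": {},
    "date": {}
  }
}
```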