Filter Strategies
A filter strategy controls what happens to a piece of sensitive information once it has been identified. Each identifier in a policy accepts one or more strategies; the first strategy in the list is applied.
Strategy constants
| Constant | JSON value | Description |
|---|---|---|
| `policy.StrategyRedact` | `"REDACT"` | Replace the text with a redaction placeholder. |
| `policy.StrategyStaticReplace` | `"STATIC_REPLACE"` | Replace the text with a fixed static value. |
| `policy.StrategyMask` | `"MASK"` | Replace each character with a mask character (default `*`). |
| `policy.StrategyLast4` | `"LAST_4"` | Mask all characters except the last four. |
| `policy.StrategyHashSHA256Replace` | `"HASH_SHA256_REPLACE"` | Replace the text with its SHA-256 hash. |
| `policy.StrategyCryptoReplace` | `"CRYPTO_REPLACE"` | Encrypt the text using AES. Requires a `crypto` block in the policy. |
| `policy.StrategyRandomReplace` | `"RANDOM_REPLACE"` | Replace the text with a fake but realistic value, derived deterministically per context. |
| `policy.StrategyShiftDate` | `"SHIFT_DATE"` | Shift an identified date by a configurable number of days, months, and/or years. Only applies to the `date` identifier; other identifier types fall back to `REDACT`. |
REDACT
Replace the sensitive text with a configurable placeholder. Two template variables are supported in redactionFormat:
| Variable | Replaced with |
|---|---|
| `%t` | The filter type name (e.g. `ssn`, `email-address`) |
| `%v` | The original sensitive value |
The default format is `{{{REDACTED-%t}}}`, so only the filter type appears in the placeholder.
Example output: My SSN is {{{REDACTED-ssn}}}.
To include the original value in the placeholder, add `%v` to `redactionFormat`, for example `{{{REDACTED-%t(%v)}}}`.
Example output: My SSN is {{{REDACTED-ssn(123-45-6789)}}}.
To use a fixed label instead of the type name, omit `%t` from `redactionFormat`; the placeholder is then the same for every filter type.
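A minimal sketch of how the two template variables expand, using only the Go standard library. It is not go-phileas's implementation: the helper `formatRedaction` is hypothetical, and it assumes `%t` and `%v` are replaced literally wherever they occur in `redactionFormat`.

```go
package main

import (
	"fmt"
	"strings"
)

// formatRedaction is a hypothetical helper that expands a redactionFormat
// template: %t becomes the filter type name and %v the original value.
func formatRedaction(format, filterType, value string) string {
	// NewReplacer substitutes in a single pass, so a "%t" that happens to
	// appear inside the original value is not expanded a second time.
	r := strings.NewReplacer("%t", filterType, "%v", value)
	return r.Replace(format)
}

func main() {
	// Default format: only the filter type appears in the placeholder.
	fmt.Println(formatRedaction("{{{REDACTED-%t}}}", "ssn", "123-45-6789"))
	// Format that also embeds the original value.
	fmt.Println(formatRedaction("{{{REDACTED-%t(%v)}}}", "ssn", "123-45-6789"))
}
```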
STATIC_REPLACE
Replace the sensitive text with a fixed string.
```go
policy.FilterStrategy{
	Strategy:          policy.StrategyStaticReplace,
	StaticReplacement: "[EMAIL REMOVED]",
}
```
Example output: Contact [EMAIL REMOVED] for help.
MASK
Replace every character of the sensitive text with a mask character. The default mask character is *.
Example output: SSN 123-45-6789 becomes `***********` (length preserved: all 11 characters, including the dashes, are masked).
To use a different character, set `maskCharacter`.
RANDOM_REPLACE
Replace the sensitive text with a fake but realistic value for the same PII type. The replacement is deterministically derived from the combination of the context name and the original value, so the same sensitive value appearing multiple times within the same context always gets the same replacement (referential integrity).
Example output: SSN 123-45-6789 might become 541-07-3812 (a different, realistic SSN). The replacement for 123-45-6789 in context "patient-records" will always be the same fake SSN for that context.
Supported PII types and their replacement format
| Filter type | Replacement format |
|---|---|
| `ssn` | Valid SSN format (`XXX-XX-XXXX`) with a valid area number |
| `email-address` | `<word><n>@example.com` (or `test.org` / `sample.net`) |
| `phone-number` | `(555) XXX-XXXX` |
| `credit-card` | Luhn-valid 16-digit Visa-style number |
| `ip-address` | `192.168.X.X` (private range) |
| `date` | `MM/DD/YYYY` (1950–1999) |
| `zip-code` | 5-digit number |
| `mac-address` | Locally administered MAC (`02:XX:XX:XX:XX:XX`) |
| `vin` | 17-character alphanumeric (ISO 3779 character set) |
| `bank-routing-number` | 9-digit number |
| `bitcoin-address` | P2PKH-style address starting with `1` |
| `age` | `N years old` (1–80) |
| `url` | `https://example.com/<path><n>` |
| `iban-code` | GB IBAN format |
| `passport-number` | US-style: letter + 8 digits |
| `drivers-license-number` | Letter + 7 digits |
| `tracking-number` | UPS-style `1Z` prefix |
| All others | Redaction placeholder `{{{REDACTED-%t}}}` |
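The referential-integrity property can be illustrated by seeding a pseudo-random generator from a hash of the context name and the original value. This is a sketch under assumptions: `fakeSSN` is hypothetical, and seeding `math/rand` from SHA-256 is one possible derivation, not necessarily the one go-phileas uses. It produces the `ssn` format from the table, keeping the area number within 001–899 and never 666.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// fakeSSN is a hypothetical sketch of a deterministic replacement: the same
// (context, value) pair always yields the same fake SSN.
func fakeSSN(context, value string) string {
	// Derive a 64-bit seed from the context name and the original value.
	// The zero byte separates them so ("ab", "c") and ("a", "bc") differ.
	sum := sha256.Sum256([]byte(context + "\x00" + value))
	seed := int64(binary.BigEndian.Uint64(sum[:8]))
	rng := rand.New(rand.NewSource(seed))

	// Area number 001-899, skipping 666, which is never issued.
	area := rng.Intn(899) + 1
	for area == 666 {
		area = rng.Intn(899) + 1
	}
	group := rng.Intn(99) + 1    // 01-99
	serial := rng.Intn(9999) + 1 // 0001-9999
	return fmt.Sprintf("%03d-%02d-%04d", area, group, serial)
}

func main() {
	a := fakeSSN("patient-records", "123-45-6789")
	b := fakeSSN("patient-records", "123-45-6789")
	c := fakeSSN("billing", "123-45-6789")
	fmt.Println(a, b, a == b) // same context: identical replacements
	fmt.Println(c)            // different context: derived independently
}
```

Because the seed depends on both inputs, every occurrence of a given SSN within one context maps to the same fake value, while another context derives its replacement independently.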
LAST_4
Mask all characters except the last four.
Example output: Credit card 4111111111111111 becomes `************1111`.
HASH_SHA256_REPLACE
Replace the sensitive text with its lowercase hexadecimal SHA-256 hash. Useful for pseudonymisation where you need to rejoin records later.
Example output: 123-45-6789 becomes its 64-character SHA-256 hex digest.
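The replacement value can be reproduced with the standard library, assuming the digest is computed over the UTF-8 bytes of the matched text with no salt or key. `sha256Replace` is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Replace returns the lowercase hexadecimal SHA-256 digest of value.
func sha256Replace(value string) string {
	sum := sha256.Sum256([]byte(value))
	return hex.EncodeToString(sum[:]) // EncodeToString emits lowercase hex
}

func main() {
	h := sha256Replace("123-45-6789")
	fmt.Println(h, len(h)) // 64 hex characters
	// The same input always yields the same digest, which is what lets
	// pseudonymised records be rejoined later.
	fmt.Println(h == sha256Replace("123-45-6789"))
}
```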
CRYPTO_REPLACE
Encrypt the sensitive text using AES. The policy must include a crypto block with a key and IV.
```go
pol := &policy.Policy{
	Name: "encrypted",
	Identifiers: policy.Identifiers{
		SSN: &policy.SSNFilter{
			SSNFilterStrategies: []policy.FilterStrategy{
				{Strategy: policy.StrategyCryptoReplace},
			},
		},
	},
	Crypto: &policy.Crypto{
		Key: "0123456789abcdef0123456789abcdef", // 32-byte key for AES-256
		IV:  "abcdef9876543210",                 // 16-byte IV
	},
}
```
JSON:
```json
{
  "identifiers": {
    "ssn": {
      "ssnFilterStrategies": [{"strategy": "CRYPTO_REPLACE"}]
    }
  },
  "crypto": {
    "key": "0123456789abcdef0123456789abcdef",
    "iv": "abcdef9876543210"
  }
}
```
SHIFT_DATE
Shift an identified date forward or backward by a configurable number of days, months, and/or years. This strategy only applies to the date identifier; if used with any other identifier type it falls back to REDACT.
All three shift fields are optional and default to 0. Negative values shift the date backwards.
| Field | JSON key | Description |
|---|---|---|
| `ShiftDays` | `shiftDays` | Number of days to add (negative to subtract). |
| `ShiftMonths` | `shiftMonths` | Number of months to add (negative to subtract). |
| `ShiftYears` | `shiftYears` | Number of years to add (negative to subtract). |
The output date is formatted in the same style as the matched input date:
| Input format | Example input | Example output (shift +1 year) |
|---|---|---|
| ISO 8601 (`YYYY-MM-DD`) | 2020-03-15 | 2021-03-15 |
| US numeric slash (`MM/DD/YYYY`) | 03/15/2020 | 03/15/2021 |
| US numeric dash (`MM-DD-YYYY`) | 03-15-2020 | 03-15-2021 |
| US numeric dot (`MM.DD.YYYY`) | 03.15.2020 | 03.15.2021 |
| Long month name | March 15, 2020 | March 15, 2021 |
| Abbreviated month name | Mar 15, 2020 | Mar 15, 2021 |
| Day-first long month | 15 March 2020 | 15 March 2021 |
| Day-first abbreviated month | 15 Mar 2020 | 15 Mar 2021 |
Note on numeric formats with dashes: dates separated by `-` are parsed as `MM-DD-YYYY` (US-style, month first). ISO 8601 dates (`YYYY-MM-DD`) are detected first and handled separately. `DD-MM-YYYY` ordering is not supported for the dash separator.
Month names are matched case-insensitively; the output always uses title-case month names.
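The format-preserving behaviour described above can be sketched with Go's `time` package: try each layout in turn (ISO 8601 first), shift with `time.Time.AddDate`, then format with the layout that matched. `shiftDate` and the layout list are hypothetical, not go-phileas internals. `time.Parse` matches month names case-insensitively and `Format` writes them in title case, which lines up with the rules above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// layouts lists Go reference layouts for the input formats in the table above.
// ISO 8601 comes first so that YYYY-MM-DD is never read as MM-DD-YYYY.
var layouts = []string{
	"2006-01-02",      // ISO 8601
	"01/02/2006",      // US numeric slash
	"01-02-2006",      // US numeric dash (month first)
	"01.02.2006",      // US numeric dot
	"January 2, 2006", // long month name
	"Jan 2, 2006",     // abbreviated month name
	"2 January 2006",  // day-first long month
	"2 Jan 2006",      // day-first abbreviated month
}

// shiftDate is a hypothetical sketch: parse with the first matching layout,
// shift with AddDate, and format the result with that same layout.
func shiftDate(s string, years, months, days int) (string, error) {
	for _, layout := range layouts {
		t, err := time.Parse(layout, s)
		if err != nil {
			continue
		}
		return t.AddDate(years, months, days).Format(layout), nil
	}
	return "", errors.New("unrecognised date: " + s)
}

func main() {
	for _, in := range []string{"1990-05-15", "03-15-2020", "march 15, 2020", "15 Mar 2020"} {
		out, err := shiftDate(in, 1, 0, 0)
		fmt.Println(in, "->", out, err)
	}
}
```

`AddDate` normalises out-of-range results, so 2020-01-31 plus one month becomes 2020-03-02; whether go-phileas clamps to the end of the month instead is not stated on this page.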
```go
pol := &policy.Policy{
	Name: "date-shift",
	Identifiers: policy.Identifiers{
		Date: &policy.DateFilter{
			DateFilterStrategies: []policy.FilterStrategy{
				{
					Strategy:    policy.StrategyShiftDate,
					ShiftDays:   0,
					ShiftMonths: 0,
					ShiftYears:  1,
				},
			},
		},
	},
}
```
Example output: Patient born on 1990-05-15 becomes Patient born on 1991-05-15.
JSON:
```json
{
  "identifiers": {
    "date": {
      "dateFilterStrategies": [
        {"strategy": "SHIFT_DATE", "shiftDays": 0, "shiftMonths": 0, "shiftYears": 1}
      ]
    }
  }
}
```
To shift backwards, use negative values; for example, `"shiftYears": -1` turns 1990-05-15 into 1989-05-15.
Specifying strategies in a policy
Each identifier type has its own strategies slice. Only the first strategy is applied; the slice exists to allow future fallback support.
```go
pol := &policy.Policy{
	Name: "multi-strategy",
	Identifiers: policy.Identifiers{
		SSN: &policy.SSNFilter{
			SSNFilterStrategies: []policy.FilterStrategy{
				{Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
			},
		},
		EmailAddress: &policy.EmailAddressFilter{
			EmailAddressFilterStrategies: []policy.FilterStrategy{
				{Strategy: policy.StrategyStaticReplace, StaticReplacement: "[EMAIL]"},
			},
		},
		CreditCard: &policy.CreditCardFilter{
			CreditCardFilterStrategies: []policy.FilterStrategy{
				{Strategy: policy.StrategyLast4},
			},
		},
	},
}
```
Default behaviour
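When no strategy is specified for an identifier, go-phileas uses `REDACT` with the default format `{{{REDACTED-%t}}}`. Leaving an identifier's strategies slice empty is therefore equivalent to configuring `{Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"}` explicitly.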
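The selection rules stated on this page (only the first strategy is used, an empty list means `REDACT` with `{{{REDACTED-%t}}}`, and `SHIFT_DATE` on a non-date identifier falls back to `REDACT`) can be summarised in a small sketch. `FilterStrategy` here is a simplified local stand-in, not go-phileas's `policy.FilterStrategy`; `effectiveStrategy` is hypothetical, and using the default format for the `SHIFT_DATE` fallback and for a `REDACT` entry with no format are assumptions.

```go
package main

import "fmt"

// FilterStrategy is a simplified local stand-in for policy.FilterStrategy,
// keeping only the fields this sketch needs.
type FilterStrategy struct {
	Strategy        string
	RedactionFormat string
}

// defaultStrategy is REDACT with the default placeholder format.
var defaultStrategy = FilterStrategy{Strategy: "REDACT", RedactionFormat: "{{{REDACTED-%t}}}"}

// effectiveStrategy is a hypothetical helper that picks the strategy to apply
// for an identifier of the given filter type.
func effectiveStrategy(filterType string, strategies []FilterStrategy) FilterStrategy {
	// No strategies configured: use the default REDACT.
	if len(strategies) == 0 {
		return defaultStrategy
	}
	// Only the first strategy in the slice is used.
	s := strategies[0]
	// SHIFT_DATE only applies to dates; elsewhere fall back to REDACT
	// (the default format here is an assumption).
	if s.Strategy == "SHIFT_DATE" && filterType != "date" {
		return defaultStrategy
	}
	// Assumption: REDACT without an explicit format uses the default one.
	if s.Strategy == "REDACT" && s.RedactionFormat == "" {
		s.RedactionFormat = defaultStrategy.RedactionFormat
	}
	return s
}

func main() {
	fmt.Println(effectiveStrategy("ssn", nil))
	fmt.Println(effectiveStrategy("credit-card", []FilterStrategy{{Strategy: "LAST_4"}, {Strategy: "MASK"}}))
	fmt.Println(effectiveStrategy("ssn", []FilterStrategy{{Strategy: "SHIFT_DATE"}}))
	fmt.Println(effectiveStrategy("date", []FilterStrategy{{Strategy: "SHIFT_DATE"}}))
}
```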