Policies

A policy is the central configuration object in go-phileas. It declares:

Which types of sensitive information to look for (the identifiers)
How to handle each type when found (the filter strategies)
Any terms or patterns that should be ignored globally or per filter

Policies can be defined in Go code as structs or loaded from JSON or YAML.

Defining a policy in Go

import (
    "github.com/philterd/go-phileas/pkg/policy"
    "github.com/philterd/go-phileas/pkg/services"
)

pol := &policy.Policy{
    Identifiers: policy.Identifiers{
        SSN: &policy.SSNFilter{
            SSNFilterStrategies: []policy.FilterStrategy{
                {Strategy: policy.StrategyRedact, RedactionFormat: "{{{REDACTED-%t}}}"},
            },
        },
        EmailAddress: &policy.EmailAddressFilter{},
        PhoneNumber:  &policy.PhoneNumberFilter{},
    },
}

svc, err := services.NewFilterService(pol)
if err != nil {
    panic(err)
}
result, err := svc.Filter(pol, "context-name", "Call 555-867-5309 or email pat@example.com.")
if err != nil {
    panic(err)
}

Only identifiers that are explicitly set (non-nil) are active. An identifier set to an empty struct uses the default settings: the redact strategy with the default format {{{REDACTED-%t}}}.
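
To make the default format concrete, here is a minimal sketch of how a template such as {{{REDACTED-%t}}} could be expanded. It assumes that %t is a placeholder for the type of the filter that matched. The helper expandFormat and the label "ssn" are hypothetical names used for illustration; they are not part of go-phileas.

```go
package main

import (
	"fmt"
	"strings"
)

// expandFormat is a hypothetical helper, not a go-phileas function.
// Assumption: %t in a redaction format stands for the filter type.
func expandFormat(format, filterType string) string {
	return strings.ReplaceAll(format, "%t", filterType)
}

func main() {
	// "ssn" is an illustrative type label; the library's actual label may differ.
	fmt.Println(expandFormat("{{{REDACTED-%t}}}", "ssn")) // prints {{{REDACTED-ssn}}}
}
```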

Defining a policy in JSON

{
  "identifiers": {
    "ssn": {
      "ssnFilterStrategies": [
        {"strategy": "REDACT", "redactionFormat": "{{{REDACTED-%t}}}"}
      ]
    },
    "emailAddress": {},
    "phoneNumber": {}
  }
}

Load and use the JSON policy:

import "github.com/philterd/go-phileas/pkg/services"

result, err := services.FilterJSON(policyJSON, "context-name", inputText)

You can also unmarshal the JSON into a policy.Policy struct yourself, which allows you to inspect or modify the policy before using it:

import (
    "encoding/json"
    "github.com/philterd/go-phileas/pkg/policy"
    "github.com/philterd/go-phileas/pkg/services"
)

var pol policy.Policy
if err := json.Unmarshal([]byte(policyJSON), &pol); err != nil {
    // handle error
}

svc, err := services.NewFilterService(&pol)
if err != nil {
    // handle error
}
result, err := svc.Filter(&pol, "context-name", inputText)
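
As one way to inspect a policy document before handing it to the library, the sketch below lists the identifier keys present in a JSON policy using only the standard library. The function identifierNames is a hypothetical helper, and the only assumption about the document is the top-level "identifiers" object shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// identifierNames is a hypothetical helper. It returns the sorted keys
// found under "identifiers" in a policy JSON document.
func identifierNames(policyJSON string) ([]string, error) {
	var doc struct {
		Identifiers map[string]json.RawMessage `json:"identifiers"`
	}
	if err := json.Unmarshal([]byte(policyJSON), &doc); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(doc.Identifiers))
	for name := range doc.Identifiers {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	return names, nil
}

func main() {
	policyJSON := `{"identifiers": {"ssn": {}, "emailAddress": {}, "phoneNumber": {}}}`
	names, err := identifierNames(policyJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [emailAddress phoneNumber ssn]
}
```

A check like this can run at startup to log which identifiers a deployed policy actually declares.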

Defining a policy in YAML

identifiers:
  ssn:
    ssnFilterStrategies:
      - strategy: REDACT
        redactionFormat: "{{{REDACTED-%t}}}"
  emailAddress: {}
  phoneNumber: {}

Load and use the YAML policy:

import "github.com/philterd/go-phileas/pkg/services"

result, err := services.FilterYAML(policyYAML, "context-name", inputText)

Policy fields reference

| Field | Type | Description |
| --- | --- | --- |
| `identifiers` | `Identifiers` | Declares which sensitive information types to detect. |
| `ignored` | `[]Ignored` | Global list of terms to ignore across all filters. |
| `ignoredPatterns` | `[]IgnoredPattern` | Global list of regex patterns to ignore across all filters. |
| `crypto` | `*Crypto` | Encryption key and IV for the `CRYPTO_REPLACE` strategy. |

Enabling and disabling individual filters

Every identifier filter embeds BaseFilter, which has an Enabled field. By default, a filter is enabled whenever it is present in the policy. You can explicitly disable it:

// Enabled is a *bool, so take the address of a variable holding false.
enabled := false

pol := &policy.Policy{
    Identifiers: policy.Identifiers{
        SSN: &policy.SSNFilter{
            BaseFilter: policy.BaseFilter{Enabled: &enabled}, // explicitly disabled
        },
        EmailAddress: &policy.EmailAddressFilter{}, // enabled by default
    },
}

In JSON:

{
  "identifiers": {
    "ssn": {"enabled": false},
    "emailAddress": {}
  }
}

In YAML:

identifiers:
  ssn:
    enabled: false
  emailAddress: {}
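
The reason Enabled is a pointer can be shown with a small sketch: a nil pointer means "not specified", which lets a filter that is present default to enabled. The type baseFilter and the function isEnabled below are hypothetical stand-ins written for illustration, not the library's own code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// baseFilter is a hypothetical stand-in for the library's BaseFilter.
type baseFilter struct {
	Enabled *bool `json:"enabled,omitempty"`
}

// isEnabled applies the documented rule: a filter absent from the policy
// is inactive, a present filter is on unless Enabled is explicitly false.
func isEnabled(f *baseFilter) bool {
	if f == nil {
		return false
	}
	return f.Enabled == nil || *f.Enabled
}

func main() {
	var present, disabled baseFilter
	if err := json.Unmarshal([]byte(`{}`), &present); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(`{"enabled": false}`), &disabled); err != nil {
		panic(err)
	}
	fmt.Println(isEnabled(nil), isEnabled(&present), isEnabled(&disabled)) // prints false true false
}
```

A plain bool could not distinguish "enabled omitted" from "enabled: false", because both would decode to the zero value.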

Ignoring specific terms

You can tell a filter to skip certain known-safe values. Ignored terms are compared case-insensitively.

Per-filter ignore list

pol := &policy.Policy{
    Identifiers: policy.Identifiers{
        EmailAddress: &policy.EmailAddressFilter{
            BaseFilter: policy.BaseFilter{
                Ignored: []string{"noreply@example.com", "admin@example.com"},
            },
        },
    },
}

In JSON:

{
  "identifiers": {
    "emailAddress": {
      "ignored": ["noreply@example.com", "admin@example.com"]
    }
  }
}

In YAML:

identifiers:
  emailAddress:
    ignored:
      - noreply@example.com
      - admin@example.com
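
The case-insensitive comparison described above can be pictured with the following sketch. The function isIgnored is a hypothetical helper built on strings.EqualFold; it illustrates the behavior and is not the library's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// isIgnored is a hypothetical helper. It reports whether value equals
// any term in the ignore list, ignoring letter case.
func isIgnored(value string, ignored []string) bool {
	for _, term := range ignored {
		if strings.EqualFold(value, term) {
			return true
		}
	}
	return false
}

func main() {
	ignored := []string{"noreply@example.com", "admin@example.com"}
	fmt.Println(isIgnored("NoReply@Example.com", ignored)) // prints true
	fmt.Println(isIgnored("pat@example.com", ignored))     // prints false
}
```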

Per-filter ignore patterns

Use ignoredPatterns to ignore text matching a regular expression:

pol := &policy.Policy{
    Identifiers: policy.Identifiers{
        EmailAddress: &policy.EmailAddressFilter{
            BaseFilter: policy.BaseFilter{
                IgnoredPatterns: []policy.IgnoredPattern{
                    {Name: "internal-emails", Pattern: `.*@mycompany\.com`},
                },
            },
        },
    },
}

In JSON:

{
  "identifiers": {
    "emailAddress": {
      "ignoredPatterns": [
        {"name": "internal-emails", "pattern": ".*@mycompany\\.com"}
      ]
    }
  }
}

In YAML:

identifiers:
  emailAddress:
    ignoredPatterns:
      - name: internal-emails
        pattern: ".*@mycompany\\.com"
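
Before adding a pattern to a policy, it can help to try it against sample values with Go's regexp package, which uses RE2 syntax. The sketch below assumes a pattern must match the whole detected value, so it wraps the expression in anchors; whether go-phileas anchors patterns this way is an assumption, and compileFullMatch is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileFullMatch is a hypothetical helper. It compiles pattern so that
// it must match an entire value (an assumption about matching semantics).
func compileFullMatch(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile(`^(?:` + pattern + `)$`)
}

func main() {
	re, err := compileFullMatch(`.*@mycompany\.com`)
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("pat@mycompany.com"))          // prints true
	fmt.Println(re.MatchString("pat@mycompany.com.evil.org")) // prints false
	fmt.Println(re.MatchString("pat@mycompanyXcom"))          // prints false
}
```

The last case shows why the dot is escaped: an unescaped dot would accept any character in that position.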

Reusing a FilterService

NewFilterService compiles the filters once. Create a single instance and reuse it across calls for best performance:

svc, err := services.NewFilterService(pol)
if err != nil {
    panic(err)
}

for _, doc := range documents {
    result, err := svc.Filter(pol, doc.Context, doc.Text)
    // ...
}
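
The benefit of compiling once can be illustrated with a self-contained sketch. The scanner type below is a hypothetical stand-in for a service that prepares its patterns in the constructor and reuses them for every document; the two regular expressions are simplified examples, not the detection rules used by go-phileas.

```go
package main

import (
	"fmt"
	"regexp"
)

// scanner is a hypothetical stand-in for a service that compiles its
// patterns once at construction time.
type scanner struct {
	patterns []*regexp.Regexp
}

// newScanner compiles every expression up front and fails fast on a bad one.
func newScanner(exprs ...string) (*scanner, error) {
	s := &scanner{}
	for _, e := range exprs {
		re, err := regexp.Compile(e)
		if err != nil {
			return nil, fmt.Errorf("compile %q: %w", e, err)
		}
		s.patterns = append(s.patterns, re)
	}
	return s, nil
}

// count returns the total number of matches of all patterns in text.
func (s *scanner) count(text string) int {
	n := 0
	for _, re := range s.patterns {
		n += len(re.FindAllString(text, -1))
	}
	return n
}

func main() {
	// Construct once, outside the loop.
	s, err := newScanner(`\d{3}-\d{2}-\d{4}`, `\d{3}-\d{3}-\d{4}`)
	if err != nil {
		panic(err)
	}
	docs := []string{"SSN 123-45-6789", "Call 555-867-5309", "nothing here"}
	for _, doc := range docs {
		fmt.Println(s.count(doc)) // prints 1, then 1, then 0
	}
}
```

Constructing the scanner inside the loop would repeat the compilation for every document, which is the cost that reusing a single instance avoids.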