Dictionary Filter

This filter identifies custom text based on a given dictionary.

Required Parameters

At least one of `terms` or `files` must be provided.

Parameter	Description	Default Value
`terms`	A list of terms in the dictionary.	None
`files`	A list of files containing terms, one term per line.	None
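
The rule above can be sketched in Python as follows. This is only an illustration of how `terms` and `files` might be combined, not the filter's actual implementation. The function name `load_dictionary` is hypothetical, and skipping blank lines and trimming whitespace are assumptions.

```python
from pathlib import Path
from typing import Iterable, Optional


def load_dictionary(terms: Optional[Iterable[str]] = None,
                    files: Optional[Iterable[str]] = None) -> set[str]:
    """Merge inline terms with terms read from files (one term per line)."""
    # The documentation requires at least one of the two parameters.
    if not terms and not files:
        raise ValueError("At least one of 'terms' or 'files' must be provided.")

    # Assumption: surrounding whitespace is not significant.
    dictionary = {t.strip() for t in (terms or []) if t.strip()}

    for path in files or []:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            if line.strip():  # assumption: blank lines are skipped
                dictionary.add(line.strip())
    return dictionary
```

Calling `load_dictionary(terms=["john", "jane"])` returns a set holding those two terms, while calling it with no arguments raises `ValueError`.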

Optional Parameters

Parameter	Description	Default Value
`enabled`	When set to false, the filter is disabled and not applied.	`true`
`ignored`	A list of terms to be ignored by the filter.	None
`fuzzy`	When set to true, the dictionary will employ fuzzy comparisons. Use the `sensitivity` parameter to control the level of fuzziness. Setting this value to false will disable fuzziness and provide a higher level of performance.	`false`
`sensitivity`	Controls the "fuzziness" of allowed values to account for misspellings and derivations. Valid values are `off` meaning only exact matches, `low`, `medium`, and `high`. Only applies when `fuzzy` is set to `true`.	`medium`
`classification`	Used to apply an arbitrary label to the identifier, such as "patient-id" or "account-number".	`"custom-identifier"`
`windowSize`	Sets the size of the window (in terms) surrounding a span in which to look for contextual terms. If set, this value overrides the value of `span.window.size` in the configuration.	The value of `span.window.size`, which defaults to `5`.
`priority`	The priority (integer) of this filter. Valid values are any positive integer, where a higher value indicates a higher priority. Priority is used for tie-breaking when two spans may be otherwise identical.	`0`
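
To make the interaction between `fuzzy`, `sensitivity`, and `ignored` concrete, here is a minimal Python sketch. It assumes that fuzziness is measured by Levenshtein edit distance, that each sensitivity level maps to a maximum distance, and that comparison is case-insensitive. None of these are confirmed by this page, and the actual filter may work differently. The names `MAX_DISTANCE`, `edit_distance`, and `matches` are hypothetical.

```python
# Hypothetical mapping from sensitivity level to a maximum edit distance.
MAX_DISTANCE = {"off": 0, "low": 1, "medium": 2, "high": 3}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def matches(token, terms, ignored=(), fuzzy=False, sensitivity="medium"):
    """Return True if the token would be flagged under the assumptions above."""
    token = token.lower()  # assumption: case-insensitive comparison
    if token in {t.lower() for t in ignored}:
        return False  # ignored terms are never flagged
    # With fuzzy disabled, only exact matches (distance 0) count.
    limit = MAX_DISTANCE[sensitivity] if fuzzy else 0
    return any(edit_distance(token, t.lower()) <= limit for t in terms)
```

Under these assumptions, the misspelling "jonh" is two edits away from "john", so it matches at `medium` but not at `low`. The exact-match path never needs a distance computation in a real implementation, which is consistent with the note that disabling fuzziness performs better.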

Filter Strategies

The filter may have zero or more filter strategies. When no filter strategy is given, the default strategy of `REDACT` is used. When multiple filter strategies are given, they are applied in the order in which they are listed. See Filter Strategies for details.

Strategy	Description
`REDACT`	Replace the sensitive text with a placeholder.
`RANDOM_REPLACE`	Replace the sensitive text with a similar, random value.
`STATIC_REPLACE`	Replace the sensitive text with a given value.
`CRYPTO_REPLACE`	Replace the sensitive text with its encrypted value.
`HASH_SHA256_REPLACE`	Replace the sensitive text with its SHA256 hash value.
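
The effect of three of these strategies can be illustrated with a short Python sketch. It is a conceptual illustration only, not the product's implementation. It assumes that `%t` in the redaction format stands for the filter type, and the function `apply_strategy` and its parameters are hypothetical. `RANDOM_REPLACE` and `CRYPTO_REPLACE` are left out because they would need a value generator and an encryption key, neither of which is described here.

```python
import hashlib


def apply_strategy(strategy: str, token: str,
                   filter_type: str = "dictionary",
                   redaction_format: str = "{{{REDACTED-%t}}}",
                   static_value: str = "") -> str:
    """Return the replacement text for a matched token."""
    if strategy == "REDACT":
        # Assumption: %t is a placeholder for the filter type.
        return redaction_format.replace("%t", filter_type)
    if strategy == "STATIC_REPLACE":
        return static_value
    if strategy == "HASH_SHA256_REPLACE":
        # Hex digest of the UTF-8 bytes; the real encoding may differ.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()
    raise ValueError(f"Strategy not covered by this sketch: {strategy}")


print(apply_strategy("REDACT", "john"))
print(apply_strategy("STATIC_REPLACE", "john", static_value="[NAME]"))
print(apply_strategy("HASH_SHA256_REPLACE", "john"))
```

The first call prints `{{{REDACTED-dictionary}}}` and the second prints `[NAME]`. The third prints a 64-character hexadecimal digest.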

Conditions

Each filter strategy may have one condition. See Conditions for details.

Conditional	Description	Operators
`TOKEN`	Compares the value of the sensitive text.	`==` , `!=`
`CONTEXT`	Compares the filtering context.	`==` , `!=`
`CONFIDENCE`	Compares the confidence in the sensitive text against a threshold value.	`<` , `<=`, `>` , `>=`, `==`, `!=`
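
The table above can be read as a small rule set: each conditional accepts only certain operators. The Python sketch below shows one way to model that. It does not reproduce the policy's condition syntax, which is documented under Conditions. The span representation (a dict with `token`, `context`, and `confidence` keys) and the function `evaluate` are hypothetical.

```python
import operator

OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

# Operators permitted for each conditional, per the table above.
ALLOWED = {
    "TOKEN": {"==", "!="},
    "CONTEXT": {"==", "!="},
    "CONFIDENCE": set(OPERATORS),
}


def evaluate(conditional: str, op: str, expected, span: dict) -> bool:
    """Evaluate one condition against a span (hypothetical dict shape)."""
    if op not in ALLOWED[conditional]:
        raise ValueError(f"{conditional} does not support operator {op}")
    actual = span[conditional.lower()]
    return OPERATORS[op](actual, expected)


span = {"token": "john", "context": "intake-forms", "confidence": 0.92}
print(evaluate("TOKEN", "==", "john", span))
print(evaluate("CONFIDENCE", ">", 0.95, span))
```

For this span the first call prints `True` and the second prints `False`. A call such as `evaluate("TOKEN", "<", "john", span)` raises `ValueError`, because `TOKEN` supports only `==` and `!=`.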

Example Policy

{
   "name": "dictionary-example",
   "identifiers": {
      "dictionaries": [
         {
            "terms": ["john", "jane", "doe"],
            "files": ["c:\\temp\\dictionary.txt"],
            "fuzzy": true,
            "sensitivity": "medium",
            "sectionFilterStrategies": [
               {
                  "strategy": "REDACT",
                  "redactionFormat": "{{{REDACTED-%t}}}"
               }
            ]
         }
      ]
   }   
}
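
When a policy like this is generated programmatically, a JSON serializer handles escaping, which matters for Windows paths because a backslash must be written as `\\` inside a JSON string. The Python sketch below builds a similar policy and serializes it. It assumes that `dictionaries` is a list of objects and that `files` is a list of paths. Filter strategies are omitted, so the default `REDACT` strategy would apply.

```python
import json

# Assumed shape: "dictionaries" holds a list of dictionary filter objects.
policy = {
    "name": "dictionary-example",
    "identifiers": {
        "dictionaries": [
            {
                "terms": ["john", "jane", "doe"],
                "files": [r"c:\temp\dictionary.txt"],  # raw string: single backslashes
                "fuzzy": True,
                "sensitivity": "medium",
            }
        ]
    },
}

# json.dumps escapes each backslash in the path as \\ in the output.
text = json.dumps(policy, indent=3)
print(text)
```

Parsing the printed text with `json.loads` yields the original `policy` dict again, which is a quick way to confirm that a hand-edited policy file is valid JSON.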