Filters
A "filter" corresponds to a type of sensitive information. Phileas has filters for sensitive information such as names, addresses, ages, and lots of others.
There are predefined filters that are ready to use, as well as custom filters that let you define your own filters to identify sensitive information beyond what the predefined filters cover. An example of a custom filter is one that identifies your patient account numbers, where the structure of an account number is specific to your organization.
Each filter is capable of identifying and redacting a specific type of sensitive information. For example, there is a filter for phone numbers and a filter for US social security numbers. You can enable any combination of these filters based on the types of sensitive information you need to redact.
This section of the documentation describes the filters available in Phileas. The configuration options vary from filter to filter depending on the type of sensitive information. For instance, only the zip code filter has a configuration to truncate the zip code.
A selection of filters and their configurations is called a policy. A policy describes how to de-identify a document.
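To make the idea of a policy concrete, here is a minimal in-memory sketch of "a selection of filters and their configurations". The class names, the filter keys, and the `truncate_to` option are all hypothetical and chosen for illustration; this is not Phileas's actual policy schema.

```python
from dataclasses import dataclass, field


@dataclass
class FilterConfig:
    """Hypothetical per-filter settings; not Phileas's real schema."""
    enabled: bool = True
    options: dict = field(default_factory=dict)


@dataclass
class Policy:
    """A named selection of filters and their configurations."""
    name: str
    filters: dict  # filter type -> FilterConfig

    def enabled_filters(self):
        # Only enabled filters take part in de-identification.
        return sorted(name for name, cfg in self.filters.items() if cfg.enabled)


policy = Policy(
    name="example",
    filters={
        "phone_number": FilterConfig(),
        "ssn": FilterConfig(),
        # Assumed option name: keep only the leading digits of a zip code.
        "zip_code": FilterConfig(options={"truncate_to": 3}),
        "url": FilterConfig(enabled=False),
    },
)
print(policy.enabled_filters())
```

Only the zip code entry carries an extra option here, mirroring the point that configuration options differ by filter type.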
Predefined Filters
Person's Names
Phileas uses several methods to identify person's names.
Type | Description |
---|---|
First Names | Identifies common first names |
Surnames | Identifies common surnames |
Person's Names (NER) | Identifies full names using natural language processing analysis |
Physician's Names (NER) | Identifies physician names using natural language processing analysis |
Other Filters
Type | Description |
---|---|
Ages | Identifies ages such as 3.5 years old |
Bank Routing Numbers | Identifies bank routing numbers |
Bitcoin Addresses | Identifies Bitcoin addresses such as 127NVqnjf8gB9BFAW2dnQeM6wqmy1gbGtv |
Cities | Identifies common cities |
Counties | Identifies common counties |
Credit Card Numbers | Identifies VISA, American Express, MasterCard, and Discover credit card numbers |
Dates | Identifies dates in many formats such as May 22, 1999 |
Driver's License Numbers | Identifies driver's license numbers for all 50 US states |
Email Addresses | Identifies email addresses |
Hospitals | Identifies common hospital names |
Hospital Abbreviations | Identifies common hospitals by their name abbreviations |
IBAN Codes | Identifies international bank account numbers |
IP Addresses | Identifies IPv4 and IPv6 addresses |
MAC Addresses | Identifies network MAC addresses |
Passport Numbers | Identifies US passport numbers |
Phone Numbers | Identifies phone numbers |
Phone Number Extensions | Identifies phone number extensions |
Sections | Identifies sections in text denoted by |
SSNs and TINs | Identifies US SSNs and TINs |
States | Identifies US state names |
State Abbreviations | Identifies US state names by their abbreviations |
Tracking Numbers | Identifies UPS, FedEx, and USPS tracking numbers |
URLs | Identifies URLs |
VINs | Identifies vehicle identification numbers |
Zip Codes | Identifies US zip codes |
Custom Filter Types of Sensitive Information
In addition to the predefined types of sensitive information listed in the table above, you can also define your own types of sensitive information. Through custom identifiers and dictionaries, Phileas can identify many other types of information that may be sensitive in your use case. For example, if you have patient identifiers that follow a pattern of AA-00000, you can define a custom identifier for this sensitive information.
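As an illustration of what a custom identifier pattern might look like, the sketch below expresses the AA-00000 shape as a regular expression using Python's standard `re` module. It assumes that "AA" means two uppercase ASCII letters and "00000" means five digits; the function names and the replacement text are hypothetical, and this is not how Phileas itself is configured.

```python
import re

# Two uppercase letters, a hyphen, then five digits (the AA-00000 shape).
# \b keeps the match from starting or ending inside a longer token.
PATIENT_ID = re.compile(r"\b[A-Z]{2}-\d{5}\b")


def find_patient_ids(text):
    """Return (start, end, value) for every identifier found in text."""
    return [(m.start(), m.end(), m.group()) for m in PATIENT_ID.finditer(text)]


def redact_patient_ids(text, replacement="{{REDACTED}}"):
    """Replace every matching identifier with the replacement text."""
    return PATIENT_ID.sub(replacement, text)


print(find_patient_ids("Patient AB-12345 was seen."))
print(redact_patient_ids("Patient AB-12345 was seen."))
```

The word boundaries are a design choice: without them, a longer value such as AB-123456 would be partially matched.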
Phileas can be configured to identify sensitive information based on custom dictionaries. When a term in the dictionary is found in the text, Phileas will treat the term as sensitive information and apply the given filter strategy.
Custom dictionaries support fuzziness to accommodate misspellings. The replacement strategy for a custom dictionary has a sensitivityLevel that controls the amount of allowed fuzziness.
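One common way to implement this kind of fuzziness is edit distance. The sketch below models the allowed fuzziness as a maximum Levenshtein distance (`max_edits`). That mapping is an assumption made for illustration: the matching algorithm Phileas uses and the meaning of each sensitivityLevel value are not described here. The function names are hypothetical, and only single-word dictionary terms are handled.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def find_dictionary_terms(text, dictionary, max_edits):
    """Return (word, matched_term) pairs within max_edits of a dictionary term.

    max_edits stands in for the sensitivity level (an assumption).
    """
    hits = []
    for token in text.split():
        word = token.strip(".,;:!?").lower()
        for term in dictionary:
            if edit_distance(word, term.lower()) <= max_edits:
                hits.append((word, term))
                break
    return hits


print(find_dictionary_terms("Seen at Sprngfield clinic.", ["Springfield"], 1))
```

With `max_edits` set to 0 the same call finds nothing, because the misspelled word no longer matches exactly.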
Type | Description |
---|---|
Custom Dictionaries | Identifies sensitive information based on dictionary values. |
Custom Identifiers | Identifies custom alphanumeric identifiers that may be used for medical record numbers, patient identifiers, account numbers, or other specific identifiers. |
Filter Properties
Each filter has properties for customizing how the filter operates. While the properties vary from filter to filter, at a minimum each filter includes properties to enable or disable the filter, a list of terms to ignore, one or more filter strategies, and a priority.
The priority is an integer value that is used as a tie-breaker when Phileas finds identical PII that fits the criteria of more than one filter. For example, you may want to filter SSNs and 9-digit numbers as patient identifiers. If both filters identify the same text, the filter with the highest priority will determine how the text is labeled.
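The tie-breaking rule can be sketched as follows. The `Span` type, the filter labels, and the priority values are hypothetical, and the sketch assumes that a numerically larger priority wins; it only handles the case described above, where two filters identify exactly the same text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A piece of identified text; hypothetical type for illustration."""
    start: int
    end: int
    filter_type: str
    priority: int


def resolve_identical_spans(spans):
    """Keep one span per (start, end), preferring the highest priority."""
    best = {}
    for span in spans:
        key = (span.start, span.end)
        # Assumption: a larger integer means a higher priority.
        if key not in best or span.priority > best[key].priority:
            best[key] = span
    return sorted(best.values(), key=lambda s: s.start)


spans = [
    Span(4, 13, "ssn", 10),
    Span(4, 13, "patient-id", 5),  # same text, lower priority
    Span(20, 25, "zip-code", 0),
]
for span in resolve_identical_spans(spans):
    print(span.filter_type, span.start, span.end)
```

In this example the 9-digit text at positions 4 to 13 is labeled as an SSN because that filter was given the higher priority.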