De-identification Methods
There are several ways to de-identify data. Which one you use depends on the type of data and your reason for de-identifying it. The names of the methods are often used interchangeably, but the methods themselves differ.
In this User's Guide, we may use the terms filter and redact interchangeably.
In Philter, de-identification methods vary for each type of sensitive information. For example, all types can be replaced or redacted, but only dates can be shifted and only zip codes can be truncated. How a de-identification method is applied by Philter is called a filter strategy. Each type of sensitive information can have one or more filter strategies, and the combination of the filter strategies you select is called a policy. A policy determines how a document will be de-identified.
The following list describes how each de-identification method works and how it applies to Philter. De-identifying a document is likely to require a combination of these methods. For instance, you may want to redact names, encrypt credit card numbers, and shift appointment dates.
Summary of De-identification Methods
| De-identification Method | Description |
|---|---|
| Replacement | Replaces sensitive information with a defined value. For example, you might want to replace a credit card number with the literal value "CREDIT_CARD_NUMBER". |
| Redaction and Masking | Removes sensitive information. Philter lets you choose what takes its place, either ***** (masking) or some other set of characters. |
| Encryption | Encrypts sensitive information. |
| Date Shifting | Shifts dates either forward or backward by some interval. |
| Bucketing | Categorizes data into buckets based on the data. For example, Philter can bucket dates into years and zip codes by population. |
A difference between Philter and other services is that Philter does not send your data to a third party for de-identification. Philter runs in your cloud and your data stays in your cloud.
De-identification Methods
Redaction and Masking
Redaction and masking are two methods of de-identification whose names are often used interchangeably. The term redaction refers to removing a sensitive value from a document. When we hear the term redaction, we often think of an image of a document with black bars across pieces of the text.
Masking is similar to redaction but allows for configuring how the sensitive value is removed. The most common example is using asterisks (e.g., ******) in place of a sensitive value.
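To make the distinction concrete, here is a minimal Python sketch, not Philter's implementation. The regular expression, the `[REDACTED]` placeholder, and the function names `redact` and `mask` are assumptions chosen for illustration.

```python
import re

# Hypothetical pattern for US Social Security numbers (illustration only,
# not the detector Philter uses).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Remove each match and leave a fixed placeholder in its place."""
    return SSN_PATTERN.sub(lambda match: placeholder, text)


def mask(text: str, mask_char: str = "*") -> str:
    """Overwrite every character of each match, preserving its length."""
    return SSN_PATTERN.sub(lambda match: mask_char * len(match.group()), text)


text = "Patient SSN is 123-45-6789."
print(redact(text))  # Patient SSN is [REDACTED].
print(mask(text))    # Patient SSN is ***********.
```

Note that masking as sketched here preserves the length of the original value, which can itself reveal information; a fixed-length mask avoids that.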
Replacement
Replacement is a method of de-identification that simply replaces a sensitive value with another value. Replacement is useful when the sensitive value is not needed once the document has been de-identified. Philter can replace a sensitive value with a preset value or with a random value.
In Philter, replacement is achieved with the REDACT, STATIC_REPLACE, or RANDOM_REPLACE filter strategies.
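The difference between a preset value and a random value can be sketched in a few lines of Python. This is an illustration of the idea only, not how Philter applies these strategies; the 16-digit pattern and the function names `static_replace` and `random_replace` are assumptions.

```python
import random
import re

# Hypothetical pattern: any standalone run of 16 digits is treated as a
# credit card number (illustration only).
CARD_PATTERN = re.compile(r"\b\d{16}\b")


def static_replace(text: str, value: str = "CREDIT_CARD_NUMBER") -> str:
    """Replace every match with the same preset value."""
    return CARD_PATTERN.sub(lambda match: value, text)


def random_replace(text: str, rng: random.Random) -> str:
    """Replace every match with random digits of the same length."""
    def random_digits(match: re.Match) -> str:
        return "".join(rng.choice("0123456789") for _ in match.group())
    return CARD_PATTERN.sub(random_digits, text)


text = "Card 4111111111111111 on file."
print(static_replace(text))
print(random_replace(text, random.Random()))
```

A preset value makes it obvious that something was removed, while a random value keeps the document looking realistic for downstream systems that expect well-formed data.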
Bucketing
Bucketing is a method of de-identification that categorizes data into buckets based on the data. Bucketing is useful when the exact value is not needed, but the general category is important for analysis.
In Philter, bucketing can be applied to several types of data. For example:
- Zip Codes: Zip codes can be bucketed by their population. This allows for redacting zip codes with a population below a certain threshold.
- Dates: Dates can be bucketed into years or other time intervals.
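Both kinds of bucketing can be sketched in Python as follows. The population figures, the threshold, and the function names are made up for illustration and do not reflect Philter's data or API.

```python
from datetime import date

# Hypothetical population figures, for illustration only.
ZIP_POPULATION = {"90210": 21000, "12345": 850}


def bucket_zip(zip_code: str, threshold: int = 20000) -> str:
    """Keep a zip code only if its population meets the threshold.

    Unknown zip codes are treated as below the threshold (an assumption
    that errs on the side of redacting).
    """
    population = ZIP_POPULATION.get(zip_code, 0)
    return zip_code if population >= threshold else "[REDACTED]"


def bucket_date_to_year(value: date) -> str:
    """Reduce a full date to its year."""
    return str(value.year)


print(bucket_zip("90210"))
print(bucket_zip("12345"))
print(bucket_date_to_year(date(1987, 6, 14)))
```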
Date Shifting
Date shifting is a method of de-identification that shifts dates either forward or backward by some interval. Date shifting is useful when the exact date is sensitive, but the relationship between dates in a document or set of documents is important to preserve.
In Philter, date shifting is achieved by using the SHIFT or SHIFTRANDOM filter strategies. These strategies allow you to specify the number of days, minutes, months, and years to shift the date.
| Option | Description |
|---|---|
| shiftDays | The number of days to shift the date. Can be a negative or positive integer. |
| shiftMinutes | The number of minutes to shift the date. Can be a negative or positive integer. |
| shiftMonths | The number of months to shift the date. Can be a negative or positive integer. |
| shiftYears | The number of years to shift the date. Can be a negative or positive integer. |
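The following Python sketch shows one way a fixed shift built from these four options could be applied. It is not Philter's implementation: the function name `shift_date`, the order of operations (months and years first, then days and minutes), and the clamping of the day to the end of a shorter month are all assumptions.

```python
import calendar
from datetime import datetime, timedelta


def shift_date(value: datetime, shift_days: int = 0, shift_minutes: int = 0,
               shift_months: int = 0, shift_years: int = 0) -> datetime:
    """Shift a date by calendar months/years, then by days/minutes."""
    # Count months from year 0 so that negative shifts roll over correctly.
    month_index = value.year * 12 + (value.month - 1)
    month_index += shift_years * 12 + shift_months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    # Clamp the day when the target month is shorter (Jan 31 -> Feb 29).
    day = min(value.day, calendar.monthrange(year, month)[1])
    shifted = value.replace(year=year, month=month, day=day)
    return shifted + timedelta(days=shift_days, minutes=shift_minutes)


admitted = datetime(2024, 3, 1, 8, 30)
discharged = datetime(2024, 3, 4, 16, 0)

# Applying the same shift to both dates keeps the interval between them.
new_admitted = shift_date(admitted, shift_days=-10)
new_discharged = shift_date(discharged, shift_days=-10)
print(new_admitted, new_discharged)
print(new_discharged - new_admitted == discharged - admitted)  # True
```

Shifts expressed purely in days and minutes always preserve intervals exactly. Month and year shifts may not, because months differ in length.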
Encryption
Encryption is a method of de-identification that replaces a sensitive value with its encrypted version. Encryption is useful when the sensitive value may need to be recovered later by an authorized party.
Philter provides two primary methods of encryption:
- AES Encryption: Using the CRYPTO_REPLACE filter strategy, Philter encrypts the sensitive value using the AES encryption algorithm. This requires a key and an initialization vector (IV) to be defined in the policy.
- Format Preserving Encryption (FPE): Using the FPE_ENCRYPT_REPLACE filter strategy, Philter encrypts the sensitive value while preserving its original format (e.g., an encrypted credit card number will still look like a credit card number).
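To illustrate what "preserving format" means, the sketch below implements a toy Feistel network over decimal digits in Python. It is only a demonstration of the concept: it is not the algorithm Philter uses, it is not a standardized scheme such as NIST FF1, and it should not be used to protect real data. The function names, the key, and the round count are assumptions.

```python
import hashlib
import hmac


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed pseudo-random number derived from one half of the digits."""
    message = f"{round_no}:{half}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def fpe_encrypt(digits: str, key: bytes, rounds: int = 8) -> str:
    """Map a digit string (length >= 2) to another of the same length."""
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for i in range(rounds):
        width = len(left)
        modulus = 10 ** width
        mixed = (int(left) + _round_value(key, i, right, modulus)) % modulus
        left, right = right, str(mixed).zfill(width)
    return left + right


def fpe_decrypt(digits: str, key: bytes, rounds: int = 8) -> str:
    """Invert fpe_encrypt by running the rounds in reverse order."""
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for i in reversed(range(rounds)):
        width = len(right)
        modulus = 10 ** width
        original = (int(right) - _round_value(key, i, left, modulus)) % modulus
        left, right = str(original).zfill(width), left
    return left + right


key = b"demo-key-not-for-production"
card = "4111111111111111"
encrypted = fpe_encrypt(card, key)
print(encrypted)                            # still 16 digits
print(fpe_decrypt(encrypted, key) == card)  # True
```

The encrypted value has the same length and character set as the input, so it still passes through systems that validate the shape of a credit card number, and an authorized party holding the key can recover the original.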