Phileas
Phileas is a Java library for deidentifying text and redacting PII, PHI, and other sensitive information. Given text or PDF documents, Phileas searches for sensitive information such as persons' names, ages, addresses, and many other types. Phileas is highly configurable through its settings and policies.
When sensitive information is identified, Phileas can manipulate it in a variety of ways. The information can be replaced, encrypted, anonymized, and more. The user chooses how to manipulate each type of sensitive information. We refer to these methods collectively as "redaction."
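To make the idea of different manipulations concrete, here is a minimal plain-Java sketch of three strategies: replacing a value with a placeholder, replacing it with a one-way SHA-256 hash (a stand-in for the encryption option, which would additionally require key management), and partial masking. The class and method names are hypothetical and are not part of the Phileas API; in Phileas itself these choices are expressed through policies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; none of these names come from Phileas.
class RedactionStrategiesSketch {

    // Strategy 1: replace the value with a placeholder naming its type.
    static String replaceWithLabel(String type) {
        return "[REDACTED-" + type + "]";
    }

    // Strategy 2: replace the value with its SHA-256 hex digest, so equal
    // inputs always map to the same token without revealing the input.
    static String hash(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Strategy 3: mask everything except the last `visible` characters.
    static String maskAllButLast(String value, int visible) {
        int hidden = Math.max(0, Math.min(value.length(), value.length() - visible));
        return "*".repeat(hidden) + value.substring(hidden);
    }

    public static void main(String[] args) {
        System.out.println(replaceWithLabel("SSN"));
        System.out.println(hash("123-45-6789"));
        System.out.println(maskAllButLast("123-45-6789", 4));
    }
}
```

Hashing keeps redacted values comparable across documents, while a placeholder discards the value entirely; which trade-off is appropriate depends on the type of information.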
Information can be redacted based on its content and other attributes. For example, you can redact only certain persons' names, only zip codes meeting some qualification, or only IP addresses that match a given pattern.
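The conditional behavior described above can be sketched in plain Java as "find candidates, then redact only those that satisfy a condition." This is not how Phileas is configured (Phileas expresses conditions in its policies); the class name, the simplified IPv4 pattern, and the `10.` prefix condition are all assumptions chosen for illustration.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from Phileas.
class ConditionalRedactionSketch {

    // Simplified IPv4 finder; it does not validate that octets are <= 255.
    static final Pattern IPV4 = Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");

    // Replaces each match of `finder` with `replacement`, but only when the
    // matched text satisfies `condition`; other matches are left unchanged.
    static String redactIf(String text, Pattern finder,
                           Predicate<String> condition, String replacement) {
        Matcher matcher = finder.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String found = matcher.group();
            String substitute = condition.test(found) ? replacement : found;
            // quoteReplacement prevents '$' and '\' from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(substitute));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Internal host 10.1.2.3 talked to 8.8.8.8.";
        // Redact only addresses in the 10.x.x.x range.
        System.out.println(redactIf(text, IPV4, ip -> ip.startsWith("10."), "[REDACTED-IP]"));
        // prints: Internal host [REDACTED-IP] talked to 8.8.8.8.
    }
}
```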
Using Phileas
Phileas snapshots and releases are available in our Maven repositories. Add the following to your Maven configuration:
<repositories>
    <repository>
        <id>philterd-repository-releases</id>
        <url>https://artifacts.philterd.ai/releases</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>philterd-repository-snapshots</id>
        <url>https://artifacts.philterd.ai/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
Next, add the Phileas dependency to your project:
<dependency>
    <groupId>ai.philterd</groupId>
    <artifactId>phileas-core</artifactId>
    <version>2.7.1</version>
</dependency>
Quick Start
Create a FilterService, using a PhileasConfiguration, and call filter() on the service:
Properties properties = new Properties();
PhileasConfiguration phileasConfiguration = new PhileasConfiguration(properties);
FilterService filterService = new PhileasFilterService(phileasConfiguration);
FilterResponse response = filterService.filter(policies, context, documentId, body, MimeType.TEXT_PLAIN);
policies is a list of Policy objects. (See below for more about Policies.) The context and documentId are arbitrary values you can use to uniquely identify the text being filtered. The body is the text you are filtering. Lastly, we specify that the data is plain text.
The response contains information about the identified sensitive information along with the filtered text.
Usage Examples
The PhileasFilterServiceTest and EndToEndTests test classes have examples of how to configure Phileas and filter text.
Finding and Redacting Sensitive Information in a PDF Document
Create a FilterService, using a PhileasConfiguration, and call filter() on the service:
PhileasConfiguration phileasConfiguration = ConfigFactory.create(PhileasConfiguration.class);
FilterService filterService = new PhileasFilterService(phileasConfiguration);
BinaryDocumentFilterResponse response = filterService.filter(policies, context, documentId, body, MimeType.APPLICATION_PDF, MimeType.IMAGE_JPEG);
policies is a list of Policy objects, which are created by deserializing a policy from JSON. (See below for more about Policies.) The context and documentId are arbitrary values you can use to uniquely identify the document being filtered. The body is the content of the PDF document you are filtering. Lastly, we specify that the input is a PDF document (MimeType.APPLICATION_PDF) and that the redacted output should be JPEG images (MimeType.IMAGE_JPEG).
The response contains a zip file of the images generated by redacting the PDF document.
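Once you have the zip archive, the individual images can be read with the standard java.util.zip classes. The sketch below assumes you have already obtained the archive from the response as a byte array; how to retrieve it from BinaryDocumentFilterResponse is not shown here, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; not part of the Phileas API.
class ZipImagesSketch {

    // Reads every file entry of an in-memory zip archive into a map of
    // entry name -> entry bytes, preserving the order of the archive.
    static Map<String, byte[]> unzip(byte[] zipBytes) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only file entries carry image data
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int read;
                // read() returns -1 at the end of the current entry.
                while ((read = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        return entries;
    }
}
```

Each value in the returned map can then be written to disk with java.nio.file.Files.write or decoded with javax.imageio.ImageIO, depending on what you need to do with the redacted pages.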