Phileas

Phileas is a Java library that deidentifies text and redacts PII, PHI, and other sensitive information. Given plain text or PDF documents, Phileas searches the content for sensitive information such as persons' names, ages, addresses, and many other types of information. Phileas is highly configurable through its settings and policies.

When sensitive information is identified, Phileas can manipulate it in a variety of ways. The information can be replaced, encrypted, anonymized, and more. The user chooses how to manipulate each type of sensitive information. We refer to all of these methods collectively as "redaction."

Information can be redacted based on its content and other attributes. For example, Phileas can redact only certain persons' names, only zip codes meeting some qualification, or only IP addresses that match a given pattern.
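The idea of redacting only the values that meet a condition can be illustrated with a minimal, standalone sketch. It does not use the Phileas API: the class name, the simplified IPv4 pattern, and the [REDACTED-IP] placeholder are all assumptions made for illustration, and in Phileas itself this behavior is configured through policies rather than written by hand.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: plain Java, not part of Phileas.
public class ConditionalRedactionSketch {

    // Simplified IPv4 pattern for illustration; it does not validate octet ranges.
    private static final Pattern IPV4 =
            Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");

    /**
     * Replaces each IPv4-looking match with a placeholder, but only when the
     * matched text satisfies the given condition. Other matches are kept as-is.
     */
    public static String redactIpAddresses(String text, Predicate<String> condition) {
        Matcher matcher = IPV4.matcher(text);
        StringBuilder result = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            // Copy the text between the previous match and this one unchanged.
            result.append(text, last, matcher.start());
            String match = matcher.group();
            // Hypothetical placeholder; the real replacement format is policy-driven.
            result.append(condition.test(match) ? "[REDACTED-IP]" : match);
            last = matcher.end();
        }
        result.append(text, last, text.length());
        return result.toString();
    }

    public static void main(String[] args) {
        String text = "Internal host 10.0.0.5 called public host 8.8.8.8.";
        // Redact only addresses that start with "10." and leave the rest alone.
        System.out.println(redactIpAddresses(text, ip -> ip.startsWith("10.")));
    }
}
```

Passing the condition as a Predicate keeps the detection step (finding candidate values) separate from the decision step (whether a given value should be redacted), which mirrors the distinction the paragraph above draws.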

Using Phileas

Phileas snapshots and releases are available in our Maven repositories. Add the following to your Maven configuration:

<repositories>
    <repository>
        <id>philterd-repository-releases</id>
        <url>https://artifacts.philterd.ai/releases</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>philterd-repository-snapshots</id>
        <url>https://artifacts.philterd.ai/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

Next, add the Phileas dependency to your project:

<dependency>
  <groupId>ai.philterd</groupId>
  <artifactId>phileas-core</artifactId>
  <version>2.7.1</version>
</dependency>

Quick Start

Create a FilterService, using a PhileasConfiguration, and call filter() on the service:

Properties properties = new Properties();
PhileasConfiguration phileasConfiguration = new PhileasConfiguration(properties);

FilterService filterService = new PhileasFilterService(phileasConfiguration);

FilterResponse response = filterService.filter(policies, context, documentId, body, MimeType.TEXT_PLAIN);

The policies argument is a list of Policy objects. (See below for more about policies.) The context and documentId are arbitrary values you can use to uniquely identify the text being filtered. The body is the text you are filtering. The final argument, MimeType.TEXT_PLAIN, specifies that the data is plain text.

The response contains information about the identified sensitive information along with the filtered text.

Usage Examples

The PhileasFilterServiceTest and EndToEndTests test classes have examples of how to configure Phileas and filter text.

Finding and Redacting Sensitive Information in a PDF Document

Create a FilterService, using a PhileasConfiguration, and call filter() on the service:

PhileasConfiguration phileasConfiguration = ConfigFactory.create(PhileasConfiguration.class);

FilterService filterService = new PhileasFilterService(phileasConfiguration);

BinaryDocumentFilterResponse response = filterService.filter(policies, context, documentId, body, MimeType.APPLICATION_PDF, MimeType.IMAGE_JPEG);

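The policies argument is a list of Policy objects, which are created by deserializing policies from JSON. (See below for more about policies.) The context and documentId are arbitrary values you can use to uniquely identify the document being filtered. The body is the content of the PDF document you are filtering. Lastly, the two MimeType arguments specify that the input is a PDF (MimeType.APPLICATION_PDF) and that the redacted output is generated as JPEG images (MimeType.IMAGE_JPEG).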

The response contains a zip file of the images generated by redacting the PDF document.
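Once you have the zip archive as a byte array, the images can be read with the standard java.util.zip classes. The following is a minimal sketch under stated assumptions: how the bytes are obtained from the response is not shown here (consult the BinaryDocumentFilterResponse class for the accessor), and the class name, the sampleArchive() helper, and the page-1.jpg entry name are hypothetical stand-ins so the example runs without Phileas.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative only: plain Java, not part of Phileas.
public class ZipImagesSketch {

    /** Reads every file entry of an in-memory zip archive into a name-to-bytes map. */
    public static Map<String, byte[]> readEntries(byte[] zipBytes) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Read the current entry fully; read() returns -1 at the end of the entry.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    /** Builds a tiny stand-in archive (hypothetical content) for the demo below. */
    public static byte[] sampleArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("page-1.jpg"));
            zip.write(new byte[] {1, 2, 3});
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // In real use, pass the zip bytes taken from the filter response instead.
        for (Map.Entry<String, byte[]> e : readEntries(sampleArchive()).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

Reading the entries into memory keeps the sketch short; for large documents it may be preferable to stream each entry directly to a file instead.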