PhEye Filter - AI-Powered Named Entity Recognition
The PhEye filter provides AI-powered named entity recognition (NER) for detecting persons,
organizations, locations, and other entities in text. It runs in one of two modes: it connects to a remote
PhEye NLP service via HTTP, or, when ModelPath is set, it runs a local
GLiNER model entirely in-process with no network call (see Local GLiNER inference).
Features
- Named Entity Recognition: Detects persons, organizations, locations, and custom entity types
- Local or remote: Run a local GLiNER model in-process, or call a remote PhEye service
- Confidence Scoring: Provides confidence scores for each detection
- Configurable Thresholds: Filter entities based on confidence levels per label
- Bearer Token Authentication: Secure communication with remote PhEye services
Setup
For remote mode, deploy a PhEye service or use an existing endpoint, then point the filter at it. Local mode needs no service; see Local GLiNER inference.
Configuration
using Phileas.Policy;
using Phileas.Policy.Filters;
using Phileas.Services;
using PhileasPolicy = Phileas.Policy.Policy;

var policy = new PhileasPolicy
{
    Name = "pheye-policy",
    Identifiers = new Identifiers
    {
        PhEyes = new List<PhEye>
        {
            new PhEye
            {
                PhEyeConfiguration = new PhEyeConfiguration
                {
                    Endpoint = "http://localhost:8080",
                    BearerToken = "your-api-token", // Optional
                    Timeout = 30, // Seconds
                    Labels = new List<string> { "PERSON", "ORG", "LOC" }
                }
            }
        }
    }
};
JSON Configuration
{
  "identifiers": {
    "pheyes": [
      {
        "phEyeConfiguration": {
          "endpoint": "http://localhost:8080",
          "bearerToken": "your-api-token",
          "timeout": 30,
          "labels": ["PERSON", "ORG", "LOC"]
        },
        "removePunctuation": false
      }
    ]
  }
}
Usage Example
var filterService = new FilterService();
var result = filterService.Filter(
    policy: policy,
    context: "default",
    piece: 0,
    input: "John Smith joined the meeting."
);
Console.WriteLine(result.FilteredText);
// Output: {{{REDACTED-person}}} joined the meeting.
Local GLiNER inference
Set ModelPath to run a local GLiNER model in-process instead of calling a
remote service. Detection happens entirely on-device, so no text leaves the host and no PhEye service is required.
This is the mode produced by the PhiSQL MODEL clause (policy schema 1.1.0).
The path must point at a GLiNER model directory containing:
- an exported ONNX graph: model.onnx or model_quantized.onnx (an onnx/ subdirectory is also searched),
- the SentencePiece tokenizer spm.model, and gliner_config.json.
PhEyes = new List<PhEye>
{
    new PhEye
    {
        PhEyeConfiguration = new PhEyeConfiguration
        {
            ModelPath = "/models/ph-eye-pii-base",
            Labels = new List<string> { "person", "email address", "social security number" },
            Threshold = 0.5
        }
    }
};
{
  "identifiers": {
    "pheyes": [
      {
        "phEyeConfiguration": {
          "modelPath": "/models/ph-eye-pii-base",
          "labels": ["person", "email address", "social security number"],
          "threshold": 0.5
        }
      }
    ]
  }
}
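Before pointing ModelPath at a directory, it can help to confirm the layout described above. The following is a minimal sketch, not part of Phileas: the class GlinerModelDirectoryCheck and its FindMissing method are hypothetical names, and it assumes the tokenizer and config sit at the top level of the directory while only the ONNX graph may live under onnx/.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Phileas: reports which expected files are
// missing from a GLiNER model directory.
public static class GlinerModelDirectoryCheck
{
    private static readonly string[] OnnxNames = { "model.onnx", "model_quantized.onnx" };

    public static List<string> FindMissing(string modelPath)
    {
        var missing = new List<string>();

        // The ONNX graph may sit in the directory itself or in an onnx/ subdirectory.
        var searchDirs = new[] { modelPath, Path.Combine(modelPath, "onnx") };
        var hasGraph = false;
        foreach (var dir in searchDirs)
        {
            foreach (var name in OnnxNames)
            {
                if (File.Exists(Path.Combine(dir, name)))
                {
                    hasGraph = true;
                }
            }
        }
        if (!hasGraph)
        {
            missing.Add("model.onnx or model_quantized.onnx");
        }

        // Assumption: the tokenizer and config live at the top level of the directory.
        foreach (var name in new[] { "spm.model", "gliner_config.json" })
        {
            if (!File.Exists(Path.Combine(modelPath, name)))
            {
                missing.Add(name);
            }
        }

        return missing;
    }
}
```

An empty list from GlinerModelDirectoryCheck.FindMissing("/models/ph-eye-pii-base") suggests the directory has the expected files; it does not verify that the files themselves are valid.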
Notes:
- GLiNER is zero-shot, so Labels is a free-text prompt: any entity description works, not a fixed enum.
- Threshold (default 0.5) is the minimum span confidence; raise it for higher precision, lower it for higher recall.
- When ModelPath is set, Endpoint, BearerToken, and Timeout are ignored.
- The model loads once on first use and is reused across calls. Call Dispose() on the filter to release it.
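To make the Threshold trade-off concrete, here is a small sketch of a minimum-confidence cut. ScoredSpan and ThresholdSketch are hypothetical types for illustration only, and treating the threshold as inclusive (confidence at or above the value) is an assumption, not confirmed behavior of the filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical span type for illustration; Phileas has its own span class.
public sealed record ScoredSpan(string Text, string Label, double Confidence);

public static class ThresholdSketch
{
    // Keeps spans whose confidence is at or above the threshold
    // (inclusive comparison is an assumption).
    public static List<ScoredSpan> Apply(IEnumerable<ScoredSpan> spans, double threshold = 0.5) =>
        spans.Where(s => s.Confidence >= threshold).ToList();
}
```

With made-up spans scored 0.93, 0.62, and 0.31, a threshold of 0.5 keeps the first two, while 0.8 keeps only the first: fewer false positives, at the cost of missing the weaker match.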
Long inputs and the token limit
GLiNER has a fixed sub-token limit (max_len in gliner_config.json, 384 for ph-eye-pii-base). The local
path enforces it directly so long text is never silently truncated: before inference the words are split into
token-aware chunks that each stay within max_len, every chunk is run, and detections are mapped back to
absolute character offsets in the original document. This is independent of, and in addition to, any policy-level
Services/Split character chunking — even a single un-split piece is kept safe.
- Consecutive chunks overlap by max_width - 1 words (the widest span GLiNER can emit), so an entity that lands on a chunk boundary is still wholly contained in one chunk and detected; the duplicate a span picks up from the overlap is removed when results are merged.
- Truncation is never silent. If the label prompt is so long it leaves no room for text, or a single "word" (an unbroken run with no whitespace) encodes to more sub-tokens than one chunk can hold, the filter throws an InvalidOperationException describing the limit instead of quietly dropping tokens. Shorten the labels, or ensure the input has normal word breaks, to resolve it.
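The overlap and merge steps can be sketched as follows. This is a simplified illustration, not the filter's implementation: ChunkingSketch, Plan, and Merge are hypothetical names, and the chunk budget is counted in words (chunkWords) as a stand-in for the sub-token budget the real filter enforces against max_len.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkingSketch
{
    // Returns (Start, Count) word windows that cover wordCount words.
    // Consecutive windows share maxWidth - 1 words.
    public static List<(int Start, int Count)> Plan(int wordCount, int chunkWords, int maxWidth)
    {
        int overlap = maxWidth - 1;
        if (chunkWords <= overlap)
        {
            // Fail loudly rather than produce windows that cannot advance.
            throw new InvalidOperationException(
                $"Chunk size {chunkWords} must exceed the overlap {overlap}.");
        }

        var windows = new List<(int Start, int Count)>();
        int stride = chunkWords - overlap;
        for (int start = 0; start < wordCount; start += stride)
        {
            int count = Math.Min(chunkWords, wordCount - start);
            windows.Add((start, count));
            if (start + count >= wordCount)
            {
                break; // this window reached the end of the text
            }
        }
        return windows;
    }

    // Drops duplicate detections picked up from overlapping windows.
    // Spans are identified by absolute character offsets plus label.
    public static List<(int Start, int End, string Label)> Merge(
        IEnumerable<(int Start, int End, string Label)> spans)
    {
        var seen = new HashSet<(int, int, string)>();
        var merged = new List<(int Start, int End, string Label)>();
        foreach (var span in spans)
        {
            if (seen.Add((span.Start, span.End, span.Label)))
            {
                merged.Add(span);
            }
        }
        return merged;
    }
}
```

For example, ChunkingSketch.Plan(10, 4, 2) yields windows starting at words 0, 3, and 6, each 4 words long, so neighbouring windows share one word. Any span of at most maxWidth words then fits entirely inside at least one window.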
Configuration Options
PhEyeConfiguration Properties
| Property | Type | Default | Description |
|---|---|---|---|
| Endpoint | string | "http://localhost:8080" | Base URL of the PhEye service (remote mode) |
| BearerToken | string? | null | Bearer token for API authentication (remote mode) |
| Timeout | int | 30 | Request timeout in seconds (remote mode) |
| Labels | List<string> | ["Person"] | Entity labels to detect. In local mode this is the GLiNER prompt |
| ModelPath | string? | null | Path to a local GLiNER model directory. When set, the filter runs on-device and ignores Endpoint, BearerToken, and Timeout |
| Threshold | double | 0.5 | Minimum span confidence for local inference (local mode only) |
PhEye Filter Properties
| Property | Type | Default | Description |
|---|---|---|---|
| RemovePunctuation | bool | false | Strip punctuation before processing |
| Strategies | List<PhEyeFilterStrategy> | [REDACT] | Replacement strategies for detected entities |
| Ignored | List<string> | [] | Terms to ignore during detection |
| IgnoredPatterns | List<IgnoredPattern> | [] | Regex patterns to ignore |
| Priority | int | 0 | Filter priority for overlapping spans |
Supported Entity Types
Detected entities are mapped to a Phileas FilterType:
| Entity Label | FilterType | Description |
|---|---|---|
| PERSON (case-insensitive) | FilterType.Person | Person names |
| Any other label | FilterType.Other | All other entity types |
The original service label is preserved on each span's Classification.
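The mapping rule above amounts to a case-insensitive comparison against PERSON. A minimal sketch follows, using a hypothetical SketchFilterType enum in place of the real Phileas FilterType (which has more members than the two shown here); LabelMappingSketch is likewise an illustrative name.

```csharp
using System;

// Hypothetical stand-in for the Phileas FilterType enum, reduced to the two
// values named in the mapping table.
public enum SketchFilterType { Person, Other }

public static class LabelMappingSketch
{
    // "PERSON" in any casing maps to Person; every other label maps to Other.
    public static SketchFilterType Map(string label) =>
        string.Equals(label, "PERSON", StringComparison.OrdinalIgnoreCase)
            ? SketchFilterType.Person
            : SketchFilterType.Other;
}
```

Under this rule, "person", "Person", and "PERSON" all map to Person, while labels such as "ORG" or "email address" map to Other and are told apart only by the label kept on the span.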
Confidence Thresholds
Use a per-label minimum confidence (via a strategy condition) to filter out low-confidence predictions:
PhEyes = new List<PhEye>
{
    new PhEye
    {
        PhEyeConfiguration = new PhEyeConfiguration
        {
            Endpoint = "http://localhost:8080",
            Labels = new List<string> { "PERSON" }
        },
        Strategies = new List<PhEyeFilterStrategy>
        {
            new PhEyeFilterStrategy
            {
                Strategy = "REDACT",
                Condition = "confidence >= 0.90" // Minimum confidence
            }
        }
    }
}
Multiple PhEye Configurations
You can configure multiple PhEye instances in a single policy, each pointing at a different endpoint:
PhEyes = new List<PhEye>
{
    new PhEye
    {
        PhEyeConfiguration = new PhEyeConfiguration
        {
            Endpoint = "http://pheye-persons:8080",
            Labels = new List<string> { "PERSON" }
        }
    },
    new PhEye
    {
        PhEyeConfiguration = new PhEyeConfiguration
        {
            Endpoint = "http://pheye-orgs:8080",
            Labels = new List<string> { "ORGANIZATION" }
        }
    }
}
Filter Strategies
The PhEye filter supports all standard Phileas strategies:
using Phileas.Policy.Filters;
using Phileas.Policy.Filters.Strategies;

PhEyes = new List<PhEye>
{
    new PhEye
    {
        PhEyeConfiguration = new PhEyeConfiguration
        {
            Endpoint = "http://localhost:8080",
            Labels = new List<string> { "PERSON" }
        },
        Strategies = new List<PhEyeFilterStrategy>
        {
            // Mask person names
            new PhEyeFilterStrategy { Strategy = "MASK" },
            // Or use static replacement
            new PhEyeFilterStrategy
            {
                Strategy = "STATIC_REPLACE",
                StaticReplacement = "[NAME REMOVED]"
            }
        }
    }
}
See Filter Strategies for all available options.
Ignored Terms
Configure terms that should not be redacted:
PhEyes = new List<PhEye>
{
    new PhEye
    {
        PhEyeConfiguration = new PhEyeConfiguration
        {
            Endpoint = "http://localhost:8080",
            Labels = new List<string> { "PERSON" }
        },
        Ignored = new List<string> { "John", "Microsoft", "MIT" }
    }
}
Performance Considerations
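- Network Latency: In remote mode, processing time depends on network speed and service location. Local mode makes no network call.
- Scalability: The PhEye service can be scaled horizontally.
- Resource Usage: Remote mode needs minimal local resources. Local mode loads the GLiNER model in-process, so memory and CPU are used on the host.
- Throughput: In remote mode, throughput depends on service capacity and configuration.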
Example Scenarios
Multi-Language Support
PhEyes = new List<PhEye>
{
    new PhEye
    {
        PhEyeConfiguration = new PhEyeConfiguration
        {
            Endpoint = "http://pheye-english:8080",
            Labels = new List<string> { "PERSON", "ORG", "LOC" }
        }
    },
    new PhEye
    {
        PhEyeConfiguration = new PhEyeConfiguration
        {
            Endpoint = "http://pheye-spanish:8080",
            Labels = new List<string> { "PERSON", "ORG", "LOC" }
        }
    }
}
Troubleshooting
Connection Timeout
- Verify the endpoint URL is correct and accessible.
- Check network connectivity and firewall rules.
- Increase the Timeout value if the service is slow.
Authentication Errors
- Ensure the BearerToken is correct.
- Verify the token has not expired.
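When chasing timeouts or authentication errors, it can be useful to reproduce the connection settings outside the filter. The sketch below builds a plain HttpClient with the same endpoint, bearer token, and timeout; PhEyeConnectionSketch and BuildClient are hypothetical names, and this is not how the filter itself constructs its client. Which request path the PhEye service exposes is not covered here, so any request made with this client is an assumption to check against the service documentation.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class PhEyeConnectionSketch
{
    // Builds an HttpClient that mirrors the remote-mode settings
    // (Endpoint, BearerToken, Timeout in seconds).
    public static HttpClient BuildClient(string endpoint, string? bearerToken, int timeoutSeconds)
    {
        var client = new HttpClient
        {
            BaseAddress = new Uri(endpoint),
            Timeout = TimeSpan.FromSeconds(timeoutSeconds)
        };

        // The token is optional; only attach the header when one is supplied.
        if (!string.IsNullOrEmpty(bearerToken))
        {
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", bearerToken);
        }
        return client;
    }
}
```

With such a client, a request that ends in a TaskCanceledException after the configured number of seconds points at the timeout, while a 401 or 403 status points at the token.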
No Entities Detected
- Confirm the Labels list matches the service's output labels.
- Check the service logs for errors.
Resource Cleanup
The PhEye filter implements IDisposable for proper resource cleanup:
using var filter = new PhEyeFilter(config, phEyeConfig, false, thresholds);
// Use the filter...
// Automatically disposes the HTTP client and, in local mode, the loaded GLiNER model.
Integration with Phileas Pipeline
The PhEye filter can be combined with other Phileas filters in the same policy:
var policy = new PhileasPolicy
{
    Name = "comprehensive-pii",
    Identifiers = new Identifiers
    {
        // AI-powered entity detection
        PhEyes = new List<PhEye>
        {
            new PhEye
            {
                PhEyeConfiguration = new PhEyeConfiguration
                {
                    Endpoint = "http://localhost:8080",
                    Labels = new List<string> { "PERSON", "ORG" }
                }
            }
        },
        // Pattern-based detectors
        EmailAddress = new EmailAddress(),
        PhoneNumber = new PhoneNumber(),
        Ssn = new Ssn(),
        CreditCard = new CreditCard()
    }
};
Next Steps
- Read about Filter Strategies to customize redaction behavior.
- Learn about Filter Conditions for conditional redaction.
- Explore the API Reference for detailed method documentation.
- Check out the PhEye service documentation for service setup.
Questions?
Visit the Phileas documentation or the GitHub repository for more information.