Guardrails

Guardrails overview

Guardrails run in the gateway request path before traffic reaches a model.

Guardrails are how Outgate protects sensitive values while traffic moves through the gateway.

When a provider has guardrails enabled, the gateway inspects POST request bodies before forwarding them upstream. The request is sent to the regional guardrail service, which looks for values such as PII and credentials and returns a decision to the gateway.

The gateway then either allows the request, blocks it, or rewrites sensitive values into Outgate placeholders before the upstream model sees them. The client still calls the same endpoint, but the request path now includes a protection step.

How validation runs

The gateway reads the request body and asks the guardrail service to validate it.

Guardrail validation happens in the gateway before the upstream request is made. The gateway reads the request body, attaches provider, organization, policy, method, path, content type, user agent, and client IP context, then calls the regional guardrail service at /validate.
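A validation call carrying that context might be assembled like the sketch below. The JSON field names are assumptions based on the context listed above, not the service's documented schema.

```python
import json

# Hypothetical /validate request payload; field names are illustrative only.
def build_validate_payload(body: bytes, ctx: dict) -> str:
    payload = {
        "body": body.decode("utf-8"),
        "provider": ctx["provider"],
        "organization": ctx["organization"],
        "policy": ctx["policy"],
        "method": ctx["method"],
        "path": ctx["path"],
        "content_type": ctx["content_type"],
        "user_agent": ctx["user_agent"],
        "client_ip": ctx["client_ip"],
    }
    return json.dumps(payload)
```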

If the guardrail service is unavailable or returns an unexpected error, the gateway fails open: the request is forwarded without a guardrail decision. Outgate records the error for observability, but it does not block production traffic solely because the validation service could not respond.
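The fail-open behavior amounts to a small wrapper around the validation call; a minimal sketch, assuming the error recorder and decision shape shown here:

```python
# Sketch of fail-open behavior: validation errors are recorded, not fatal.
def validate_with_fail_open(validate, body, record_error):
    try:
        return validate(body)       # normal path: use the service's decision
    except Exception as exc:        # service unavailable or unexpected error
        record_error(exc)           # keep the error for observability
        return {"action": "allow"}  # fail open: do not block traffic
```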

For normal provider traffic, validation only runs for POST requests with a body. For dry-run scans, the gateway returns the detection result directly instead of forwarding the request to the upstream model.

Detection and vault matching

Outgate combines fresh analysis with fingerprint matches for known values.

The guardrail service extracts text-bearing content from the request and evaluates it with the configured guardrail model. At the same time, it checks the Detection Vault, a Redis-backed fingerprint store for values Outgate has seen before.

In production mode, the vault stores hashes and tokenized fingerprints rather than plaintext values. When a known value appears again, the service can match it without asking the model to rediscover it.

New detections are written back to the vault asynchronously after a successful validation. This makes future requests faster and helps Outgate recognize recurring PII or credentials across prompts, files, tool outputs, and agent sessions.

CLI scan mode

og scan uses the same guardrail path in dry-run mode. It detects values and stores fingerprints without sending the scanned content to the upstream model.

Policy behavior

Policies decide what happens when PII or credentials are detected.

A guardrail policy controls how Outgate handles PII and credentials. For each of these categories, the policy can allow the request, block it, or anonymize the matched value, and it can additionally redact the raw match from guardrail records.

Category      Typical default               What gets protected
PII           Anonymize without blocking    Names, email addresses, phone numbers, and similar personal data
Credentials   Anonymize without blocking    API keys, passwords, bearer tokens, secret values, and similar credentials

Blocking stops the request before it reaches the upstream provider. Anonymization keeps the request flowing but replaces the sensitive value with an Outgate placeholder first.

Redaction controls what appears in guardrail records. When redaction is enabled, Outgate keeps enough context to explain the decision without storing the full raw match in logs or policy results.
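Policy resolution and redaction can be sketched together. The policy dictionary shape and record fields below are assumptions for illustration, not Outgate's schema.

```python
# Sketch of per-category policy resolution and record redaction.
def resolve_action(policy: dict, category: str) -> tuple[str, bool]:
    """Return (action, redact) for a detected category."""
    rule = policy.get(category, {"action": "anonymize", "redact": True})
    return rule["action"], rule["redact"]

def guardrail_record(match: str, category: str, redact: bool) -> dict:
    """Keep enough context to explain a decision without the raw match."""
    record = {"category": category, "match_length": len(match)}
    if not redact:
        record["match"] = match
    return record
```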

Anonymization and response rehydration

Sensitive values are replaced before forwarding and restored on the way back.

When the guardrail service returns an anonymization map, the gateway rewrites the upstream-bound request body. Each detected value is replaced with a stable placeholder such as OG_PII_... or OG_CREDENTIAL_....

The gateway also adds a short instruction to the request so the model treats these placeholders as values that should be preserved exactly. This helps agents and model providers pass placeholders through tool calls and responses without trying to reinterpret them.

On the response path, the gateway uses the stored anonymization map to replace placeholders with the original values before returning the response to the caller. The upstream provider sees placeholders; the caller gets the expected response shape back.
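The round trip can be sketched as two symmetric substitutions over the per-request anonymization map. The mapping shape is an assumption; the placeholder naming follows the OG_PII_/OG_CREDENTIAL_ pattern described above.

```python
# Round-trip sketch: anonymize before forwarding, rehydrate on the way back.
def anonymize(body: str, mapping: dict[str, str]) -> str:
    for raw, placeholder in mapping.items():
        body = body.replace(raw, placeholder)
    return body

def rehydrate(response: str, mapping: dict[str, str]) -> str:
    for raw, placeholder in mapping.items():
        response = response.replace(placeholder, raw)
    return response
```

The upstream provider only ever sees the output of anonymize; the caller only ever sees the output of rehydrate.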

Streaming and compression

Response rehydration may buffer or transform response chunks so placeholders can be replaced reliably. The gateway clears request compression hints when anonymization is active to keep response rewriting predictable.
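One reason buffering is needed is that a placeholder can straddle two response chunks. A minimal sketch of boundary-safe replacement, holding back any chunk tail that could be the start of a placeholder (an illustrative technique, not Outgate's implementation):

```python
# Sketch: replace a placeholder in a chunk stream even when it is split
# across chunk boundaries, by holding back a possible placeholder prefix.
def rehydrate_stream(chunks, placeholder: str, original: str):
    buf = ""
    for chunk in chunks:
        buf += chunk
        buf = buf.replace(placeholder, original)
        # Keep back the longest suffix that might be a placeholder prefix.
        keep = 0
        for i in range(1, len(placeholder)):
            if buf.endswith(placeholder[:i]):
                keep = i
        if keep:
            yield buf[:-keep]
            buf = buf[-keep:]
        else:
            yield buf
            buf = ""
    if buf:
        yield buf
```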

Observing guardrails

Guardrail decisions are written into gateway logs and metrics.

The gateway records guardrail latency, decision, severity, block status, anonymization count, vault hits, and cache statistics. The log manager collects these fields into metrics so teams can see how guardrails affect traffic over time.

Blocked requests are logged as alerts. Non-blocking detections are logged as detection events, which helps teams understand what was protected without interrupting the application.
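The routing of outcomes into the two log categories can be sketched as follows; the result fields here are hypothetical names for illustration.

```python
# Sketch: blocked requests surface as alerts, non-blocking detections as events.
def log_category(result: dict) -> str:
    if result.get("blocked"):
        return "alert"
    if result.get("detections", 0) > 0:
        return "detection"
    return "none"
```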

Bringing it together

Guardrails protect the request path without changing client code.

Once guardrails are enabled on a provider, your application continues to call the same endpoint. The gateway validates the request, applies policy, replaces sensitive values when needed, forwards the safe version upstream, and restores placeholders in the response.

That gives teams a central place to protect PII and credentials while preserving the developer workflow and the provider APIs they already use.