Observability Capabilities — Xferity Metrics, Logs, Health, and Audit

This page is the structured reference for Xferity’s observability capabilities: metrics, health endpoints, logs, audit records, and diagnostic tools.

Prometheus metrics

Xferity exports Prometheus-format metrics at /metrics.

Access: requires an authenticated admin. By default the endpoint cannot be scraped anonymously.
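As a rough illustration of consuming the /metrics output, the sketch below parses Prometheus text-format samples into Python tuples. It is a simplified parser, not a full implementation of the exposition format (escaped label values are kept as-is and timestamps are ignored). The metric names in `EXAMPLE` are hypothetical placeholders, not confirmed Xferity metric names, and fetching the endpoint is left out because the authentication mechanism is not described on this page.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical metric names, for illustration only.
EXAMPLE = """\
# TYPE xferity_queue_depth gauge
xferity_queue_depth 7
xferity_cert_days_to_expiry{certificate="partner-a"} 12
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(EXAMPLE):
        print(name, labels, value)
```

In practice a Prometheus server would do this parsing itself; a script like this is mainly useful for ad-hoc checks of a saved scrape.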

Metrics coverage

Category	Metrics included
Flow runs	run count, run duration, success/failure rates
Job queue	enqueued count, completed count, queue depth
Transfers	transfer bytes, file count, error count
Retries	retry count, dead-letter count
Certificates	expiry state, days-to-expiry per certificate
Auth	authentication failures, rate-limit denials
Notifications	delivery success/failure per channel
Secrets	resolution outcomes
AS2	message send/receive counts, MDN outcomes
Audit sidecar	index build metrics

Pre-built Prometheus alert rules

Xferity ships pre-built alert rules for:

Certificate expiry approaching or expired
Job queue building up (not draining)
Flow failure rate exceeding threshold
Worker health failures

Health endpoints

Endpoint	Auth required	Purpose
`/health/worker`	None	Worker readiness — safe for unauthenticated Kubernetes/load-balancer probes
`/health`	Authenticated	General service health
`/health/secrets`	Authenticated	Secret provider reachability
`/health/certificates`	Authenticated	Certificate expiry status

Health payloads check:

state-store writability
audit path writability
worker claim latency and readiness
secret provider availability
certificate expiry conditions
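A readiness check against `/health/worker` can be sketched as follows. This assumes that an HTTP 200 response means "ready" and that anything else (including connection errors) means "not ready"; the page does not specify the status codes or payload, so treat that mapping as an assumption. The host and port in the usage line are hypothetical.

```python
import urllib.error
import urllib.request


def worker_ready(base_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if GET {base_url}/health/worker answers with HTTP 200.

    `opener` is injectable so the function can be exercised without a
    running server.
    """
    url = base_url.rstrip("/") + "/health/worker"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200  # assumption: 200 means ready
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses (urlopen
        # raises HTTPError for 4xx/5xx) all count as "not ready".
        return False


if __name__ == "__main__":
    # Hypothetical address; replace with the real Xferity listener.
    print(worker_ready("http://localhost:8080"))
```

The authenticated endpoints (`/health`, `/health/secrets`, `/health/certificates`) would additionally need credentials, which this sketch does not cover.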

Structured application logs

Every log line is emitted as structured JSON, with fields including:

level — debug, info, warn, error
flow — flow name when applicable
run_id — unique run identifier
correlation_id — request or job correlation identifier
partner — partner name when applicable
msg — human-readable message

Log access:

CLI: xferity logs <flow> with level filter and tail mode
Log files on disk
Compatible with log aggregation tools (Loki, Fluentd, Splunk, etc.)
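Because each line is a JSON object, log files on disk can also be filtered with a few lines of script. The sketch below filters by minimum level and flow name using only the fields listed above. The ordering of levels (debug < info < warn < error) is the conventional one and is assumed here, and the sample flow names are made up.

```python
import json

# Conventional severity ordering for the four documented levels.
LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3}


def filter_logs(lines, min_level="warn", flow=None):
    """Yield parsed log records at or above min_level, optionally for one flow."""
    threshold = LEVELS[min_level]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON lines mixed into the file
        if not isinstance(record, dict):
            continue
        level = record.get("level")
        rank = LEVELS.get(level, -1) if isinstance(level, str) else -1
        if rank < threshold:
            continue
        if flow is not None and record.get("flow") != flow:
            continue
        yield record


if __name__ == "__main__":
    sample = [
        '{"level":"info","flow":"invoices","run_id":"r1","msg":"started"}',
        '{"level":"error","flow":"invoices","run_id":"r1","msg":"upload failed"}',
    ]
    for rec in filter_logs(sample, min_level="error"):
        print(rec["run_id"], rec["msg"])
```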

Crypto observability fields

When OpenPGP operations run (via gopenpgp or GnuPG), Xferity emits dedicated structured fields:

Field	What it tells you
`provider`	Which provider handled the operation
`mode`	`gopenpgp`, `gnupg`, or `auto`
`fallback_used`	Whether fallback to GnuPG occurred
`fallback_reason`	Why fallback was triggered
`fallback_subreason`	Detailed reason for fallback
`cleanup_status`	Whether GnuPG temp home cleanup succeeded

Typical interpretation:

provider=gopenpgp + fallback_used=false → native path handled it cleanly
fallback_used=true → the native provider failed (for example, with compat_enterprise_key_structure) and GnuPG completed the operation
cleanup_status=failed → operation may have succeeded but temp workspace cleanup failed — requires investigation
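The interpretation rules above can be expressed as a small triage function over one parsed log record. This is a sketch under assumptions: it treats `fallback_used` as a JSON boolean and `cleanup_status` as the string `failed` when cleanup did not succeed, which matches the notation above but is not a confirmed wire format.

```python
def interpret_crypto(record):
    """Map the crypto fields of one log record to short findings."""
    findings = []
    if record.get("fallback_used") is True:
        reason = record.get("fallback_reason", "unknown")
        findings.append(f"fallback to GnuPG (reason: {reason})")
    elif record.get("provider") == "gopenpgp":
        findings.append("native path handled the operation")
    if record.get("cleanup_status") == "failed":
        # The operation itself may still have succeeded.
        findings.append("GnuPG temp home cleanup failed: investigate")
    return findings


if __name__ == "__main__":
    print(interpret_crypto({"provider": "gopenpgp", "fallback_used": False}))
```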

JSONL audit records

Audit records are distinct from logs. They are structured lifecycle records:

One JSON event per line per file operation
Fields: timestamp, flow, run_id, correlation_id, event_type, file_name, outcome, error_code, and more
Append-only file model
SHA-256 hash chain across events (optional but recommended)

Querying audit records

CLI: xferity trace <filename> — all audit events for a given file
HTTP API: GET /api/audit?file=<basename> — file lifecycle lookup
Sidecar index for fast file lookup without scanning the full audit log
Compatible with jq, awk, or any JSONL tooling
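As an example of working with the JSONL file directly, the sketch below groups the audit events for one file by run, roughly what a manual file-lifecycle lookup would look like. It relies only on the documented `file_name`, `run_id`, `event_type`, and `outcome` fields and on the append-only model (file order is taken as event order). The `event_type` values in the sample are invented for illustration.

```python
import json
from collections import defaultdict


def lifecycle_by_run(lines, basename):
    """Return {run_id: [(event_type, outcome), ...]} for one file name."""
    runs = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("file_name") == basename:
            # Append-only log: the order of lines is the order of events.
            runs[event.get("run_id")].append(
                (event.get("event_type"), event.get("outcome"))
            )
    return dict(runs)


if __name__ == "__main__":
    # Hypothetical event_type values.
    sample = [
        '{"run_id":"r1","file_name":"a.csv","event_type":"download","outcome":"success"}',
        '{"run_id":"r1","file_name":"a.csv","event_type":"decrypt","outcome":"failure"}',
    ]
    print(lifecycle_by_run(sample, "a.csv"))
```

For large audit files the sidecar index or `xferity trace` is the better route, since this approach scans every line.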

Tamper-evidence chain fields

When tamper evidence is enabled:

chain_seq — event sequence number
prev_hash — hash of preceding event
event_hash — SHA-256 hash of this event

Any modification, insertion, or deletion breaks the chain and is detectable with standard tooling.
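To make the chain idea concrete, here is a minimal verification sketch. The exact hashing scheme is not specified on this page, so everything about it here is an assumption: `event_hash` is taken to be the SHA-256 of the event serialized as compact, key-sorted JSON without its own `event_hash` field, the first event's `prev_hash` is taken to be an empty string, and `chain_seq` is taken to start at 1. Check the real serialization rules before relying on anything like this.

```python
import hashlib
import json


def event_digest(event):
    """Hash an event without its own event_hash field (assumed scheme)."""
    body = {k: v for k, v in event.items() if k != "event_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(events):
    """Return the index of the first broken event, or None if intact."""
    prev_hash = ""  # assumed prev_hash of the very first event
    for index, event in enumerate(events):
        if event.get("chain_seq") != index + 1:  # assumed to start at 1
            return index  # gap or reordering: insertion or deletion
        if event.get("prev_hash") != prev_hash:
            return index  # link to the preceding event is broken
        if event.get("event_hash") != event_digest(event):
            return index  # event content was modified
        prev_hash = event["event_hash"]
    return None
```

Because `prev_hash` is part of the hashed body, changing one event invalidates its own hash, and removing or inserting an event breaks the link and sequence checks of the event that follows.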

Diagnostic tools

Pre-execution diagnostics:

xferity diag [flow] — checks endpoint reachability, key availability, certificate validity, filesystem access
xferity validate — strict YAML config validation with field-level error reporting

Post-execution investigation:

xferity trace <filename> — file lifecycle across all runs
xferity flow history <flow> — per-run outcomes and retry counts
xferity logs <flow> — tailed runtime logs with level filter

Capability summary

Xferity observability capabilities include:

Prometheus-format metrics at /metrics (authenticated)
Coverage: flow runs, job queue depth, transfer bytes, retries, cert expiry, auth failures, notifications, secrets, AS2
Pre-built Prometheus alert rules (cert expiry, queue depth, failure rate, worker health)
/health/worker — unauthenticated readiness probe
/health, /health/secrets, /health/certificates — authenticated health checks
Structured JSON logs with run_id and correlation_id fields, plus flow and partner when applicable
CLI log tailing with level filter (xferity logs <flow>)
Crypto observability fields: provider, fallback_used, fallback_reason, cleanup_status
JSONL audit records — one event per file operation
SHA-256 hash-chain tamper evidence on audit events
Sidecar index for fast file lifecycle lookup
xferity trace <filename> — full file lifecycle across runs
xferity diag — endpoint and config pre-flight checks