Observability Capabilities — Xferity Metrics, Logs, Health, and Audit
Observability Capabilities
This page is the explicit, structured reference for Xferity's observability capabilities.
Prometheus metrics
Xferity exports Prometheus-format metrics at `/metrics`.
Access: requires an authenticated admin. By default the endpoint cannot be scraped anonymously.
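Because the endpoint requires authentication, a scraper or ad hoc script has to send credentials with each request. The sketch below shows one way to fetch and parse the payload. The bearer-token header is an assumption (this page does not say which authentication scheme `/metrics` uses), and the metric names in the usage example are hypothetical.

```python
import urllib.request


def fetch_metrics(base_url: str, token: str) -> str:
    """Fetch the raw /metrics payload.

    Assumption: the server accepts a bearer token. Adjust the header
    to whatever authentication your deployment actually uses.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/metrics",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {sample: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. The key
    keeps its label set, e.g. 'name{flow="a"}'. Optional trailing
    timestamps are not handled in this sketch.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples
```

For example, `parse_metrics(fetch_metrics("https://xferity.example.internal", token))` would return a dictionary that can be inspected for a queue-depth or failure-count sample, whatever those samples are named in your deployment.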
Metrics coverage
| Category | Metrics included |
|---|---|
| Flow runs | run count, run duration, success/failure rates |
| Job queue | enqueued count, completed count, queue depth |
| Transfers | transfer bytes, file count, error count |
| Retries | retry count, dead-letter count |
| Certificates | expiry state, days-to-expiry per certificate |
| Auth | authentication failures, rate-limit denials |
| Notifications | delivery success/failure per channel |
| Secrets | resolution outcomes |
| AS2 | message send/receive counts, MDN outcomes |
| Audit sidecar | index build metrics |
Pre-built Prometheus alert rules
Xferity ships pre-built alert rules for:
- Certificate expiry approaching or expired
- Job queue building up (not draining)
- Flow failure rate exceeding threshold
- Worker health failures
Health endpoints
| Endpoint | Auth required | Purpose |
|---|---|---|
| `/health/worker` | None | Worker readiness; safe for unauthenticated Kubernetes/load-balancer probes |
| `/health` | Authenticated | General service health |
| `/health/secrets` | Authenticated | Secret provider reachability |
| `/health/certificates` | Authenticated | Certificate expiry status |
Health payloads check:
- state-store writability
- audit path writability
- worker claim latency and readiness
- secret provider availability
- certificate expiry conditions
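Since `/health/worker` needs no credentials, it can be polled from a simple script as well as from a Kubernetes or load-balancer probe. The following is a minimal sketch, assuming that HTTP 200 means "ready" and any other status or connection error means "not ready"; this page does not document the status codes or the payload shape, so treat that mapping as an assumption. The `opener` parameter exists only to make the function easy to test.

```python
import urllib.request


def worker_is_ready(base_url: str, timeout: float = 2.0,
                    opener=urllib.request.urlopen) -> bool:
    """Return True when GET /health/worker answers with HTTP 200.

    Assumption: 200 signals readiness. Non-2xx responses raise
    urllib.error.HTTPError (a subclass of OSError) and count as not ready,
    as do connection failures and timeouts.
    """
    url = f"{base_url.rstrip('/')}/health/worker"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False
```

The authenticated endpoints (`/health`, `/health/secrets`, `/health/certificates`) would need credentials added to the request in the same way as any other authenticated call.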
Structured application logs
Every log line is emitted as structured JSON with fields including:
- `level`: debug, info, warn, error
- `flow`: flow name when applicable
- `run_id`: unique run identifier
- `correlation_id`: request or job correlation identifier
- `partner`: partner name when applicable
- `msg`: human-readable message
Log access:
- CLI: `xferity logs <flow>` with level filter and tail mode
- Log files on disk
- Compatible with log aggregation tools (Loki, Fluentd, Splunk, etc.)
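Because each line is a standalone JSON object, log files on disk can also be filtered with a few lines of script. The sketch below filters by minimum level and, optionally, by flow. It assumes the field names listed above and that `level` is a lowercase string; the sample records in the usage note are made up.

```python
import json

# Severity order for the four documented levels.
LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3}


def filter_logs(lines, min_level="warn", flow=None):
    """Yield parsed log records at or above min_level, optionally for one flow.

    Lines that are blank, not valid JSON, or not JSON objects are skipped.
    Records with an unknown or missing level are also skipped.
    """
    threshold = LEVELS[min_level]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        level = str(record.get("level", "")).lower()
        if LEVELS.get(level, -1) < threshold:
            continue
        if flow is not None and record.get("flow") != flow:
            continue
        yield record
```

Passing an open file object works directly, for example `for rec in filter_logs(open(path), min_level="error", flow="orders"): print(rec["run_id"], rec["msg"])`, where `path` and the flow name are placeholders.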
Crypto observability fields
When OpenPGP operations run (via gopenpgp or GnuPG), Xferity emits dedicated structured fields:
| Field | What it tells you |
|---|---|
| `provider` | Which provider handled the operation |
| `mode` | gopenpgp, gnupg, or auto |
| `fallback_used` | Whether fallback to GnuPG occurred |
| `fallback_reason` | Why fallback was triggered |
| `fallback_subreason` | Detailed reason for fallback |
| `cleanup_status` | Whether GnuPG temp home cleanup succeeded |
Typical interpretation:
- `provider=gopenpgp` + `fallback_used=false` → native path handled it cleanly
- `fallback_used=true` → native provider failed with `compat_enterprise_key_structure`; GnuPG succeeded
- `cleanup_status=failed` → operation may have succeeded but temp workspace cleanup failed; requires investigation
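The interpretation rules above can be turned into a small triage helper for parsed log records. This is a sketch only: it assumes `fallback_used` is a JSON boolean and `cleanup_status` is the string `failed` on failure, and the returned labels (`native-ok`, `fallback:...`, `investigate-cleanup`, `other`) are hypothetical names chosen for this example, not Xferity output.

```python
def interpret_crypto_fields(record: dict) -> str:
    """Map crypto observability fields to a short triage label.

    A failed cleanup is checked first because it needs investigation
    even when the operation itself succeeded.
    """
    if record.get("cleanup_status") == "failed":
        return "investigate-cleanup"
    if record.get("fallback_used") is True:
        reason = record.get("fallback_reason", "unknown")
        return f"fallback:{reason}"
    if record.get("provider") == "gopenpgp":
        return "native-ok"
    return "other"
```

Counting these labels over a day of logs gives a quick view of how often the native path is enough and which fallback reasons dominate.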
JSONL audit records
Audit records are distinct from logs. They are structured lifecycle records:
- One JSON event per line per file operation
- Fields: timestamp, flow, run_id, correlation_id, event_type, file_name, outcome, error_code, and more
- Append-only file model
- SHA-256 hash chain across events (optional but recommended)
Querying audit records
- CLI: `xferity trace <filename>` returns all audit events for a given file
- HTTP API: `GET /api/audit?file=<basename>` performs a file lifecycle lookup
- Sidecar index for fast file lookup without scanning the full log
- Compatible with `jq`, `awk`, or any JSONL tooling
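As an illustration of the "any JSONL tooling" point, the sketch below reconstructs a file's lifecycle directly from audit lines, roughly what a trace lookup does conceptually. It is not how `xferity trace` is implemented. It assumes `file_name` may carry a path, and that `timestamp` is an ISO 8601 string in a single consistent format so that lexicographic order matches time order. The event types in the usage note are made up.

```python
import json
import posixpath


def trace_file(jsonl_lines, basename):
    """Return audit events whose file_name matches basename, oldest first.

    Assumption: timestamps are ISO 8601 strings in one consistent format,
    so sorting them as strings is equivalent to sorting by time.
    """
    events = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if posixpath.basename(event.get("file_name", "")) == basename:
            events.append(event)
    events.sort(key=lambda e: e.get("timestamp", ""))
    return events
```

For a file such as `a.csv`, the result might read as a sequence like download, decrypt, upload, each with its `outcome` and `run_id`. This scans the whole log, which is exactly the cost the sidecar index avoids.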
Tamper-evidence chain fields
When tamper evidence is enabled:
- `chain_seq`: event sequence number
- `prev_hash`: hash of preceding event
- `event_hash`: SHA-256 hash of this event
Any modification, insertion, or deletion breaks the chain and is detectable with standard tooling.
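To make the detection claim concrete, here is a minimal sketch of a chain verifier. The exact bytes that Xferity hashes are not specified on this page, so the scheme here is an assumption made purely for illustration: `event_hash` is taken to be the SHA-256 of the event serialized as compact JSON with sorted keys, excluding the `event_hash` field itself. A real verifier has to follow the product's actual canonicalization.

```python
import hashlib
import json
from typing import Optional


def compute_event_hash(event: dict) -> str:
    """Hash the event without its own event_hash field (assumed scheme)."""
    body = {k: v for k, v in event.items() if k != "event_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(events: list) -> Optional[int]:
    """Return the index of the first event that breaks the chain, else None.

    Checks three things per event: chain_seq increases by exactly one,
    prev_hash equals the previous event's event_hash, and event_hash
    matches the recomputed hash of the event body.
    """
    prev_hash = None
    prev_seq = None
    for index, event in enumerate(events):
        seq = event.get("chain_seq")
        if not isinstance(seq, int):
            return index
        if index > 0 and (seq != prev_seq + 1
                          or event.get("prev_hash") != prev_hash):
            return index
        if event.get("event_hash") != compute_event_hash(event):
            return index
        prev_hash, prev_seq = event["event_hash"], seq
    return None
```

Under this scheme an edited event fails the hash comparison, while an inserted or deleted event shows up as a gap in `chain_seq` or a `prev_hash` mismatch at the following event.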
Diagnostic tools
Pre-execution diagnostics:
- `xferity diag [flow]`: checks endpoint reachability, key availability, certificate validity, filesystem access
- `xferity validate`: strict YAML config validation with field-level error reporting
Post-execution investigation:
- `xferity trace <filename>`: file lifecycle across all runs
- `xferity flow history <flow>`: per-run outcomes and retry counts
- `xferity logs <flow>`: tailed runtime logs with level filter
Capability summary
Xferity observability capabilities include:
- Prometheus-format metrics at `/metrics` (authenticated)
- Coverage: flow runs, job queue depth, transfer bytes, retries, cert expiry, auth failures, notifications, secrets, AS2
- Pre-built Prometheus alert rules (cert expiry, queue depth, failure rate)
- `/health/worker`: unauthenticated readiness probe
- `/health`, `/health/secrets`, `/health/certificates`: authenticated health checks
- Structured JSON logs with `flow`, `run_id`, `correlation_id` on every line
- CLI log tailing with level filter (`xferity logs <flow>`)
- Crypto observability fields: `provider`, `fallback_used`, `fallback_reason`, `cleanup_status`
- JSONL audit records: one event per file operation
- SHA-256 hash-chain tamper evidence on audit events
- Sidecar index for fast file lifecycle lookup
- `xferity trace <filename>`: full file lifecycle across runs
- `xferity diag`: endpoint and config pre-flight checks