Observability Capabilities — Xferity Metrics, Logs, Health, and Audit

This page is the explicit, structured reference for Xferity’s observability capabilities.

Xferity exports Prometheus-format metrics at /metrics.

Access: requires an authenticated admin. By default the endpoint does not accept anonymous scrapes.

| Category | Metrics included |
|---|---|
| Flow runs | run count, run duration, success/failure rates |
| Job queue | enqueued count, completed count, queue depth |
| Transfers | transfer bytes, file count, error count |
| Retries | retry count, dead-letter count |
| Certificates | expiry state, days-to-expiry per certificate |
| Auth | authentication failures, rate-limit denials |
| Notifications | delivery success/failure per channel |
| Secrets | resolution outcomes |
| AS2 | message send/receive counts, MDN outcomes |
| Audit sidecar | index build metrics |

Xferity ships pre-built alert rules for:

  • Certificate expiry approaching or expired
  • Job queue building up (not draining)
  • Flow failure rate exceeding threshold
  • Worker health failures

| Endpoint | Auth required | Purpose |
|---|---|---|
| /health/worker | None | Worker readiness; safe for unauthenticated Kubernetes/load-balancer probes |
| /health | Authenticated | General service health |
| /health/secrets | Authenticated | Secret provider reachability |
| /health/certificates | Authenticated | Certificate expiry status |

Health payloads report on:

  • state-store writability
  • audit path writability
  • worker claim latency and readiness
  • secret provider availability
  • certificate expiry conditions
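
As a rough illustration of the auth split between these endpoints, the sketch below builds (but does not send) a probe request for each path. The endpoint paths come from the table above. The `Authorization: Bearer` header, the `build_health_request` helper, and the example base URL are assumptions made for illustration, since this page does not state which auth scheme the authenticated endpoints use.

```python
import urllib.request

# Endpoint -> whether authentication is required (from the table above).
HEALTH_ENDPOINTS = {
    "/health/worker": False,
    "/health": True,
    "/health/secrets": True,
    "/health/certificates": True,
}


def build_health_request(base_url, path, token=None):
    """Build a GET request for a health endpoint without sending it.

    Assumption: authenticated endpoints accept a bearer token header.
    The real Xferity auth mechanism may differ.
    """
    if path not in HEALTH_ENDPOINTS:
        raise ValueError(f"unknown health endpoint: {path}")
    req = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    if HEALTH_ENDPOINTS[path]:
        if token is None:
            raise ValueError(f"{path} requires authentication")
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

A readiness probe would call `build_health_request(base, "/health/worker")` with no token, while a monitoring job would pass a token for the other three paths and send the request with `urllib.request.urlopen`.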

Every log line is emitted as structured JSON, with fields including:

  • level — debug, info, warn, error
  • flow — flow name when applicable
  • run_id — unique run identifier
  • correlation_id — request or job correlation identifier
  • partner — partner name when applicable
  • msg — human-readable message

Log access:

  • CLI: xferity logs <flow> with level filter and tail mode
  • Log files on disk
  • Compatible with log aggregation tools (Loki, Fluentd, Splunk, etc.)
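
Because each log line is a JSON object, the log files can also be filtered with a few lines of script. The sketch below is one possible approach, not part of Xferity: it keeps records at or above a minimum level and, optionally, for a single flow. The field names `level` and `flow` come from the list above; the `filter_logs` helper and the severity ordering of the four levels are assumptions.

```python
import json

# Assumed severity ordering for the documented levels.
LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3}


def filter_logs(lines, min_level="info", flow=None):
    """Yield parsed log records at or above min_level, optionally for one flow."""
    threshold = LEVELS[min_level]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(rec, dict):
            continue
        # Records with a missing or unknown level are dropped.
        if LEVELS.get(str(rec.get("level")), -1) < threshold:
            continue
        if flow is not None and rec.get("flow") != flow:
            continue
        yield rec
```

For example, `list(filter_logs(open(path), min_level="warn", flow="my-flow"))` would collect warnings and errors for one flow, where `path` and `my-flow` are placeholders.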

When OpenPGP operations run (via gopenpgp or GnuPG), Xferity emits dedicated structured fields:

| Field | What it tells you |
|---|---|
| provider | Which provider handled the operation |
| mode | gopenpgp, gnupg, or auto |
| fallback_used | Whether fallback to GnuPG occurred |
| fallback_reason | Why fallback was triggered |
| fallback_subreason | Detailed reason for fallback |
| cleanup_status | Whether GnuPG temp home cleanup succeeded |

Typical interpretation:

  • provider=gopenpgp + fallback_used=false → native path handled it cleanly
  • fallback_used=true → the native provider failed (for example, with compat_enterprise_key_structure) and GnuPG succeeded; check fallback_reason and fallback_subreason for the cause
  • cleanup_status=failed → operation may have succeeded but temp workspace cleanup failed — requires investigation
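
The interpretation rules above can be applied mechanically to parsed log records. The following is a minimal sketch under stated assumptions: `interpret_pgp_fields` is a hypothetical helper, and `fallback_used` is assumed to arrive either as a JSON boolean or as the string "true"/"false", which this page does not specify.

```python
def _as_bool(value):
    """Map a JSON boolean or 'true'/'false' string to bool, else None."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "false"):
        return value.lower() == "true"
    return None


def interpret_pgp_fields(rec):
    """Return human-readable notes for the OpenPGP fields of one log record."""
    notes = []
    fallback = _as_bool(rec.get("fallback_used"))
    if fallback is True:
        reason = rec.get("fallback_reason") or "unspecified"
        sub = rec.get("fallback_subreason")
        detail = f"{reason}/{sub}" if sub else reason
        notes.append(f"fell back to GnuPG ({detail})")
    elif fallback is False and rec.get("provider") == "gopenpgp":
        notes.append("native gopenpgp path, no fallback")
    # Cleanup failure is reported independently of the operation outcome.
    if rec.get("cleanup_status") == "failed":
        notes.append("temp workspace cleanup failed: investigate")
    return notes
```

Records that carry none of these fields produce an empty list, so the helper can be run over every log line without pre-filtering.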

Audit records are distinct from logs. They are structured lifecycle records:

  • One JSON event per line per file operation
  • Fields: timestamp, flow, run_id, correlation_id, event_type, file_name, outcome, error_code, and more
  • Append-only file model
  • SHA-256 hash chain across events (optional but recommended)
  • CLI: xferity trace <filename> — all audit events for a given file
  • HTTP API: GET /api/audit?file=<basename> — file lifecycle lookup
  • Sidecar index for fast file lookup without scanning full log
  • Compatible with jq, awk, or any JSONL tooling
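
Since the audit file is plain JSONL, a file's lifecycle can be reconstructed offline in the same spirit as xferity trace. The sketch below is an illustration only and does not use the sidecar index: it scans audit lines and returns the events for one file in the order they were written. The field names come from the list above; the `file_lifecycle` helper and the assumption that `file_name` may hold either a basename or a full path are hypothetical.

```python
import json
import posixpath


def file_lifecycle(lines, basename):
    """Return (timestamp, event_type, outcome) tuples for one file, in file order."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written trailing line
        if not isinstance(ev, dict):
            continue
        name = ev.get("file_name")
        # Assumption: file_name may be a bare name or a path; compare basenames.
        if isinstance(name, str) and posixpath.basename(name) == basename:
            events.append((ev.get("timestamp"), ev.get("event_type"), ev.get("outcome")))
    return events
```

Because the audit file is append-only, preserving file order keeps the events chronological without having to parse the timestamp format.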

When tamper evidence is enabled:

  • chain_seq — event sequence number
  • prev_hash — hash of preceding event
  • event_hash — SHA-256 hash of this event

Any modification, insertion, or deletion breaks the chain and is detectable with standard tooling.
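
To make the chain check concrete, here is a minimal verification sketch. It is not Xferity's actual algorithm: this page does not say which bytes are hashed, so the code assumes that `event_hash` is the hex SHA-256 of the event serialized as compact, key-sorted JSON with the `event_hash` field removed, and that `chain_seq` increases by one per event. Only the three field names are taken from the list above; `compute_event_hash` and `verify_chain` are hypothetical helpers.

```python
import hashlib
import json


def compute_event_hash(event):
    """Assumed scheme: SHA-256 over canonical JSON of the event minus event_hash."""
    body = {k: v for k, v in event.items() if k != "event_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(events):
    """Return the index of the first event that breaks the chain, or None."""
    prev_hash = None
    prev_seq = None
    for i, ev in enumerate(events):
        seq = ev.get("chain_seq")
        if not isinstance(seq, int):
            return i
        # A gap or repeat in the sequence points to insertion or deletion.
        if prev_seq is not None and seq != prev_seq + 1:
            return i
        # Each event must reference the hash of the event before it.
        if prev_hash is not None and ev.get("prev_hash") != prev_hash:
            return i
        # Recomputing the hash detects modification of the event itself.
        if ev.get("event_hash") != compute_event_hash(ev):
            return i
        prev_hash = ev["event_hash"]
        prev_seq = seq
    return None
```

The first event's `prev_hash` is not checked here because the starting value of the chain is not documented on this page.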


Pre-execution diagnostics:

  • xferity diag [flow] — checks endpoint reachability, key availability, certificate validity, filesystem access
  • xferity validate — strict YAML config validation with field-level error reporting

Post-execution investigation:

  • xferity trace <filename> — file lifecycle across all runs
  • xferity flow history <flow> — per-run outcomes and retry counts
  • xferity logs <flow> — tailed runtime logs with level filter

Xferity observability capabilities include:

  • Prometheus-format metrics at /metrics (authenticated)
  • Coverage: flow runs, job queue depth, transfer bytes, retries, cert expiry, auth failures, notifications, secrets, AS2
  • Pre-built Prometheus alert rules (cert expiry, queue depth, failure rate)
  • /health/worker — unauthenticated readiness probe
  • /health, /health/secrets, /health/certificates — authenticated health checks
  • Structured JSON logs with flow, run_id, correlation_id on every line
  • CLI log tailing with level filter (xferity logs <flow>)
  • Crypto observability fields: provider, fallback_used, fallback_reason, cleanup_status
  • JSONL audit records — one event per file operation
  • SHA-256 hash-chain tamper evidence on audit events
  • Sidecar index for fast file lifecycle lookup
  • xferity trace <filename> — full file lifecycle across runs
  • xferity diag — endpoint and config pre-flight checks