Observability in Xferity — Logs, Prometheus Metrics, Health Endpoints, and Audit Records
Observability
Xferity exposes observability at three layers:
- logs for runtime behavior and diagnostics
- metrics for service and workload monitoring
- audit records for flow and file traceability
These layers serve different purposes and should not be treated as interchangeable.
Logs
Structured logs are used for runtime diagnostics, warnings, and error investigation.
They are useful for:
- startup failures
- configuration validation warnings
- protocol and connectivity failures
- worker behavior
- notification delivery errors
The CLI exposes log access through the logs command, which supports tailing and level filtering.
Metrics
Xferity exports Prometheus-format metrics through the Web runtime.
Coverage includes metrics for areas such as:
- flow runs and durations
- jobs enqueued, completed, and queue depth
- transfer bytes, files, and errors
- retries and dead-letter activity
- notification outcomes
- authentication failures and rate-limit denials
- certificate expiry state
- secret-resolution outcomes
- selected AS2 and audit-sidecar metrics
Metrics access boundary
/metrics is behind authenticated admin access. It is not an anonymous scrape endpoint, so any scraper must present admin credentials.
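Because /metrics requires authentication, a scraper has to attach a credential to every request. The following is a minimal Python sketch of that request shape, not Xferity's own client. The bearer-token header, the base URL, and any metric names are assumptions made for illustration, since this page does not specify the authentication scheme or the exported metric names.

```python
import urllib.request


def build_metrics_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for /metrics that carries a credential.

    Assumption: the credential is sent as a bearer token. The real
    authentication scheme may differ (session cookie, basic auth, ...).
    """
    url = base_url.rstrip("/") + "/metrics"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def parse_prometheus_text(body: str) -> dict:
    """Parse Prometheus text-format samples into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. The series
    key keeps its label set verbatim. An optional trailing timestamp is ignored.
    """
    samples = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        end = line.rfind("}")
        if end != -1:
            # Labelled series: everything up to the closing brace is the key.
            series, rest = line[: end + 1], line[end + 1 :].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples[series] = float(rest[0])
    return samples


def scrape(base_url: str, token: str, timeout: float = 10.0) -> dict:
    """Fetch and parse /metrics. Raises on network or HTTP errors."""
    req = build_metrics_request(base_url, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

In most deployments the credential would be configured in the Prometheus scrape job itself; the sketch is only meant to show that an unauthenticated GET is not enough.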
Health checks
The runtime exposes several health-related endpoints.
The access model is:
- /health/worker for unauthenticated worker readiness
- /health for general service health, behind authenticated access
- /health/secrets, behind authenticated access
- /health/certificates, behind authenticated access
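The access model above can be captured in a small probe helper. This is a sketch under stated assumptions: the bearer-token header and the rule that a 2xx status means healthy are both hypothetical, since this page documents neither the authentication scheme nor the response codes.

```python
import urllib.request

# Access model as described on this page: only /health/worker is unauthenticated.
HEALTH_ENDPOINTS = {
    "/health/worker": False,
    "/health": True,
    "/health/secrets": True,
    "/health/certificates": True,
}


def build_probe(base_url: str, path: str, token=None) -> urllib.request.Request:
    """Build a GET request for a health endpoint, adding auth only when needed.

    Assumption: authenticated endpoints accept a bearer token.
    """
    needs_auth = HEALTH_ENDPOINTS[path]
    if needs_auth and not token:
        raise ValueError(f"{path} requires authentication but no token was given")
    headers = {"Authorization": f"Bearer {token}"} if needs_auth else {}
    return urllib.request.Request(base_url.rstrip("/") + path, headers=headers)


def probe(req: urllib.request.Request, timeout: float = 5.0) -> bool:
    """Return True on a 2xx response (assumed to mean healthy), else False."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers URLError, HTTPError, connection failures, and timeouts.
        return False
```

A container liveness or readiness check would typically use only /health/worker, because it is the one endpoint that needs no credential.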
The health payload checks runtime conditions such as state-store writability, audit-path writability, and worker readiness.
Audit records
Audit records are distinct from logs. They are structured lifecycle records intended for file and flow traceability.
Use:
- logs to understand service behavior
- metrics to understand health and trends
- audit records to answer what happened to a specific file or run
See Audit Logging.
Recommended operating posture
For production use, teams usually combine:
- health checks
- metrics collection
- log shipping
- audit retention and external export
Because metrics and health endpoints are authenticated in the current implementation, deployment teams should plan their monitoring integration accordingly.
Crypto diagnostics and observability
When a flow uses pgp.provider=gnupg or pgp.provider=auto, diagnostics can show:
- provider mode
- resolved GnuPG binary path
- GnuPG version
- whether fallback capability is available on this host
This is useful before rollout, especially on Windows hosts or systems where gpg is not installed in a standard location.
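As an independent cross-check before rollout, the host itself can be asked where gpg lives and which version it reports. The sketch below is not Xferity's diagnostics output; it only uses Python's standard library and the gpg --version flag, and the function names are hypothetical.

```python
import shutil
import subprocess


def parse_gpg_version(output: str):
    """Return the last token of the first line of `gpg --version` output.

    The first line normally looks like 'gpg (GnuPG) <version>'.
    Returns None when there is nothing to parse.
    """
    first = output.splitlines()[0] if output.strip() else ""
    parts = first.split()
    return parts[-1] if parts else None


def locate_gpg(binary: str = "gpg"):
    """Resolve the binary on PATH and report its path and version.

    Returns None when the binary cannot be found on PATH.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        timeout=10,
        check=False,
    )
    return {"path": path, "version": parse_gpg_version(result.stdout)}
```

If locate_gpg() returns None on a host where GnuPG is installed, the binary is probably outside PATH, which is the non-standard-location case mentioned above.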
Crypto log fields
Crypto operations now emit structured fields that help operators understand whether fallback happened:
- provider
- mode
- fallback_used
- fallback_reason
- fallback_subreason
- cleanup_status
Typical interpretation:
- provider=gopenpgp and fallback_used=false means the native path handled the operation
- fallback_used=true means the native path failed with a classified compatibility case and GnuPG was tried once
- cleanup_status=failed means the crypto operation may have succeeded, but temporary crypto workspace cleanup did not complete cleanly and should be investigated
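The interpretation rules above can be applied mechanically when triaging shipped logs. The following sketch assumes each log entry has already been decoded into a dictionary (for example from a JSON line) and that fallback_used arrives either as a boolean or as the string "true"/"false". Both are assumptions, as are the function names and the example reason value; the log encoding is not specified on this page.

```python
import json


def _truthy(value) -> bool:
    """Accept a real boolean or a 'true'/'false' string (assumed encodings)."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def interpret_crypto_record(record: dict) -> list:
    """Translate crypto log fields into short operator-facing notes."""
    notes = []
    used = record.get("fallback_used")
    if used is not None and _truthy(used):
        # Native path failed with a classified compatibility case; GnuPG was tried once.
        reason = record.get("fallback_reason", "unspecified")
        notes.append(f"fallback to GnuPG was attempted (reason: {reason})")
    elif used is not None and record.get("provider") == "gopenpgp":
        notes.append("native path handled the operation")
    if record.get("cleanup_status") == "failed":
        # The operation itself may still have succeeded.
        notes.append("workspace cleanup failed; investigate")
    return notes


if __name__ == "__main__":
    line = '{"provider": "gopenpgp", "fallback_used": false}'
    print(interpret_crypto_record(json.loads(line)))
```

Keeping the cleanup check separate from the fallback check mirrors the text: a failed cleanup is reported regardless of which provider handled the operation.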
Secret safety in crypto logs
Crypto observability is intentionally sanitized.
Xferity avoids logging:
- passphrases
- key material
- raw GnuPG stderr when it may contain sensitive data
Instead, logs use structured fields and redacted output summaries.
Boundaries and limits
To keep this page precise:
- metrics do not replace audit records for file evidence
- logs do not replace audit retention
- health endpoints do not replace external probes or end-to-end checks
- built-in telemetry does not replace SIEM or incident response processes