
Monitoring Transfers

Monitoring transfers means detecting problems before partners do, not after a missed file delivery triggers an escalation.

This page describes what to monitor, which tools are available, and how to integrate Xferity monitoring into your operational stack. At a minimum, monitor the following:

  • is the Xferity process running
  • is the state backend reachable (Postgres in Postgres-backed mode)
  • is the worker queue healthy and draining
  • did scheduled flows run on time
  • are failure rates increasing for any flow or partner
  • are retries accumulating beyond the normal baseline
  • are dead-letter paths accumulating files
  • are certificate expiry windows approaching
  • are there new Active Posture Findings
  • are secret resolution failures appearing in logs
  • is audit log retention working correctly
  • is the audit sidecar index being maintained
  • are logs being shipped if required
Xferity exposes several HTTP health endpoints:

Endpoint              Auth      Purpose
/health/worker        none      Worker readiness check. Use for load balancer or Kubernetes probes.
/health               required  General service health, including state store writability and audit path.
/health/secrets       required  Secret provider health check.
/health/certificates  required  Certificate expiry status check.

Use /health/worker for unauthenticated liveness and readiness probes. The other endpoints require authentication and are for administrative monitoring.
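As an illustration, a Kubernetes probe pair against /health/worker might look like the following sketch. The container port (8080 here) and timing values are assumptions, not documented defaults:

```yaml
# Hypothetical container spec fragment; port and timings are assumptions.
livenessProbe:
  httpGet:
    path: /health/worker
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/worker
    port: 8080
  periodSeconds: 10
```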

Xferity exposes Prometheus-format metrics at /metrics behind authenticated admin access.

Key metric areas:

Area                            What to alert on
Flow runs and results           rising failure rate per flow
Transfer bytes and files        sudden drop in expected volume
Job queue depth                 queue growing instead of draining
Retries and dead-letter events  accumulation over time
Certificate expiry              days remaining below threshold
Auth failures                   unusual rate of login failures
Rate-limit denials              sustained request limiting on API

A Prometheus scrape target for Xferity requires a valid authentication token. Configure the scrape job with the appropriate bearer token or Basic Auth header depending on your setup.
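A bearer-token scrape job could be sketched as follows. The job name, target address, TLS scheme, and token file path are all assumptions to adapt to your deployment:

```yaml
# Hypothetical Prometheus scrape job; names and paths are assumptions.
scrape_configs:
  - job_name: xferity
    metrics_path: /metrics
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/xferity.token
    static_configs:
      - targets: ["xferity.internal:8443"]
```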

Structured logs are the first place to look for operational issues.

Useful log patterns to alert on:

  • startup config validation errors
  • SFTP or FTPS connection failures
  • AS2 MDN failures or mismatches
  • secret resolution errors
  • worker claim or execution failures
  • notification delivery failures

Use the -v flag or log_level: debug for verbose logs during troubleshooting. Do not leave debug logging enabled in production at high volume.
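If your log pipeline supports regex-based rules, the patterns above can be encoded directly. A minimal sketch in Python; the pattern strings are illustrative and will need adjusting to Xferity's actual log message text:

```python
import re

# Illustrative patterns -- the exact messages Xferity emits may differ.
ALERT_PATTERNS = [
    re.compile(r"config validation (error|failed)", re.IGNORECASE),
    re.compile(r"(SFTP|FTPS) connection (failed|refused|timed out)", re.IGNORECASE),
    re.compile(r"MDN (failure|mismatch)", re.IGNORECASE),
    re.compile(r"secret resolution (error|failed)", re.IGNORECASE),
    re.compile(r"worker (claim|execution) (error|failed|failure)", re.IGNORECASE),
    re.compile(r"notification delivery (error|failed)", re.IGNORECASE),
]

def alertable(line: str) -> bool:
    """Return True if a log line matches any alert-worthy pattern."""
    return any(p.search(line) for p in ALERT_PATTERNS)
```

Wire a function like this into whatever tails the structured log stream, and page only on lines where it returns True.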

For quick operational checks:

# Current status across all flows
xferity flow status
# Run history with outcomes
xferity flow history payroll-upload
# Tail logs for a flow
xferity logs payroll-upload --follow

Flow status shows the last run result, time, and any active error state per flow. It is the fastest way to see whether something is wrong without opening the UI.

In Postgres-backed mode, the worker queue can accumulate if workers are not running, the database is unreachable, or job processing is consistently failing.

Signs of queue problems:

  • /health/worker returning unhealthy
  • queue depth metric growing monotonically
  • flow status showing jobs in pending state with no completion
  • logs showing worker poll errors or claim failures

When a worker stalls, check:

  1. is the worker process running
  2. is Postgres reachable from the worker host
  3. are jobs showing repeated failure counts in the database
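Step 2 above can be scripted from the worker host. A minimal TCP reachability sketch using only the standard library (the host and port are assumptions; a real client connection such as pg_isready is a stronger test, since it exercises the Postgres protocol rather than just the socket):

```python
import socket

def postgres_reachable(host: str = "localhost", port: int = 5432,
                       timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the Postgres port succeeds.

    This only proves network reachability -- not that authentication
    works or that the job tables are healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```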

Flows can be configured to send notifications on start, success, or failure. Treating these as operational signals (rather than just alerts) is useful for lower-volume critical flows where you want confirmation of each run.

For higher-volume flows, use metrics and alerting thresholds instead of per-run notifications to avoid alert fatigue.
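A threshold alert of that kind could be expressed as a Prometheus rule. The metric name xferity_flow_runs_total and its labels are hypothetical; substitute whatever names /metrics actually exposes:

```yaml
# Hypothetical alerting rule; metric and label names are assumptions.
groups:
  - name: xferity-flows
    rules:
      - alert: XferityFlowFailureRateHigh
        expr: |
          sum by (flow) (rate(xferity_flow_runs_total{result="failure"}[15m]))
            / sum by (flow) (rate(xferity_flow_runs_total[15m])) > 0.2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Flow {{ $labels.flow }} failure rate above 20% for 10 minutes"
```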

For production deployments, a standard monitoring setup includes:

  1. /health/worker scrape or probe for process liveness
  2. Prometheus scrape of /metrics for trends and alerting
  3. log aggregation for log-based alerts (startup errors, connection failures)
  4. audit log export for file-level evidence
  5. posture API or UI review for security drift
Finally, keep monitoring's limits in mind:

  • monitoring does not replace backup and restore testing
  • monitoring does not replace audit evidence export for compliance paths
  • health checks do not verify end-to-end partner exchange — that requires real runs and audit trace verification