Monitoring Transfers
Monitoring transfers means detecting problems before partners do, not after a missed file delivery triggers an escalation.
This page describes what to monitor, what tools are available, and how to integrate Xferity monitoring into your operational stack.
What to monitor
Service health
Section titled “Service health”- is the Xferity process running
- is the state backend reachable (Postgres in Postgres-backed mode)
- is the worker queue healthy and draining
Transfer health
- did scheduled flows run on time
- are failure rates increasing for any flow or partner
- are retries accumulating beyond the normal baseline
- are dead-letter paths accumulating files
Security and trust health
- are certificate expiry windows approaching
- are there new Active Posture Findings
- are secret resolution failures appearing in logs
Evidence health
- is audit log retention working correctly
- is the audit sidecar index being maintained
- are logs being shipped if required
Health check endpoints
| Endpoint | Auth | Purpose |
|---|---|---|
| /health/worker | none | Worker readiness check. Use for load balancer or Kubernetes probes. |
| /health | required | General service health, including state store writability and audit path. |
| /health/secrets | required | Secret provider health check. |
| /health/certificates | required | Certificate expiry status check. |
Use /health/worker for unauthenticated liveness and readiness probes. The other endpoints require authentication and are for administrative monitoring.
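As an illustration, a Kubernetes probe configuration against the unauthenticated endpoint might look like the fragment below. The port and timing values are assumptions for this sketch, not Xferity defaults; adjust them to your deployment.

```yaml
# Hypothetical container spec fragment: port 8080 and the probe
# timings are illustrative values, not documented defaults.
livenessProbe:
  httpGet:
    path: /health/worker
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/worker
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```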
Metrics
Xferity exposes Prometheus-format metrics at /metrics behind authenticated admin access.
Key metric areas:
| Area | What to alert on |
|---|---|
| Flow runs and results | rising failure rate per flow |
| Transfer bytes and files | sudden drop in expected volume |
| Job queue depth | queue growing instead of draining |
| Retries and dead-letter events | accumulation over time |
| Certificate expiry | days remaining below threshold |
| Auth failures | unusual rate of login failures |
| Rate-limit denials | sustained request limiting on API |
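The alert conditions in the table can be sketched as Prometheus alerting rules. The metric names used here (`xferity_job_queue_depth`, `xferity_certificate_expiry_days`) are placeholders, assumed for illustration; check the actual names exposed at /metrics for your version before deploying rules.

```yaml
groups:
  - name: xferity
    rules:
      # Queue growing instead of draining: depth rose over the last
      # 15 minutes and is already above a working baseline.
      - alert: XferityQueueBacklog
        expr: delta(xferity_job_queue_depth[15m]) > 0 and xferity_job_queue_depth > 100
        for: 15m
        labels:
          severity: warning
      # Certificate expiry window approaching the renewal threshold.
      - alert: XferityCertificateExpiring
        expr: xferity_certificate_expiry_days < 30
        labels:
          severity: critical
```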
A Prometheus scrape target for Xferity requires a valid authentication token. Configure the scrape job with the appropriate bearer token or Basic Auth header depending on your setup.
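For example, a scrape job using a bearer token might look like the fragment below; the job name, target address, and token file path are illustrative.

```yaml
scrape_configs:
  - job_name: xferity
    metrics_path: /metrics
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/xferity.token  # assumed path
    static_configs:
      - targets: ["xferity.internal:8443"]  # assumed host:port
```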
Logs
Structured logs are the first place to look for operational issues.
Useful log patterns to alert on:
- startup config validation errors
- SFTP or FTPS connection failures
- AS2 MDN failures or mismatches
- secret resolution errors
- worker claim or execution failures
- notification delivery failures
Use the -v flag or log_level: debug for verbose logs during troubleshooting. Do not leave debug logging enabled in production at high volume.
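As a minimal sketch, the patterns above can be matched with a single grep filter feeding a log-based alert. The sample lines and field names below are illustrative, not actual Xferity log output; adapt the regex to your real log schema.

```shell
# Sample structured log lines (illustrative, not actual Xferity output).
cat > /tmp/xferity-sample.log <<'EOF'
{"level":"info","msg":"flow run completed","flow":"payroll-upload"}
{"level":"error","msg":"secret resolution failed","ref":"vault://sftp-key"}
{"level":"error","msg":"SFTP connection failed","host":"partner.example.com"}
EOF

# Alert-worthy patterns: connection failures, secret resolution errors,
# AS2 MDN problems, and worker claim failures.
grep -E 'secret resolution|connection failed|MDN (failure|mismatch)|claim failed' \
  /tmp/xferity-sample.log
```

The same filter works as a log-aggregation query; the point is to alert on error classes, not individual messages.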
CLI flow status
For quick operational checks:
```sh
# Current status across all flows
xferity flow status

# Run history with outcomes
xferity flow history payroll-upload

# Tail logs for a flow
xferity logs payroll-upload --follow
```

Flow status shows the last run result, time, and any active error state per flow. It is the fastest way to see whether something is wrong without opening the UI.
Worker queue monitoring (Postgres mode)
In Postgres-backed mode, the worker queue can accumulate if workers are not running, the database is unreachable, or job processing is consistently failing.
Signs of queue problems:
- /health/worker returning unhealthy
- queue depth metric growing monotonically
- flow status showing jobs in pending state with no completion
- logs showing worker poll errors or claim failures
When a worker stalls, check:
- is the worker process running
- is Postgres reachable from the worker host
- are jobs showing repeated failure counts in the database
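If you need to inspect the queue directly, a query along these lines can surface stuck jobs. The table and column names here are hypothetical, since Xferity's Postgres schema is internal and may differ by version; treat this as a sketch of the shape of the check, not a supported query.

```sql
-- Hypothetical schema: substitute the actual table and column names.
SELECT id, flow_name, status, failure_count, updated_at
FROM jobs
WHERE status = 'pending'
  AND failure_count > 3
ORDER BY updated_at ASC
LIMIT 20;
```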
Notification-based monitoring
Flows can be configured to send notifications on start, success, or failure. Treating these as operational signals (rather than just alerts) is useful for lower-volume critical flows where you want confirmation of each run.
For higher-volume flows, use metrics and alerting thresholds instead of per-run notifications to avoid alert fatigue.
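A per-run notification setup for a critical flow might be sketched as below. The configuration keys shown are assumptions for illustration; consult the flow configuration reference for the actual schema.

```yaml
# Hypothetical flow config fragment: the notify keys are illustrative.
flows:
  payroll-upload:
    notify:
      on_failure: [ops-email, pagerduty]
      on_success: [ops-email]   # per-run confirmation for a critical flow
```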
Recommended monitoring stack
For production deployments, a standard monitoring setup includes:
- /health/worker scrape or probe for process liveness
- Prometheus scrape of /metrics for trends and alerting
- log aggregation for log-based alerts (startup errors, connection failures)
- audit log export for file-level evidence
- posture API or UI review for security drift
What monitoring does not replace
- monitoring does not replace backup and restore testing
- monitoring does not replace audit evidence export for compliance paths
- health checks do not verify end-to-end partner exchange — that requires real runs and audit trace verification