Monitoring Transfers
Monitoring transfers means detecting problems before partners do, not after a missed file delivery triggers an escalation.
This page describes what to monitor, what tools are available, and how to integrate Xferity monitoring into your operational stack.
What to monitor
Service health
Section titled “Service health”- is the Xferity process running
- is the state backend reachable (Postgres in Postgres-backed mode)
- is the worker queue healthy and draining
Transfer health
- did scheduled flows run on time
- are failure rates increasing for any flow or partner
- are retries accumulating beyond the normal baseline
- are dead-letter paths accumulating files
Security and trust health
- are certificate expiry windows approaching
- are there new Active Posture Findings
- are secret resolution failures appearing in logs
Evidence health
- is audit log retention working correctly
- is the audit sidecar index being maintained
- are logs being shipped if required
Health check endpoints
| Endpoint | Auth | Purpose |
|---|---|---|
| /health/worker | none | Worker readiness check. Use for load balancer or Kubernetes probes. |
| /health | required | General service health, including state store writability and audit path. |
| /health/secrets | required | Secret provider health check. |
| /health/certificates | required | Certificate expiry status check. |
Use /health/worker for unauthenticated liveness and readiness probes. The other endpoints require authentication and are for administrative monitoring.
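As an illustration, a Kubernetes probe configuration against the unauthenticated endpoint might look like the fragment below. The port and timing values are assumptions for this sketch, not Xferity defaults; adjust them to your deployment.

```yaml
# Hypothetical container spec fragment: port 8080 and the probe
# timings are illustrative values, not documented defaults.
livenessProbe:
  httpGet:
    path: /health/worker
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/worker
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```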
Metrics
Xferity exposes Prometheus-format metrics at /metrics behind authenticated admin access.
Key metric areas:
| Area | What to alert on |
|---|---|
| Flow runs and results | rising failure rate per flow |
| Transfer bytes and files | sudden drop in expected volume |
| Job queue depth | queue growing instead of draining |
| Retries and dead-letter events | accumulation over time |
| Certificate expiry | days remaining below threshold |
| Auth failures | unusual rate of login failures |
| Rate-limit denials | sustained request limiting on API |
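The alert conditions in the table can be sketched as Prometheus alerting rules. The metric names used here (`xferity_job_queue_depth`, `xferity_certificate_expiry_days`) are placeholders, assumed for illustration; check the actual names exposed at /metrics for your version before deploying rules.

```yaml
groups:
  - name: xferity
    rules:
      # Queue growing instead of draining: depth rose over the last
      # 15 minutes and is already above a working baseline.
      - alert: XferityQueueBacklog
        expr: delta(xferity_job_queue_depth[15m]) > 0 and xferity_job_queue_depth > 100
        for: 15m
        labels:
          severity: warning
      # Certificate expiry window approaching the renewal threshold.
      - alert: XferityCertificateExpiring
        expr: xferity_certificate_expiry_days < 30
        labels:
          severity: critical
```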
A Prometheus scrape target for Xferity requires a valid authentication token. Configure the scrape job with the appropriate bearer token or Basic Auth header depending on your setup.
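For example, a scrape job using a bearer token might look like the fragment below; the job name, target address, and token file path are illustrative.

```yaml
scrape_configs:
  - job_name: xferity
    metrics_path: /metrics
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/xferity.token  # assumed path
    static_configs:
      - targets: ["xferity.internal:8443"]  # assumed host:port
```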
Logs
Structured logs are the first place to look for operational issues.
Useful log patterns to alert on:
- startup config validation errors
- SFTP or FTPS connection failures
- AS2 MDN failures or mismatches
- secret resolution errors
- worker claim or execution failures
- notification delivery failures
Use the -v flag or log_level: debug for verbose logs during troubleshooting. Do not leave debug logging enabled in production at high volume.
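As a minimal sketch, the patterns above can be matched with a single grep filter feeding a log-based alert. The sample lines and field names below are illustrative, not actual Xferity log output; adapt the regex to your real log schema.

```shell
# Sample structured log lines (illustrative, not actual Xferity output).
cat > /tmp/xferity-sample.log <<'EOF'
{"level":"info","msg":"flow run completed","flow":"payroll-upload"}
{"level":"error","msg":"secret resolution failed","ref":"vault://sftp-key"}
{"level":"error","msg":"SFTP connection failed","host":"partner.example.com"}
EOF

# Alert-worthy patterns: connection failures, secret resolution errors,
# AS2 MDN problems, and worker claim failures.
grep -E 'secret resolution|connection failed|MDN (failure|mismatch)|claim failed' \
  /tmp/xferity-sample.log
```

The same filter works as a log-aggregation query; the point is to alert on error classes, not individual messages.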
CLI flow status
For quick operational checks:
```sh
# Current status across all flows
xferity flow status

# Run history with outcomes
xferity flow history payroll-upload

# Tail logs for a flow
xferity logs payroll-upload --follow
```

Flow status shows the last run result, time, and any active error state per flow. It is the fastest way to see whether something is wrong without opening the UI.
Worker queue monitoring (Postgres mode)
In Postgres-backed mode, the worker queue can accumulate if workers are not running, the database is unreachable, or job processing is consistently failing.
Signs of queue problems:
- /health/worker returning unhealthy
- queue depth metric growing monotonically
- flow status showing jobs in pending state with no completion
- logs showing worker poll errors or claim failures
When a worker stalls, check:
- is the worker process running
- is Postgres reachable from the worker host
- are jobs showing repeated failure counts in the database
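If you need to inspect the queue directly, a query along these lines can surface stuck jobs. The table and column names here are hypothetical, since Xferity's Postgres schema is internal and may differ by version; treat this as a sketch of the shape of the check, not a supported query.

```sql
-- Hypothetical schema: substitute the actual table and column names.
SELECT id, flow_name, status, failure_count, updated_at
FROM jobs
WHERE status = 'pending'
  AND failure_count > 3
ORDER BY updated_at ASC
LIMIT 20;
```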
Notification-based monitoring
Flows can be configured to send notifications on start, success, or failure. Treating these as operational signals (rather than just alerts) is useful for lower-volume critical flows where you want confirmation of each run.
For higher-volume flows, use metrics and alerting thresholds instead of per-run notifications to avoid alert fatigue.
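A per-run notification setup for a critical flow might be sketched as below. The configuration keys shown are assumptions for illustration; consult the flow configuration reference for the actual schema.

```yaml
# Hypothetical flow config fragment: the notify keys are illustrative.
flows:
  payroll-upload:
    notify:
      on_failure: [ops-email, pagerduty]
      on_success: [ops-email]   # per-run confirmation for a critical flow
```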
Recommended monitoring stack
For production deployments, a standard monitoring setup includes:
- /health/worker scrape or probe for process liveness
- Prometheus scrape of /metrics for trends and alerting
- log aggregation for log-based alerts (startup errors, connection failures)
- audit log export for file-level evidence
- posture API or UI review for security drift
What monitoring does not replace
- monitoring does not replace backup and restore testing
- monitoring does not replace audit evidence export for compliance paths
- health checks do not verify end-to-end partner exchange — that requires real runs and audit trace verification