Investigating Failures

When a flow fails, the goal is to isolate what failed, why it failed, and whether it is safe to retry.

This page walks through the investigation workflow step by step.

Start with the broadest scope and narrow down. Do not assume the problem is a network issue or a partner issue until you have read the logs.

  1. check xferity flow status for current state across all flows
  2. check xferity flow history <flow> for the specific run outcome
  3. read xferity logs <flow> for the failure detail
  4. run xferity diag <flow> to check current endpoint and trust state
  5. run xferity trace <filename> if you have a specific file to trace
  6. check the posture page for any related security findings
  7. escalate to partner if you have confirmed the problem is on their side
# Check current status across all flows
xferity flow status
# View run history for a flow
xferity flow history supplier-invoice-pickup
# Read logs (most recent)
xferity logs supplier-invoice-pickup
# Run diagnostics
xferity diag supplier-invoice-pickup
# Trace a specific file in audit records
xferity trace invoice-2026-03-15.xml
# Validate configuration
xferity validate
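
When a failure needs to be handed to a colleague or a partner, it can help to capture the output of these commands in one place. The following is a minimal sketch, not an Xferity feature: it uses only the subcommands shown above, assumes each of them prints its output and exits, and writes to a directory layout chosen purely for illustration.

```shell
#!/bin/sh
# Sketch: capture the output of each investigation command for one flow.
# The function name and the file names are illustrative assumptions.
collect_evidence() {
    flow="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1

    # Broadest scope first, then narrow to the flow.
    # "|| true" keeps one failing command from stopping the rest.
    xferity flow status          > "$out_dir/status.txt"   2>&1 || true
    xferity flow history "$flow" > "$out_dir/history.txt"  2>&1 || true
    xferity logs "$flow"         > "$out_dir/logs.txt"     2>&1 || true
    xferity diag "$flow"         > "$out_dir/diag.txt"     2>&1 || true
    xferity validate             > "$out_dir/validate.txt" 2>&1 || true

    echo "evidence written to $out_dir"
}

# Example (hypothetical output directory):
# collect_evidence supplier-invoice-pickup "./evidence-$(date +%Y%m%dT%H%M%S)"
```

Review the captured files for secrets before sharing them outside your team.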

Identify the scope first — this determines whether the fix belongs to global config, partner config, flow config, or an external dependency.

Symptom                                          Probable scope
Multiple flows failing at startup                global config error
One partner failing across multiple flows        partner definition or endpoint issue
One flow failing, others fine                    flow config, PGP material, or flow-specific endpoint
Intermittent failure matching partner schedule   endpoint intermittency or remote file availability
Failure after a config change                    the changed config
Failure after a cert rotation                    certificate re-binding or trust verification
Failure after a secret rotation                  secret not resolving to new value

Symptoms: startup fails, flow does not load, YAML parse error in logs.

Check:

  • xferity validate output
  • YAML field names for typos (strict parser rejects unknown fields)
  • referenced paths and keys exist
  • hardened mode constraints are satisfied

Common config mistakes:

  • sftp.known_hosts path missing file: prefix
  • sftp.host_key_fingerprint not starting with SHA256:
  • schedule_cron using a five-field expression instead of six-field
  • partner id not matching filename
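
The first three mistakes are format problems that can be spotted before running xferity validate. The helpers below are a sketch that mirrors the list above. They are not Xferity's validator, the function names are hypothetical, and each one takes the configured value as its only argument.

```shell
# Sketch: format checks mirroring the common mistakes listed above.

# sftp.known_hosts must carry the file: prefix.
has_file_prefix() {
    case "$1" in
        file:?*) return 0 ;;
        *)       return 1 ;;
    esac
}

# sftp.host_key_fingerprint must start with SHA256:.
is_sha256_fingerprint() {
    case "$1" in
        SHA256:?*) return 0 ;;
        *)         return 1 ;;
    esac
}

# schedule_cron must have six whitespace-separated fields.
is_six_field_cron() {
    printf '%s\n' "$1" | awk '{ exit (NF == 6 ? 0 : 1) }'
}

# Example:
# is_six_field_cron "*/5 * * * *" || echo "schedule_cron needs six fields"
```

These checks only look at the shape of a value. xferity validate remains the authority on whether the configuration loads.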

Symptoms: flow fails before any network action, error mentions credential or secret.

Check:

  • env variable is set in the running process environment
  • file path is mounted and readable
  • vault or AWS Secrets Manager is reachable and the token/role has access
  • secret reference uses the correct prefix (env:, file:, vault:, etc.)
  • in hardened mode, plaintext values in sensitive fields are rejected

Use xferity diag <flow> — diagnostics include a credential resolution check.
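
The first two checks can also be done by hand. The sketch below assumes that the text after an env: or file: prefix is the variable name or the file path respectively; that reading of the reference syntax is an assumption, and check_secret_ref is a hypothetical helper rather than part of Xferity.

```shell
# Sketch: pre-check an env: or file: secret reference from a shell.
# Assumption: "env:NAME" names an environment variable, "file:PATH" a file.
check_secret_ref() {
    ref="$1"
    case "$ref" in
        env:?*)
            name="${ref#env:}"
            # printenv only sees exported variables, i.e. the process environment.
            if printenv "$name" >/dev/null; then
                echo "ok: $name is set"
            else
                echo "missing: $name is not set in this environment"
                return 1
            fi
            ;;
        file:?*)
            path="${ref#file:}"
            if [ -r "$path" ]; then
                echo "ok: $path is readable"
            else
                echo "missing: $path does not exist or is not readable"
                return 1
            fi
            ;;
        *)
            # vault: and other backends need their own tooling.
            echo "skipped: $ref is not an env: or file: reference"
            return 2
            ;;
    esac
}
```

Run it as the same user and in the same container or service context as Xferity. A variable that is set in your login shell is not necessarily set in the running process environment.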

Symptoms: SFTP connection error mentioning host key.

Check:

  • known_hosts file exists and is readable
  • the host key in the file is current (partner may have rotated keys)
  • host_key_fingerprint matches the actual server fingerprint
  • if the partner changed SSH host keys, update the known_hosts entry

Do not set allow_insecure_host_key=true as a fix unless you also track it as an accepted finding.
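
To see which fingerprint the server presents right now, the standard OpenSSH tools are enough. This is a sketch with placeholder host, port, and fingerprint values; the two function names are hypothetical.

```shell
# Sketch: compare the fingerprints a server presents with the configured value.
# ssh-keyscan and ssh-keygen are standard OpenSSH tools.

# Print the SHA256 fingerprints of the host keys the server currently offers.
server_fingerprints() {
    host="$1"
    port="${2:-22}"
    ssh-keyscan -p "$port" "$host" 2>/dev/null | ssh-keygen -lf - | awk '{ print $2 }'
}

# Succeed if the expected fingerprint appears in a list read from stdin.
fingerprint_in_list() {
    grep -Fxq -- "$1"
}

# Example (placeholder host and fingerprint):
# server_fingerprints sftp.partner.example 22 | fingerprint_in_list "SHA256:..."
```

ssh-keyscan does not authenticate the server, so a fingerprint obtained this way only shows what the endpoint presents. Confirm a changed key with the partner through a separate channel before updating known_hosts or host_key_fingerprint.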

Symptoms: TLS handshake failure, certificate validation error.

Check:

  • CA certificate chains are correct and complete
  • server certificate is not expired
  • tls.mode is explicit (implicit mode is not supported)
  • connection.passive=true is set
  • if using server_cert_fingerprint, it matches the current server cert
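
Expiry and fingerprint can be checked independently of Xferity with openssl. The sketch below assumes the endpoint is an FTP server using explicit TLS, which the tls.mode and connection.passive settings suggest. The expected format of server_cert_fingerprint is not stated here, so normalize_fingerprint only reduces openssl's output to bare lowercase hex for manual comparison. Both function names are hypothetical.

```shell
# Sketch: inspect the certificate an explicit-TLS FTP server presents.
# "-starttls ftp" makes openssl perform the explicit TLS upgrade first.
show_ftps_cert() {
    host="$1"
    port="${2:-21}"
    openssl s_client -connect "$host:$port" -starttls ftp -servername "$host" \
        </dev/null 2>/dev/null \
        | openssl x509 -noout -subject -issuer -enddate -fingerprint -sha256
}

# Reduce an "openssl x509 -fingerprint" line to bare lowercase hex.
normalize_fingerprint() {
    printf '%s\n' "$1" | sed 's/^.*=//' | tr -d ':' | tr 'A-F' 'a-f'
}

# Example (placeholder host):
# show_ftps_cert ftps.partner.example 21
```

The notAfter line in the output shows the expiry date. If the handshake fails here as well, the problem is likely on the endpoint or in the network path rather than in the Xferity trust configuration.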

Symptoms: AS2 message rejected, MDN error, signing or encryption failure.

Check:

  • partner AS2 ID in the config matches what the partner expects
  • certificate roles are correctly bound in the Certificate inventory
  • if expect_signed_mdn=true, the partner actually returns a signed MDN
  • the receiving endpoint URL is reachable from Xferity
  • HTTPS trust for the AS2 endpoint is configured

Symptoms: crypto error in logs, compat_enterprise_key_structure mentioned.

Check:

  • the key material is present and readable
  • the passphrase resolves correctly
  • the key has not expired
  • if using provider=auto, check whether fallback to GnuPG was attempted
  • if fallback occurred: confirm GnuPG is installed and gnupg_binary path is correct

For compat_enterprise_key_structure:

  • this is a named compatibility case, not a bad key
  • the native provider could not handle the key layout
  • GnuPG fallback should handle it if configured correctly
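
Key expiry and the GnuPG installation can be checked outside Xferity. This is a sketch that assumes GnuPG 2.2.8 or later (for --show-keys) and that the expiry field in the colon listing is in epoch seconds, as current GnuPG 2.x emits it. The key path is a placeholder and the function names are hypothetical.

```shell
# Sketch: check key expiry from "gpg --with-colons" output.

# Print the expiry field (field 7) of each primary key read from stdin.
key_expiry_fields() {
    awk -F: '$1 == "pub" || $1 == "sec" { print ($7 == "" ? "never" : $7) }'
}

# Succeed if an expiry timestamp (epoch seconds) is in the past.
is_expired() {
    [ "$1" != "never" ] && [ "$1" -lt "$(date +%s)" ]
}

# Example (placeholder key path):
# gpg --version | head -n 1
# gpg --show-keys --with-colons /path/to/partner-key.asc | key_expiry_fields
```

If gnupg_binary points at a non-default location, run that binary with --version instead of gpg to confirm the path is correct.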

Symptoms: flow fails with lock error, run does not start.

Check:

  • whether a previous run is still holding the lock
  • lock_stale_after_seconds — if the previous run died mid-execution, the lock may still exist
  • whether lock_wait=true is configured and whether max wait was exceeded

A stale lock means the previous run did not complete cleanly. Investigate why before clearing the lock.

When a file ends up in the dead-letter path, the flow could not process it after exhausting retries.

Check:

  • filename and timestamp to correlate with log entries
  • log entries at the time of the failure
  • whether the underlying issue is transient (e.g., partner downtime) or permanent (e.g., corrupt file)

Do not delete dead-letter files without understanding the failure first.
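
The correlation step can be sketched as a small helper. It assumes that xferity logs <flow> prints plain text in which the filename appears; correlate_dead_letter is a hypothetical name, and the flow name and path in the example are placeholders.

```shell
# Sketch: pull the log lines and audit trace for one dead-lettered file.
correlate_dead_letter() {
    flow="$1"
    file_path="$2"
    name=$(basename "$file_path")

    echo "== file =="
    ls -l "$file_path"          # size and modification time

    echo "== log lines mentioning $name =="
    xferity logs "$flow" | grep -F -- "$name" || echo "(no matching log lines)"

    echo "== audit trace =="
    xferity trace "$name"
}

# Example (placeholder path):
# correlate_dead_letter supplier-invoice-pickup /path/to/dead-letter/invoice-2026-03-15.xml
```

The helper only reads; it leaves the dead-letter file in place.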

After steps 1–6 above:

  • if the fault is in Xferity config: fix, validate, and rerun
  • if the fault is in the partner endpoint: confirm from logs, then contact the partner with evidence
  • if the fault is transient and the cause is resolved: check whether rerun is safe with idempotency in mind