Investigating Failures

When a flow fails, the goal is to isolate what failed, why it failed, and whether it is safe to retry.

This page walks through the investigation workflow step by step.

Investigation order

Start with the broadest scope and narrow down. Do not assume the problem is a network issue or a partner issue until you have read the logs.

1. check xferity flow status for current state across all flows
2. check xferity flow history <flow> for the specific run outcome
3. read xferity logs <flow> for the failure detail
4. run xferity diag <flow> to check current endpoint and trust state
5. run xferity trace <filename> if you have a specific file to trace
6. check the posture page for any related security findings
7. escalate to partner if you have confirmed the problem is on their side

CLI tools for investigation

# Check current status across all flows
xferity flow status

# View run history for a flow
xferity flow history supplier-invoice-pickup

# Read logs (most recent)
xferity logs supplier-invoice-pickup

# Run diagnostics
xferity diag supplier-invoice-pickup

# Trace a specific file in audit records
xferity trace invoice-2026-03-15.xml

# Validate configuration
xferity validate
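
The commands above can be chained in the same broad-to-narrow order. The following is a minimal sketch, not part of Xferity: the function names are hypothetical, and it assumes only that xferity is on PATH. It makes no assumption about exit codes or output format and simply prints whatever each command returns.

```python
import subprocess
from typing import List, Optional


def investigation_commands(flow: str, filename: Optional[str] = None) -> List[List[str]]:
    """Return the investigation commands in broad-to-narrow order."""
    commands = [
        ["xferity", "flow", "status"],
        ["xferity", "flow", "history", flow],
        ["xferity", "logs", flow],
        ["xferity", "diag", flow],
    ]
    # Tracing only makes sense when a specific file is under investigation.
    if filename is not None:
        commands.append(["xferity", "trace", filename])
    return commands


def run_investigation(flow: str, filename: Optional[str] = None) -> None:
    """Run each command in turn and print its output (hypothetical helper)."""
    for argv in investigation_commands(flow, filename):
        print("$ " + " ".join(argv))
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout or result.stderr)
```

Calling run_investigation("supplier-invoice-pickup", "invoice-2026-03-15.xml") would run the five commands in sequence and leave the output in one place to attach to an escalation.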

Failure scope

Identify the scope first — this determines whether the fix belongs to global config, partner config, flow config, or an external dependency.

Symptom	Probable scope
Multiple flows failing at startup	global config error
One partner failing across multiple flows	partner definition or endpoint issue
One flow failing, others fine	flow config, PGP material, or flow-specific endpoint
Intermittent failure matching partner schedule	endpoint intermittency or remote file availability
Failure after a config change	the changed config
Failure after a cert rotation	certificate re-binding or trust verification
Failure after a secret rotation	secret not resolving to new value
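
The first three rows of the table can be read as a simple decision rule. The sketch below is illustrative only: FlowState and probable_scope are hypothetical names, the input is something you would assemble by hand from xferity flow status, and the result is a first guess to confirm in the logs rather than a diagnosis.

```python
from typing import Dict, NamedTuple


class FlowState(NamedTuple):
    partner: str   # partner the flow belongs to
    failed: bool   # whether the most recent run failed


def probable_scope(flows: Dict[str, FlowState]) -> str:
    """Map the pattern of failing flows to the probable scope from the table."""
    failing = {name: state for name, state in flows.items() if state.failed}
    if not failing:
        return "no failure"
    if len(failing) == 1:
        return "flow config, PGP material, or flow-specific endpoint"
    partners = {state.partner for state in failing.values()}
    if len(partners) == 1:
        return "partner definition or endpoint issue"
    # Several flows failing across several partners points away from any one partner.
    return "global config error"
```

The remaining rows depend on timing (a schedule match, or a recent config, certificate, or secret change), which this sketch does not model.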

Config and validation failures

Symptoms: startup fails, flow does not load, YAML parse error in logs.

Check:

xferity validate output
YAML field names for typos (strict parser rejects unknown fields)
referenced paths and keys exist
hardened mode constraints are satisfied

Common config mistakes:

sftp.known_hosts path missing file: prefix
sftp.host_key_fingerprint not starting with SHA256:
schedule_cron using a five-field expression instead of six-field
partner id not matching filename
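
The mistakes above are mechanical enough to check before running xferity validate. The following is a rough sketch, not Xferity's own validation: the function names are hypothetical, the values are passed in as plain strings rather than read from YAML, and the partner check assumes the partner id should equal the file name without its extension.

```python
from pathlib import Path
from typing import List


def lint_flow_settings(known_hosts: str, host_key_fingerprint: str, schedule_cron: str) -> List[str]:
    """Return a list of problems found in three commonly mistyped settings."""
    problems = []
    if not known_hosts.startswith("file:"):
        problems.append("sftp.known_hosts is missing the file: prefix")
    if not host_key_fingerprint.startswith("SHA256:"):
        problems.append("sftp.host_key_fingerprint does not start with SHA256:")
    # Only the field count is checked here, not the validity of each field.
    if len(schedule_cron.split()) != 6:
        problems.append("schedule_cron must be a six-field expression")
    return problems


def partner_id_matches_filename(partner_id: str, filename: str) -> bool:
    """Assumption: the id must equal the file name with its extension removed."""
    return Path(filename).stem == partner_id
```

For example, lint_flow_settings("/etc/xferity/known_hosts", "abc", "*/5 * * * *") reports all three problems, because the path has no prefix, the fingerprint has no algorithm label, and the cron expression has five fields.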

Secret resolution failures

Symptoms: flow fails before any network action, error mentions credential or secret.

Check:

env variable is set in the running process environment
file path is mounted and readable
vault or AWS Secrets Manager is reachable and the token/role has access
secret reference uses the correct prefix (env:, file:, vault:, etc.)
in hardened mode, plaintext values in sensitive fields are rejected

Use xferity diag <flow> — diagnostics include a credential resolution check.
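
For env: and file: references, the first two checks can be approximated by hand. The sketch below is an illustration under stated assumptions: it treats the text after the prefix as a variable name or a path, it knows only the three prefixes named above, and it inspects the environment of the process running the script, which may differ from the environment of the running Xferity process. All names are hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple

# Only the prefixes named on this page; Xferity may accept others.
KNOWN_PREFIXES = ("env:", "file:", "vault:")


def split_reference(ref: str) -> Optional[Tuple[str, str]]:
    """Split 'env:NAME' into ('env', 'NAME'); return None if no known prefix."""
    for prefix in KNOWN_PREFIXES:
        if ref.startswith(prefix):
            return prefix.rstrip(":"), ref[len(prefix):]
    return None


def check_local_reference(ref: str, environ: Mapping[str, str] = os.environ) -> str:
    """Check the parts of a secret reference that can be verified locally."""
    parsed = split_reference(ref)
    if parsed is None:
        return "unrecognised prefix or plaintext value"
    kind, target = parsed
    if kind == "env":
        return "ok" if environ.get(target) else f"environment variable {target} is not set"
    if kind == "file":
        return "ok" if os.access(target, os.R_OK) else f"file {target} is missing or unreadable"
    # Remote stores need a reachability and permission check this sketch cannot do.
    return f"{kind} reference: check reachability and token/role access separately"
```

A result of "unrecognised prefix or plaintext value" on a sensitive field is worth a second look in hardened mode, where plaintext values are rejected.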

SSH host verification failures (SFTP)

Symptoms: SFTP connection error mentioning host key.

Check:

known_hosts file exists and is readable
the host key in the file is current (partner may have rotated keys)
host_key_fingerprint matches the actual server fingerprint
if the partner changed SSH host keys, update the known_hosts entry

Do not set allow_insecure_host_key=true as a fix unless you also track it as an accepted finding.
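
To compare a known_hosts entry against host_key_fingerprint, it helps to compute the fingerprint of the stored key. An OpenSSH SHA256 fingerprint is the unpadded Base64 encoding of the SHA-256 digest of the raw public key blob. The sketch below applies that to a single known_hosts line; the function name is hypothetical, and it assumes the configured fingerprint uses the same OpenSSH format.

```python
import base64
import hashlib


def known_hosts_fingerprint(line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint for one known_hosts entry."""
    fields = line.split()
    # Entries with a marker such as @cert-authority carry one extra leading field.
    if fields[0].startswith("@"):
        fields = fields[1:]
    # Remaining layout: hostnames, key type, Base64 key blob, optional comment.
    key_blob = base64.b64decode(fields[2])
    digest = hashlib.sha256(key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

If the computed value differs from host_key_fingerprint, one of the two is out of date. ssh-keygen -lf on the same file prints fingerprints in the same format and can serve as a cross-check.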

FTPS TLS failures

Symptoms: TLS handshake failure, certificate validation error.

Check:

CA certificate chains are correct and complete
server certificate is not expired
tls.mode is set to explicit (implicit mode is not supported)
connection.passive=true is set
if using server_cert_fingerprint, it matches the current server cert
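
The expiry check can be scripted with Python's standard ssl module. This is a small sketch with a hypothetical function name. It assumes the notAfter value is in the text form that ssl.SSLSocket.getpeercert() returns, such as "Jan  5 09:34:43 2018 GMT"; how you obtain that value from the partner's server is left out.

```python
import ssl
import time
from typing import Optional


def certificate_expired(not_after: str, now: Optional[float] = None) -> bool:
    """Return True if the certificate's notAfter time has passed."""
    # cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" form into epoch seconds.
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return current >= expires_at
```

Passing an explicit now makes it possible to ask whether the certificate will still be valid at a future date, for example before a scheduled run.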

AS2 failures

Symptoms: AS2 message rejected, MDN error, signing or encryption failure.

Check:

partner AS2 ID in the config matches what the partner expects
certificate roles are correctly bound in the Certificate inventory
if expect_signed_mdn=true, the partner actually returns a signed MDN
the receiving endpoint URL is reachable from Xferity
HTTPS trust for the AS2 endpoint is configured

PGP decryption or encryption failures

Symptoms: crypto error in logs, compat_enterprise_key_structure mentioned.

Check:

the key material is present and readable
the passphrase resolves correctly
the key has not expired
if using provider=auto, check whether fallback to GnuPG was attempted
if fallback occurred: confirm GnuPG is installed and gnupg_binary path is correct

For compat_enterprise_key_structure:

this is a named compatibility case, not a bad key
the native provider could not handle the key layout
GnuPG fallback should handle it if configured correctly

Flow locking issues

Symptoms: flow fails with lock error, run does not start.

Check:

whether a previous run is still holding the lock
the lock_stale_after_seconds setting: if the previous run died mid-execution, the lock may still exist
whether lock_wait=true is configured and whether max wait was exceeded

A stale lock means the previous run did not complete cleanly. Investigate why before clearing the lock.
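
The relationship between a lock's age and lock_stale_after_seconds can be sketched as follows. This is only an illustration: how Xferity stores locks and exactly when it treats one as stale are not described here, so the function name, the acquired_at timestamp, and the "at or past the threshold" comparison are all assumptions.

```python
import time
from typing import Optional


def lock_state(acquired_at: float, lock_stale_after_seconds: float, now: Optional[float] = None) -> str:
    """Classify a lock as held or stale from its acquisition time (epoch seconds)."""
    current = time.time() if now is None else now
    age = current - acquired_at
    if age < 0:
        # A lock from the future usually means the clocks disagree.
        return "clock skew: lock timestamp is in the future"
    if age >= lock_stale_after_seconds:
        return "stale"
    return "held"
```

A result of "held" suggests a previous run may still be active; "stale" is the case that calls for investigating why that run did not finish.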

Dead-letter artifacts

When a file ends up in the dead-letter path, the flow could not process it after exhausting retries.

Check:

filename and timestamp to correlate with log entries
log entries at the time of the failure
whether the underlying issue is transient (e.g., partner downtime) or permanent (e.g., corrupt file)

Do not delete dead-letter files without understanding the failure first.

Narrowing and escalation

After steps 1–6 of the investigation order:

if the fault is in Xferity config: fix, validate, and rerun
if the fault is in the partner endpoint: confirm from logs, then contact the partner with evidence
if the fault is transient and the cause is resolved: check whether rerun is safe with idempotency in mind
