Long Delays & Callbacks

Pause workflows for hours or days, or until an external system responds - without holding worker resources.

Some workflows need to wait. A drip campaign sends three follow-up emails over a week. An approval flow blocks until a human clicks Approve in Slack. A scheduled retry kicks in four hours after a flaky external API fails.

FlowTrux has two primitives for these patterns, and both share the same suspend/resume infrastructure: the worker slot is freed during the wait, the run survives a server restart, and resumption is driven by Redis-backed jobs.

TL;DR

| Pattern | When to use | Action types |
| --- | --- | --- |
| Long Delay | "Wait N seconds/minutes/hours/days, then continue" - clock-based | One Action node, type delay |
| Callback | "Wait until this URL is hit, then continue" - event-based | Two Action nodes: generate_callback → notify → wait_for_callback |

Both pause the run with status PAUSED. Both resume from exactly where they left off, with the full state context intact ({{steps.*}}, {{global.*}}, {{trigger.*}}, loop context).


Long Delay

A delay Action just waits. Configure duration (number) and unit (ms, s, m, h). Maximum 7 days.

There are two execution modes - picked automatically by the engine based on the requested duration:

| Duration | Mode | Effect on the worker |
| --- | --- | --- |
| ≤ 10 seconds | In-process setTimeout | Worker stays active during the pause. Used for very short waits where suspend/resume overhead would dominate. |
| > 10 seconds, up to 7 days | Suspend → BullMQ delayed job → resume | Worker slot is freed for the entire pause. Run state lives in Postgres; the timer lives in Redis. |

You don't pick the mode - the engine just picks the cheaper one. Behavior is identical from the outside: the next node runs after the requested time, regardless of mode.
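A minimal TypeScript sketch of the mode selection described above, assuming the 10-second threshold and the 7-day ceiling are applied after converting the duration to milliseconds. The names toMilliseconds and pickDelayMode are hypothetical and are not part of FlowTrux's API.

```typescript
// Illustrative only: these names and constants mirror the prose, not FlowTrux source code.
type DelayUnit = "ms" | "s" | "m" | "h";
type DelayMode = "in-process" | "suspend";

const UNIT_TO_MS: Record<DelayUnit, number> = {
  ms: 1,
  s: 1000,
  m: 60 * 1000,
  h: 60 * 60 * 1000,
};

const IN_PROCESS_LIMIT_MS = 10 * 1000;           // waits of 10 seconds or less stay in-process
const MAX_DELAY_MS = 7 * 24 * 60 * 60 * 1000;    // documented 7-day maximum

// Convert a (duration, unit) pair to milliseconds and enforce the maximum.
function toMilliseconds(duration: number, unit: DelayUnit): number {
  if (!Number.isFinite(duration) || duration < 0) {
    throw new RangeError(`invalid duration: ${duration}`);
  }
  const ms = duration * UNIT_TO_MS[unit];
  if (ms > MAX_DELAY_MS) {
    throw new RangeError("delay exceeds the 7-day maximum");
  }
  return ms;
}

// Short waits run in-process; everything longer is suspended and resumed later.
function pickDelayMode(duration: number, unit: DelayUnit): DelayMode {
  return toMilliseconds(duration, unit) <= IN_PROCESS_LIMIT_MS ? "in-process" : "suspend";
}

console.log(pickDelayMode(10, "s")); // in-process
console.log(pickDelayMode(1, "h"));  // suspend
```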

Output

Both modes produce a delay output you can reference downstream:

{
  "delayed": true,
  "durationMs": 3600000,
  "requested": { "duration": 1, "unit": "h" }
}

After a long delay (suspend mode), the output gains two fields, longDelay and resumedAt:

{
  "delayed": true,
  "longDelay": true,
  "durationMs": 3600000,
  "resumedAt": "2026-04-30T15:00:00.000Z"
}

Resource impact

Long delays are very cheap. As a rough order of magnitude:

  • 1,000 concurrent long-delayed runs - about 200 KB in Redis and 5 MB in Postgres, zero worker slots held.
  • A run paused for 7 days costs the same as a run paused for 1 hour.
  • Server restarts are safe: state is in Postgres and the timer is in Redis. After a restart, BullMQ rehydrates the delayed job and the worker resumes the run when the timer fires.

That cost profile is what separates "throttle our workflow concurrency" from "schedule a million follow-up emails over the next week"; Long Delay is built for the latter.
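To put the rough figures above in per-run terms, the sketch below divides them out and extrapolates to a million paused runs. It assumes decimal units (1 KB = 1,000 bytes) and that the footprint grows linearly with the number of runs, which the figures suggest but do not guarantee. The function name estimateFootprint is hypothetical.

```typescript
// Per-run costs derived from "1,000 runs is about 200 KB in Redis and 5 MB in Postgres".
const REDIS_BYTES_PER_RUN = 200000 / 1000;      // 200 bytes per paused run
const POSTGRES_BYTES_PER_RUN = 5000000 / 1000;  // 5,000 bytes per paused run

interface PausedRunFootprint {
  redisBytes: number;
  postgresBytes: number;
  workerSlots: number;
}

// Assumption: storage scales linearly with the number of paused runs.
function estimateFootprint(pausedRuns: number): PausedRunFootprint {
  return {
    redisBytes: pausedRuns * REDIS_BYTES_PER_RUN,
    postgresBytes: pausedRuns * POSTGRES_BYTES_PER_RUN,
    workerSlots: 0, // suspended runs hold no worker slot
  };
}

const million = estimateFootprint(1000000);
console.log(million.redisBytes / 1e6, "MB in Redis");       // 200 MB in Redis
console.log(million.postgresBytes / 1e9, "GB in Postgres"); // 5 GB in Postgres
```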

When to use it

  • Drip campaigns - send → wait 24h → send → wait 3d → send.
  • Cool-downs after a failure - try → on error, wait 30 minutes → retry once.
  • Scheduled follow-ups - created a ticket → wait 7 days → check whether it's still open and ping the owner.

When NOT to use it

  • You're waiting for an external event (a user click, an inbound webhook, a 3rd-party callback). Use the Callback pattern below instead - Long Delay can't be triggered early.
  • You want a recurring schedule (every Monday at 9am). Use a cron Trigger on a separate workflow.

Callbacks (Wait for external input)

The Callback pattern pauses a workflow until an external system POSTs to a URL the workflow itself generated. Two cooperating Action types:

generate_callback

Generates a unique callback URL and a 192-bit random token, and stores the token on the Execution record. Place this before the notification that sends the URL out.

Config:

| Field | Default | Notes |
| --- | --- | --- |
| timeout | 3600 (1 hour) | Seconds the workflow will wait. Maximum 86,400 (24h). |

Output:

{
  "callbackUrl": "https://your-domain.com/api/webhooks/callback/<token>",
  "token": "<192-bit hex>",
  "expiresAt": "2026-04-30T16:00:00.000Z"
}
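For intuition about the shape of this output, here is a hedged TypeScript sketch of how such a record could be built: 24 random bytes give the 192 bits, which encode to 48 hex characters, and expiresAt is the current time plus the timeout. The function buildCallback and its parameters are hypothetical; this is not FlowTrux's implementation, and it skips storing the token on the Execution record.

```typescript
import { randomBytes } from "node:crypto";

const DEFAULT_TIMEOUT_SECONDS = 3600;  // documented default (1 hour)
const MAX_TIMEOUT_SECONDS = 86400;     // documented maximum (24 hours)

interface CallbackOutput {
  callbackUrl: string;
  token: string;
  expiresAt: string;
}

// Hypothetical helper: builds an output with the same fields as generate_callback.
function buildCallback(
  baseUrl: string,
  timeoutSeconds: number = DEFAULT_TIMEOUT_SECONDS,
  now: Date = new Date(),
): CallbackOutput {
  if (!Number.isInteger(timeoutSeconds) || timeoutSeconds <= 0 || timeoutSeconds > MAX_TIMEOUT_SECONDS) {
    throw new RangeError(`timeout must be between 1 and ${MAX_TIMEOUT_SECONDS} seconds`);
  }
  // 24 bytes = 192 bits; hex encoding doubles the length to 48 characters.
  const token = randomBytes(24).toString("hex");
  return {
    callbackUrl: `${baseUrl}/api/webhooks/callback/${token}`,
    token,
    expiresAt: new Date(now.getTime() + timeoutSeconds * 1000).toISOString(),
  };
}

const sample = buildCallback("https://your-domain.com");
console.log(sample.token.length); // 48
```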

wait_for_callback

Pauses the workflow until the URL is hit. Status flips to PAUSED, the worker slot is freed, and BullMQ schedules the timeout job using expiresAt from the matching generate_callback.

Config:

| Field | Notes |
| --- | --- |
| token | Reference the upstream node's token: {{steps.<generate-callback-id>.output.token}} |

Output (after resume):

{
  "resumed": true,
  "callbackData": { "...": "the JSON or form body the caller POSTed" }
}

The flow

Trigger
  └─> generate_callback   ← creates URL + token
       └─> Slack/Gmail    ← sends URL to approver, embedded in a button or link
            └─> wait_for_callback   ← workflow PAUSES here
                 └─> branch on callbackData.action == "approve" / "reject"
                      └─> notify result, finish

  1. generate_callback produces the URL + token, attached to the current execution.
  2. The notification node embeds callbackUrl into a Slack message, an email, an SMS, or anywhere else the approver will see it.
  3. wait_for_callback puts the run into PAUSED. The worker moves on to other jobs.
  4. When the approver clicks the link (or the system POSTs to it programmatically), the callback endpoint receives the request, queues a resume job, and FlowRunner picks the run back up from its saved state.
  5. If the timeout fires before the URL is hit, the run fails with a timeout error.

The endpoint

POST /api/webhooks/callback/<token>
  • Accepts JSON or form-encoded body.
  • The body becomes callbackData in the wait node's output.
  • No HMAC required. The token itself is the credential: it carries 192 bits of randomness and is only valid for one specific paused execution.
  • The endpoint dedupes: a second POST with the same token after the first one resumes is a no-op.
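The deduplication rule can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the real endpoint: the class CallbackRegistry, its result labels, and the in-memory Map are all hypothetical stand-ins for the token lookup on the Execution record and the queued resume job.

```typescript
type Status = "PAUSED" | "RUNNING" | "FAILED";

interface Execution {
  id: string;
  status: Status;
  callbackData?: unknown;
}

type CallbackResult = "resumed" | "no-op" | "unknown-token";

// Hypothetical in-memory stand-in for the token lookup done by the callback endpoint.
class CallbackRegistry {
  private byToken = new Map<string, Execution>();

  register(token: string, execution: Execution): void {
    this.byToken.set(token, execution);
  }

  // Only the first POST against a PAUSED execution resumes it; later POSTs change nothing.
  handlePost(token: string, body: unknown): CallbackResult {
    const execution = this.byToken.get(token);
    if (!execution) return "unknown-token";
    if (execution.status !== "PAUSED") return "no-op";
    execution.status = "RUNNING";
    execution.callbackData = body; // the POST body becomes callbackData
    return "resumed";
  }
}

const demoRegistry = new CallbackRegistry();
demoRegistry.register("demo-token", { id: "run-1", status: "PAUSED" });
console.log(demoRegistry.handlePost("demo-token", { action: "approve" })); // resumed
console.log(demoRegistry.handlePost("demo-token", { action: "reject" }));  // no-op
```

In this model the second POST cannot overwrite the first caller's data, which is the property the dedupe rule protects.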

Slack-approval example

Trigger (manual)
  → generate_callback (timeout: 86400)
  → slack.send_blocks (channel: ops, blocks: [
        {type: "section", text: "Deploy v1.42 to production?"},
        {type: "actions", elements: [
          {type: "button", text: "Approve",
           url: "{{steps.cb.output.callbackUrl}}?action=approve"},
          {type: "button", text: "Reject",
           url: "{{steps.cb.output.callbackUrl}}?action=reject"}
        ]}
     ])
  → wait_for_callback (token: {{steps.cb.output.token}})
  → if-else (condition: {{steps.wait.output.callbackData._query.action}} === "approve")
        true  → http.post (https://ci/deploy)
        false → slack.send_message (channel: ops, text: "Deploy cancelled by {{...}}")

The query-string trick (?action=approve) lets a single callback token carry the user's choice, so each button does not need its own callback. The downstream branch reads callbackData._query.action after resume.
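A short TypeScript sketch of the query-string handling, using the standard URL API. The helper names withAction and toCallbackData are hypothetical, and the way query parameters land under _query is inferred from the example above rather than taken from FlowTrux's implementation.

```typescript
// Append the user's choice to a callback URL without disturbing the token in the path.
function withAction(callbackUrl: string, action: "approve" | "reject"): string {
  const url = new URL(callbackUrl);
  url.searchParams.set("action", action);
  return url.toString();
}

// Assumption: the endpoint exposes query parameters under a _query key next to the body.
function toCallbackData(requestUrl: string, body: Record<string, unknown>): Record<string, unknown> {
  const query: Record<string, string> = {};
  new URL(requestUrl).searchParams.forEach((value, key) => {
    query[key] = value;
  });
  return { ...body, _query: query };
}

const approveUrl = withAction("https://your-domain.com/api/webhooks/callback/abc123", "approve");
console.log(approveUrl);
// https://your-domain.com/api/webhooks/callback/abc123?action=approve
console.log(JSON.stringify(toCallbackData(approveUrl, {})));
// {"_query":{"action":"approve"}}
```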

Behavior under server restarts

  • PAUSED runs survive restarts. State is persisted to Postgres at the moment of suspension; the BullMQ timeout job is in Redis.
  • After a restart, a callback POST works even if the worker that originally suspended the run is gone - the resume job is queued, and any worker can pick it up.
  • The same is true for the timeout: BullMQ fires the delayed job; the worker that picks it up looks up the still-PAUSED execution and fails it.
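The reason any worker can handle the timeout is that the handler needs only the persisted execution state, not anything held in memory by the worker that suspended the run. A minimal sketch, assuming a Map as a stand-in for the Postgres execution table and a hypothetical handler name onTimeoutJob. Treating an already-resumed run as a no-op is an assumption, since the text only describes the still-PAUSED case.

```typescript
type Status = "PAUSED" | "RUNNING" | "FAILED";

interface Execution {
  id: string;
  status: Status;
  error?: string;
}

// Hypothetical timeout handler: returns true only when it actually failed the run.
function onTimeoutJob(store: Map<string, Execution>, executionId: string): boolean {
  const execution = store.get(executionId); // looked up from shared storage, not worker memory
  if (!execution || execution.status !== "PAUSED") {
    return false; // assumed no-op: the callback arrived first or the run no longer exists
  }
  execution.status = "FAILED";
  execution.error = "callback timeout";
  return true;
}

const demoStore = new Map<string, Execution>();
demoStore.set("run-1", { id: "run-1", status: "PAUSED" });
console.log(onTimeoutJob(demoStore, "run-1")); // true
console.log(onTimeoutJob(demoStore, "run-1")); // false
```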

When to use it

  • Approval flows - Slack/email/Teams "Approve / Reject" buttons that drive a workflow.
  • Manual review checkpoints - pause a pipeline until QA hits a "ship it" link.
  • Two-system handoffs - your workflow kicks off work in an external system and wants to wait for that system's "done" callback before continuing.
  • OTP / 2FA-style waits - generate a magic link, send it, wait for the click.

When NOT to use it

  • You want to wait exactly N hours, not "until something happens". Use Long Delay.
  • The external system polls for status rather than pushing. Loop with a Long Delay between probes; when the status flips, exit the loop.
  • The wait is longer than 24 hours. The wait_for_callback timeout maxes at 86,400 seconds. For multi-day approval windows, generate a fresh callback inside a Long Delay loop, or split the workflow into two runs joined by a global variable / state record.
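For the multi-day case, one way to plan the loop is to split the total approval window into consecutive callback timeouts that each respect the 86,400-second cap. The sketch below covers only that arithmetic. The name splitApprovalWindow is hypothetical, and how the workflow catches an expired window and issues the next callback is left out because it depends on your error handling.

```typescript
const MAX_CALLBACK_TIMEOUT_SECONDS = 86400; // documented cap for wait_for_callback

// Split a long approval window into consecutive timeouts of at most 24 hours each.
function splitApprovalWindow(totalSeconds: number): number[] {
  if (!Number.isInteger(totalSeconds) || totalSeconds <= 0) {
    throw new RangeError("totalSeconds must be a positive integer");
  }
  const windows: number[] = [];
  let remaining = totalSeconds;
  while (remaining > 0) {
    const chunk = Math.min(remaining, MAX_CALLBACK_TIMEOUT_SECONDS);
    windows.push(chunk);
    remaining -= chunk;
  }
  return windows;
}

// A 60-hour window becomes two full days plus a 12-hour remainder.
console.log(splitApprovalWindow(60 * 3600)); // [ 86400, 86400, 43200 ]
```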

Comparing the two

|  | Long Delay | Callback |
| --- | --- | --- |
| Trigger to resume | Time elapsed | External POST |
| Can resume early? | No | Yes (any time before timeout) |
| Max wait | 7 days | 24 hours |
| Carries data on resume? | No | Yes - callbackData |
| Number of nodes | 1 (delay) | 2 (generate_callback + wait_for_callback) |

Both share: PAUSED status, worker slot freed during wait, restart-safe, full state context preserved across resume, fully integrated with execution history.

  • Node Types - delay, generate_callback, wait_for_callback Action types
  • Webhooks - the auth model used for triggering workflows from outside (callbacks use a token-only variant)
  • Workflow Templates - several built-in templates use the callback pattern