Long Delays & Callbacks
Pause workflows for hours or days, or until an external system responds - without holding worker resources.
Some workflows need to wait. A drip campaign sends three follow-up emails over a week. An approval flow blocks until a human clicks Approve in Slack. A scheduled retry kicks back in four hours after a flaky external API failed.
FlowTrux has two primitives for these patterns, and both share the same suspend/resume infrastructure: the worker slot is freed during the wait, the run survives a server restart, and resumption is driven by Redis-backed jobs.
TL;DR
| Pattern | When to use | Action types |
|---|---|---|
| Long Delay | "Wait N seconds/minutes/hours/days, then continue" - clock-based | One Action node, type delay |
| Callback | "Wait until this URL is hit, then continue" - event-based | Two Action nodes, generate_callback and wait_for_callback, with a notification step between them: generate_callback → notify → wait_for_callback |
Both pause the run with status PAUSED. Both resume from exactly where they left off, with the full state context intact ({{steps.*}}, {{global.*}}, {{trigger.*}}, loop context).
Long Delay
A delay Action just waits. Configure duration (number) and unit (ms, s, m, h). Maximum 7 days.
There are two execution modes - picked automatically by the engine based on the requested duration:
| Duration | Mode | Effect on the worker |
|---|---|---|
| ≤ 10 seconds | In-process setTimeout | Worker stays active during the pause. Used for very short waits where suspend/resume overhead would dominate. |
| > 10 seconds, up to 7 days | Suspend → BullMQ delayed job → resume | Worker slot is freed for the entire pause. Run state lives in Postgres; the timer lives in Redis. |
You don't pick the mode - the engine just picks the cheaper one. Behavior is identical from the outside: the next node runs after the requested time, regardless of mode.
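The mode selection above can be sketched as a small pure function. This is a minimal illustration, not the engine's actual code: the names toMs and pickDelayMode are hypothetical, and rejecting delays over 7 days with an error is an assumption. Only the unit list, the 10-second threshold, and the 7-day cap come from the text.

```typescript
// Hypothetical sketch of the duration -> mode decision described above.
type DelayUnit = "ms" | "s" | "m" | "h";
type DelayMode = "in-process" | "suspend";

// Milliseconds per configured unit.
const UNIT_MS: Record<DelayUnit, number> = {
  ms: 1,
  s: 1000,
  m: 60000,
  h: 3600000,
};

const IN_PROCESS_LIMIT_MS = 10000; // waits of 10 s or less stay in-process
const MAX_DELAY_MS = 7 * 24 * 3600000; // documented 7-day maximum

function toMs(duration: number, unit: DelayUnit): number {
  return duration * UNIT_MS[unit];
}

function pickDelayMode(duration: number, unit: DelayUnit): DelayMode {
  const ms = toMs(duration, unit);
  if (!Number.isFinite(ms) || ms < 0) {
    throw new RangeError("duration must be a non-negative number");
  }
  // Assumption: over-long delays are rejected rather than clamped.
  if (ms > MAX_DELAY_MS) {
    throw new RangeError("delay exceeds the 7-day maximum");
  }
  return ms <= IN_PROCESS_LIMIT_MS ? "in-process" : "suspend";
}
```

Under these assumptions, pickDelayMode(10, "s") stays in-process, while pickDelayMode(1, "h") takes the suspend path.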
Output
Both modes produce a delay output you can reference downstream:
{
"delayed": true,
"durationMs": 3600000,
"requested": { "duration": 1, "unit": "h" }
}
After a long delay (suspend mode), the output also carries longDelay and resumedAt:
{
"delayed": true,
"longDelay": true,
"durationMs": 3600000,
"resumedAt": "2026-04-30T15:00:00.000Z"
}
Resource impact
Long delays are very cheap. As a rough order of magnitude:
- 1,000 concurrent long-delayed runs - about 200 KB in Redis and 5 MB in Postgres, zero worker slots held.
- A run paused for 7 days costs the same as a run paused for 1 hour.
- Server restarts are safe: state is in Postgres and the timer is in Redis. After a restart, BullMQ rehydrates the delayed job and the worker resumes the run when the timer fires.
This is the difference between "throttle our workflow concurrency" and "schedule a million follow-up emails over the next week" - the latter is the correct shape for Long Delay.
When to use it
- Drip campaigns - send → wait 24h → send → wait 3d → send.
- Cool-downs after a failure - try → on error, wait 30 minutes → retry once.
- Scheduled follow-ups - created a ticket → wait 7 days → check whether it's still open and ping the owner.
When NOT to use it
- You're waiting for an external event (a user click, an inbound webhook, a 3rd-party callback). Use the Callback pattern below instead - Long Delay can't be triggered early.
- You want a recurring schedule (every Monday at 9am). Use a cronTrigger on a separate workflow.
Callbacks (Wait for external input)
The Callback pattern pauses a workflow until an external system POSTs to a URL the workflow itself generated. Two cooperating Action types:
generate_callback
Generates a unique callback URL and a 192-bit random token, and stores the token on the Execution record. Place this before the notification that sends the URL out.
Config:
| Field | Default | Notes |
|---|---|---|
| timeout | 3600 (1 hour) | Seconds the workflow will wait. Maximum 86,400 (24h). |
Output:
{
"callbackUrl": "https://your-domain.com/api/webhooks/callback/<token>",
"token": "<192-bit hex>",
"expiresAt": "2026-04-30T16:00:00.000Z"
}
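For a sense of what this output involves, here is a minimal sketch that builds a value of the same shape. It is not FlowTrux's implementation: the function name generateCallback, the baseUrl parameter, and the input validation are assumptions for illustration. The 192-bit token size, the URL path, the default of 3600, and the 86,400 cap come from the text above.

```typescript
import { randomBytes } from "node:crypto";

// Shape of the documented generate_callback output.
interface GeneratedCallback {
  callbackUrl: string;
  token: string;
  expiresAt: string;
}

const DEFAULT_TIMEOUT_S = 3600; // documented default (1 hour)
const MAX_TIMEOUT_S = 86400; // documented maximum (24 hours)

// Hypothetical helper; baseUrl and the injectable clock are assumptions.
function generateCallback(
  baseUrl: string,
  timeoutSeconds: number = DEFAULT_TIMEOUT_S,
  now: Date = new Date(),
): GeneratedCallback {
  if (
    !Number.isInteger(timeoutSeconds) ||
    timeoutSeconds <= 0 ||
    timeoutSeconds > MAX_TIMEOUT_S
  ) {
    throw new RangeError("timeout must be an integer between 1 and 86400 seconds");
  }
  // 24 random bytes = 192 bits, rendered as 48 hex characters.
  const token = randomBytes(24).toString("hex");
  const expiresAt = new Date(now.getTime() + timeoutSeconds * 1000).toISOString();
  return {
    callbackUrl: `${baseUrl}/api/webhooks/callback/${token}`,
    token,
    expiresAt,
  };
}
```

Taking the token from a cryptographically secure source matters here because the token is the only credential the callback endpoint checks.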
wait_for_callback
Pauses the workflow until the URL is hit. Status flips to PAUSED, the worker slot is freed, and BullMQ schedules the timeout job using expiresAt from the matching generate_callback.
Config:
| Field | Notes |
|---|---|
| token | Reference the upstream node's token: {{steps.<generate-callback-id>.output.token}} |
Output (after resume):
{
"resumed": true,
"callbackData": { "...": "the JSON or form body the caller POSTed" }
}
The flow
Trigger
└─> generate_callback ← creates URL + token
└─> Slack/Gmail ← sends URL to approver, embedded in a button or link
└─> wait_for_callback ← workflow PAUSES here
└─> branch on callbackData.action == "approve" / "reject"
└─> notify result, finish
- generate_callback produces the URL + token, attached to the current execution.
- The notification node embeds callbackUrl into a Slack message, an email, an SMS, or anywhere else the approver will see it.
- wait_for_callback puts the run into PAUSED. The worker moves on to other jobs.
- When the approver clicks the link (or the system POSTs to it programmatically), the callback endpoint receives the request, queues a resume job, and FlowRunner picks the run back up from its saved state.
- If the timeout fires before the URL is hit, the run fails with a timeout error.
The endpoint
POST /api/webhooks/callback/<token>
- Accepts JSON or form-encoded body.
- The body becomes callbackData in the wait node's output.
- No HMAC required. The token itself is the auth - it's 192 bits of randomness and only valid for one specific paused execution.
- The endpoint dedupes: a second POST with the same token after the first one resumes is a no-op.
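The dedupe rule can be modelled in memory to make the semantics concrete. This is a sketch under stated assumptions, not the real endpoint: CallbackRegistry and its methods are hypothetical names, the Map stands in for the Postgres execution record, and the "not_found" result for an unknown token is an assumption the text does not cover.

```typescript
// Hypothetical in-memory model of the callback endpoint's dedupe behaviour.
type CallbackResult = "resumed" | "noop" | "not_found";

interface PausedRun {
  status: "PAUSED" | "RUNNING";
  callbackData?: Record<string, unknown>;
}

class CallbackRegistry {
  // Stand-in for persisted executions, keyed by callback token.
  private runs = new Map<string, PausedRun>();

  // Called when wait_for_callback suspends a run.
  pause(token: string): void {
    this.runs.set(token, { status: "PAUSED" });
  }

  // Called for each POST /api/webhooks/callback/<token>.
  handlePost(token: string, body: Record<string, unknown>): CallbackResult {
    const run = this.runs.get(token);
    if (!run) return "not_found"; // assumption: unknown tokens are rejected
    if (run.status !== "PAUSED") return "noop"; // repeat POST after resume
    run.status = "RUNNING";
    run.callbackData = body; // only the first body is kept
    return "resumed";
  }

  dataFor(token: string): Record<string, unknown> | undefined {
    return this.runs.get(token)?.callbackData;
  }
}
```

In this model the first POST wins: a later POST with a different body returns "noop" and leaves callbackData unchanged.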
Slack-approval example
Trigger (manual)
→ generate_callback (timeout: 86400)
→ slack.send_blocks (channel: ops, blocks: [
{type: "section", text: "Deploy v1.42 to production?"},
{type: "actions", elements: [
{type: "button", text: "Approve",
url: "{{steps.cb.output.callbackUrl}}?action=approve"},
{type: "button", text: "Reject",
url: "{{steps.cb.output.callbackUrl}}?action=reject"}
]}
])
→ wait_for_callback (token: {{steps.cb.output.token}})
→ if-else (condition: {{steps.wait.output.callbackData._query.action}} === "approve")
true → http.post (https://ci/deploy)
false → slack.send_message (channel: ops, text: "Deploy cancelled by {{...}}")
The query-string trick (?action=approve) lets a single callback URL carry the user's choice - no separate URLs per button. The dispatcher reads callbackData._query.action after resume.
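How the query string ends up under callbackData._query is not spelled out here, so the following is only a plausible sketch: buildCallbackData is a hypothetical helper, and merging the parsed query into the body under a _query key is an assumption inferred from the example above.

```typescript
import { URL } from "node:url";

// Hypothetical: combine the request body with the parsed query string.
function buildCallbackData(
  requestUrl: string,
  body: Record<string, unknown>,
): Record<string, unknown> {
  const query: Record<string, string> = {};
  // searchParams decodes percent-encoding; a repeated key keeps its last value.
  new URL(requestUrl).searchParams.forEach((value, key) => {
    query[key] = value;
  });
  return { ...body, _query: query };
}
```

With a request URL ending in ?action=approve, the returned object has _query.action set to "approve", which is the value the if-else condition in the example compares against.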
Behavior under server restarts
- PAUSED runs survive restarts. State is persisted to Postgres at the moment of suspension; the BullMQ timeout job is in Redis.
- After a restart, a callback POST works even if the worker that originally suspended the run is gone - the resume job is queued, and any worker can pick it up.
- The same is true for the timeout: BullMQ fires the delayed job; the worker that picks it up looks up the still-PAUSED execution and fails it.
When to use it
- Approval flows - Slack/email/Teams "Approve / Reject" buttons that drive a workflow.
- Manual review checkpoints - pause a pipeline until QA hits a "ship it" link.
- Two-system handoffs - your workflow kicks off work in an external system and wants to wait for that system's "done" callback before continuing.
- OTP / 2FA-style waits - generate a magic link, send it, wait for the click.
When NOT to use it
- You want to wait exactly N hours, not "until something happens". Use Long Delay.
- The external system has to be polled for status rather than pushing a callback. Loop with a Long Delay between probes; when the status flips, exit the loop.
- The wait is longer than 24 hours. The wait_for_callback timeout maxes at 86,400 seconds. For multi-day approval windows, generate a fresh callback inside a Long Delay loop, or split the workflow into two runs joined by a global variable / state record.
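For the multi-day case, the arithmetic of splitting one approval window into callbacks that each fit the 86,400-second cap can be sketched as below. planCallbackWindows is a hypothetical helper. It covers only the timeout values, not how each iteration is wired into a workflow or how a timed-out window (which fails the run, as noted earlier) is handled.

```typescript
const MAX_CALLBACK_TIMEOUT_S = 86400; // documented per-callback maximum

// Hypothetical: split a total wait into per-callback timeouts of at most 24 h.
function planCallbackWindows(totalSeconds: number): number[] {
  if (!Number.isInteger(totalSeconds) || totalSeconds <= 0) {
    throw new RangeError("totalSeconds must be a positive integer");
  }
  const windows: number[] = [];
  let remaining = totalSeconds;
  while (remaining > 0) {
    const next = Math.min(remaining, MAX_CALLBACK_TIMEOUT_S);
    windows.push(next);
    remaining -= next;
  }
  return windows;
}
```

A 25-hour window (90,000 seconds) splits into one full 86,400-second callback followed by a 3,600-second one.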
Comparing the two
| | Long Delay | Callback |
|---|---|---|
| Trigger to resume | Time elapsed | External POST |
| Can resume early? | No | Yes (any time before timeout) |
| Max wait | 7 days | 24 hours |
| Carries data on resume? | No | Yes - callbackData |
| Number of nodes | 1 (delay) | 2 (generate_callback + wait_for_callback) |
Both share: PAUSED status, worker slot freed during wait, restart-safe, full state context preserved across resume, fully integrated with execution history.
Related
- Node Types - delay, generate_callback, wait_for_callback Action types
- Webhooks - the auth model used for triggering workflows from outside (callbacks use a token-only variant)
- Workflow Templates - several built-in templates use the callback pattern