Trading Outage Runbooks: Handling Platform and Broker Failures

Why every trader needs an outage runbook

Unexpected failures are part of market participation. Data feeds stall, order entry locks up, mobile apps fail to authenticate, or a broker’s routing layer slows under volume. In these moments, attention narrows and stress rises, which increases the risk of impulsive or frozen responses. Experience in high-reliability fields such as aviation and medicine suggests that predefined checklists and runbooks reduce cognitive load and error rates by converting ambiguity into a short sequence of verified actions. Trading is no different. A concise outage runbook protects capital, preserves decision quality, and gives a repeatable path to debrief and improve.

Define the failure before reacting

Not all failures are the same. A platform outage is different from a broker-side order routing delay, a data feed desync, or an exchange-level halt. The first decision is to classify what is happening and invoke the correct section of the runbook. A quick diagnostic triage can prevent compounding mistakes. If the chart is still moving on an independent data source but the order ticket is unresponsive, the problem is probably broker-side. If several independent feeds freeze at once, the issue is more likely an exchange-level halt or a local connectivity problem. This initial classification determines which actions are feasible in the next minute.
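The triage above can be written down as a small decision function. The following is a minimal sketch in Python, not a definitive implementation: the three boolean inputs and the category names are assumptions chosen for illustration, and in practice each input would come from a manual check or a monitoring script.

```python
from enum import Enum


class Failure(Enum):
    NONE = "no failure detected"
    BROKER_SIDE = "broker-side platform or order routing"
    DATA_FEED = "primary data feed desync"
    UPSTREAM_OR_LOCAL = "exchange-level halt or local connectivity"


def classify_failure(primary_feed_live: bool,
                     independent_feed_live: bool,
                     ticket_responsive: bool) -> Failure:
    """Map three observable symptoms to a runbook section (illustrative rules)."""
    # Several feeds frozen at once: suspect the exchange or the local link.
    if not primary_feed_live and not independent_feed_live:
        return Failure.UPSTREAM_OR_LOCAL
    # At least one feed is moving but the order ticket is dead: broker-side.
    if not ticket_responsive:
        return Failure.BROKER_SIDE
    # Ticket works and the independent feed moves, but the primary feed is stale.
    if not primary_feed_live:
        return Failure.DATA_FEED
    return Failure.NONE


if __name__ == "__main__":
    print(classify_failure(False, True, False).value)
```

The order of the checks matters: the both-feeds-frozen case is tested first so that a local connectivity loss is not misread as a broker problem.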

Building the outage runbook

A strong runbook is short, concrete, and organized by triggers and actions. It starts with detectable signals that require escalation. Examples include order rejects on multiple symbols, a login loop, or a bid-ask freeze that persists beyond a set threshold such as 15 seconds in liquid hours. The goal is to leave no room for debate under stress: if the trigger occurs, escalate.
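Triggers of this kind are simple enough to express as yes-or-no checks. The sketch below assumes timestamps in seconds and a hypothetical list of symbols that returned order rejects; the 15-second threshold comes from the example above, while the two-symbol minimum for "multiple symbols" is an assumption.

```python
FREEZE_THRESHOLD_S = 15.0  # example threshold from the runbook text


def freeze_trigger(last_quote_ts: float, now_ts: float,
                   threshold_s: float = FREEZE_THRESHOLD_S) -> bool:
    """True when the last quote update is older than the threshold."""
    return (now_ts - last_quote_ts) > threshold_s


def reject_trigger(rejected_symbols: list[str], min_symbols: int = 2) -> bool:
    """True when rejects span at least `min_symbols` distinct symbols."""
    return len(set(rejected_symbols)) >= min_symbols


if __name__ == "__main__":
    # A quote last seen at t=100s, checked at t=116s, is 16 seconds stale.
    print(freeze_trigger(100.0, 116.0))
    print(reject_trigger(["ES", "NQ"]))
```

Because each check returns a plain boolean, there is nothing to interpret under stress: a true result means escalate.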

Stabilize the environment first. Confirm local connectivity by checking a non-financial site on a secondary device and, if possible, switching to a backup internet connection. Toggle between wired and wireless if both exist. If the local link is good but the platform remains degraded, move to broker contact and alternative execution steps.

Assess exposure in plain numbers. Write down current open positions, notional exposure, average price, and resting protection orders. Note which stops are exchange-native and which are held on the broker’s servers. This difference matters because exchange-native orders may still rest even if the platform is unavailable, while broker-held triggers will not fire if the broker cannot process them.
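The exposure snapshot can be sketched as a small data structure. The field names and the "exchange" / "broker" labels below are assumptions for illustration, and notional is approximated from the average price because a live price may not be available during an outage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Position:
    symbol: str
    quantity: float                      # signed: positive long, negative short
    avg_price: float
    stop_location: Optional[str] = None  # "exchange", "broker", or None


def notional(p: Position) -> float:
    """Approximate notional exposure using the average entry price."""
    return abs(p.quantity) * p.avg_price


def exposure_summary(positions: list[Position]) -> dict:
    """Total notional plus the symbols whose stop would not fire if the broker is down."""
    return {
        "total_notional": sum(notional(p) for p in positions),
        "unprotected_if_broker_down": [
            p.symbol for p in positions if p.stop_location != "exchange"
        ],
    }


if __name__ == "__main__":
    book = [
        Position("AAA", 100, 50.0, "exchange"),
        Position("BBB", -20, 200.0, "broker"),
    ]
    print(exposure_summary(book))
```

Positions with a broker-held stop or no stop at all end up in the same list, since both lack protection while the broker cannot process triggers.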

Activate the communication tree. Keep the broker’s trading desk number, account identifiers, and security phrases accessible offline. If the app is locked, phone execution may still be available after identity verification. If a secondary broker is funded, transition to it only for risk-reducing actions that do not over-hedge or double exposure. Keep the steps narrow: close or hedge, and do not open new discretionary risk while blind.

Define strict rules about what is allowed during an outage. Many losses grow because traders try to “make the platform work” by clicking repeatedly or entering duplicate orders. The runbook should specify a limited sequence: attempt the order once; if no confirmation arrives within a set time window, stop trying and escalate to phone execution or an alternate venue. Close the feedback loop by documenting order IDs and timestamps.
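For traders who submit orders through scripts, the attempt-once rule can be sketched as follows. The `send` and `is_confirmed` callables are hypothetical stand-ins for whatever a given broker API provides, and the 10-second window is an assumed value, not a recommendation.

```python
import time
from typing import Callable


def submit_once(send: Callable[[], str],
                is_confirmed: Callable[[str], bool],
                timeout_s: float = 10.0,
                poll_s: float = 0.5,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Send one order, wait for confirmation, and never resubmit."""
    order_id = send()          # the only submission; there is no retry path
    sent_at = clock()
    while clock() - sent_at < timeout_s:
        if is_confirmed(order_id):
            return {"order_id": order_id, "sent_at": sent_at, "status": "confirmed"}
        sleep(poll_s)
    # Window elapsed without confirmation: hand off to phone or alternate venue.
    return {"order_id": order_id, "sent_at": sent_at, "status": "escalate"}
```

The returned dictionary carries the order ID and send time so the journal entry can be filled in directly. The clock and sleep functions are parameters only so the routine can be rehearsed without real waiting.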

Keep it crisp and printable. The runbook should be one page, readable at a glance, with a second page for contacts and account notes. Store a copy on paper and on a device that does not require the primary platform to open.

The minimal runbook kit

Triggers and the first three actions for each failure type
A current contact sheet: broker trading desk, account details needed for phone execution, secondary broker access
A fallback map: secondary internet, independent data source, allowed hedges or close-only orders

Practical techniques that hold under stress

Physiological arousal during outages can impair judgment. A short bout of paced breathing, such as inhaling for four seconds and exhaling for six seconds for one to two minutes, tends to lower heart rate and support executive control. The aim is not relaxation for its own sake but the restoration of cognitive bandwidth for a high-quality decision.

Implementation intentions help convert goals into actions. An effective if-then line reads: “If the platform freezes for more than 15 seconds during active risk, then switch to independent data, stop clicking new orders, and call the trading desk.” When rehearsed, this line becomes a reflex that overrides unproductive impulses.

Example scenarios

An intraday trader sees spreads widen and the order ticket lag during a macro release. The chart updates on an independent feed, but the broker UI halts. The runbook triggers after 15 seconds. The trader stops entering new orders, calls the trading desk with account verification ready, and requests a close at market for the open position. A brief journal entry captures time, steps, and observed slippage for future calibration.

A swing trader loses home internet. The platform disconnects, but exchange-native stop orders are resting. The runbook directs an immediate switch to a mobile hotspot and a quick status check using an independent data source. The trader confirms that stops remain in the book, refrains from new entries, and documents the interruption with exact timestamps. After the session, the trader updates the contingency plan to include a second hotspot device to reduce single-point failure.

Journaling and post-incident review

Outage episodes are valuable data. The post-incident journal should read like a concise timeline. Record the first symptom, trigger time, actions taken, confirmation or rejection messages, broker interactions, and final outcomes, including slippage (the difference between expected and executed prices). Note whether stops were exchange-native or broker-held and how that affected protection.
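Slippage is easy to record inconsistently because its sign depends on trade direction. A minimal sketch, assuming the convention that a positive number always means an adverse fill:

```python
def slippage(expected: float, executed: float, side: str,
             quantity: float = 1.0) -> float:
    """Signed slippage in price units times quantity; positive means adverse."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    # Buying higher than expected, or selling lower than expected, is adverse.
    per_unit = executed - expected if side == "buy" else expected - executed
    return per_unit * quantity


if __name__ == "__main__":
    # Expected to sell 4 units at 100.00, filled at 99.75.
    print(slippage(100.0, 99.75, "sell", 4))
```

Using one sign convention across all journal entries makes incidents comparable when they are reviewed later.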

Classify root causes into categories such as local connectivity, broker capacity, or upstream venue issues. Evaluate decision quality, not just PnL. A good outcome with poor process is a near miss and still requires action. Translate findings into one to three concrete changes, for example, raising resting stop usage on volatile days, funding a small secondary account for close-only emergencies, or shortening the trigger threshold during peak periods.

A scorecard supports improvement across incidents. Track detection time, time to first effective action, percent of positions with exchange-native protection, adherence to the no-new-risk rule, and completeness of the journal. Trends over several months reveal whether readiness is actually improving.
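The scorecard metrics listed above can be aggregated from simple per-incident records. The record fields below are assumptions chosen to mirror those metrics; times are in seconds from the first symptom.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    detection_s: float                 # first symptom to recognized trigger
    first_action_s: float              # first symptom to first effective action
    positions_total: int
    positions_exchange_protected: int
    no_new_risk: bool                  # was the no-new-risk rule followed?
    journal_complete: bool


def scorecard(incidents: list[Incident]) -> dict:
    """Aggregate readiness metrics across a non-empty list of incidents."""
    n = len(incidents)
    if n == 0:
        raise ValueError("no incidents to score")
    total_pos = sum(i.positions_total for i in incidents)
    protected = sum(i.positions_exchange_protected for i in incidents)
    return {
        "avg_detection_s": sum(i.detection_s for i in incidents) / n,
        "avg_first_action_s": sum(i.first_action_s for i in incidents) / n,
        "pct_exchange_protected": 100.0 * protected / total_pos if total_pos else 0.0,
        "pct_no_new_risk": 100.0 * sum(i.no_new_risk for i in incidents) / n,
        "pct_journal_complete": 100.0 * sum(i.journal_complete for i in incidents) / n,
    }
```

Running the same function over each month's incidents gives a comparable set of numbers from which a trend can be read.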

Risk containment before outages happen

Resilience is built before the next failure. Position sizing should account for the possibility that protection may not trigger exactly as modeled. Many brokers allow stop or limit orders to rest at the exchange on certain instruments; using those orders reduces reliance on broker-side triggers. On assets where this is not possible, the runbook should assume higher gap risk and call for lower leverage.

Schedule awareness helps. Many brokers publish maintenance windows, and economic calendars list the major releases that tend to stress systems. Reducing size or avoiding new entries shortly before those periods is rational. Time-of-day effects also matter; peak flow around market opens can increase latency. Setting a small buffer period after the open before adding risk gives the platform time to stabilize.

Independent data sources protect situational awareness. A separate charting feed or a public venue status page can anchor reality when the primary platform goes dark. Keeping a basic, non-branded web bookmark list on a secondary device avoids fumbling during stress.

Weekly rhythm tip for Sunday

Sunday is a low-pressure window to rehearse. Run a five-minute outage drill: open the printed runbook, speak the trigger lines out loud, locate the broker’s trading desk number, verify the secondary internet pathway, and check that a small balance remains in a backup account for close-only actions. End with a one-paragraph journal note and a dated photo of the contact sheet so the most recent version is always at hand.

A simple runbook template

Trigger: “Platform unresponsive for 15 seconds while in a position.” Action: confirm internet on a secondary device, load an independent quote to classify the failure, and stop submitting new orders. If still unresponsive, call the broker’s trading desk with account verification ready and request a close at market, documenting time and order details. If the desk is unavailable, switch to the secondary broker and place an offsetting hedge sized to the net exposure and no larger. Avoid oversizing or doubling risk.
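The sizing rule for the secondary-broker hedge reduces to one line of arithmetic. A minimal sketch, assuming signed quantities (positive long, negative short) and a hypothetical function name:

```python
def hedge_order_quantity(primary_net: float, secondary_net: float = 0.0) -> float:
    """Signed quantity to trade at the secondary broker so combined exposure is flat.

    Any hedge already in place at the secondary broker is counted, so calling
    this again after a partial hedge never over-hedges.
    """
    return -(primary_net + secondary_net)


if __name__ == "__main__":
    # Long 100 at the stuck primary broker, nothing at the secondary yet.
    print(hedge_order_quantity(100))
```

If the result is zero, the exposure is already offset and no further order should be sent.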

Trigger: “Data feed desync but ticket responsive.” Action: cross-check price on the independent feed, place protective orders using limit logic if spreads are unstable, and reduce exposure until data normalizes. If quotes deviate beyond a predefined tolerance, stand down from new entries and move to close-only until alignment returns.
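The tolerance check for a desynced feed can be sketched in a few lines. The 10 basis point default is an assumed value for illustration only; an appropriate tolerance depends on the instrument and its normal spread.

```python
def deviation_bps(primary: float, independent: float) -> float:
    """Absolute gap between two quotes, in basis points of the independent quote."""
    if independent <= 0:
        raise ValueError("independent quote must be positive")
    return abs(primary - independent) / independent * 10_000


def trading_mode(primary: float, independent: float,
                 tolerance_bps: float = 10.0) -> str:
    """Return 'close_only' when the quotes disagree beyond the tolerance."""
    if deviation_bps(primary, independent) > tolerance_bps:
        return "close_only"
    return "normal"


if __name__ == "__main__":
    print(trading_mode(100.5, 100.0))
```

Measuring the gap relative to the independent quote treats that source as the reference, which matches the role the runbook gives it.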

Trigger: “Local connectivity loss.” Action: switch to backup internet or mobile hotspot, confirm platform login, verify that exchange-native orders are present, and avoid new discretionary entries until stability is confirmed for several minutes.

Close each incident by writing a short debrief: what happened, what was done, what worked, what failed, and what will change. Update the runbook the same day while memory is fresh.

Closing

Outage runbooks turn rare but consequential moments into manageable routines. Clear triggers, tight actions, and disciplined journaling convert panic into process. Over time, the scorecard documents faster detection, cleaner execution, and fewer unforced errors. When failure arrives, preparation is the edge.