Trading Runbooks for Platform Outages and Broker Failures

Why an outage runbook belongs in every trading plan

Platform outages and broker disruptions are not rare events. They cluster around high-volume moments, software updates, and network congestion. The psychological cost is immediate: uncertainty spikes arousal, narrows attention, and pushes decisions toward habitual shortcuts. Research on acute stress and decision making consistently shows faster but less accurate choices when information flow degrades. A written runbook offsets this by externalizing decisions ahead of time, reducing cognitive load and curbing panic-driven actions.

A trading runbook is a simple, unambiguous set of if-then actions for specific failure modes. It turns a vague intention like “stay calm and close risk” into steps with time thresholds, backup channels, and documentation prompts. The point is not to eliminate risk. It is to shorten detection time, limit error-prone improvisation, and preserve capital while execution pathways are degraded.

Define the scope: what can fail and what matters

Start by mapping dependencies across five layers: internet connectivity, power, market data, order execution, and authentication. Failures rarely arrive cleanly. Data may be stale while orders still route. A mobile app may work while the desktop platform hangs. Knowing which layer is broken determines whether the safest action is to hedge, flatten, or do nothing.

Create a short taxonomy that fits your setup: data-only outage, execution-only outage, account lockout or 2FA failure, internet loss, power loss, platform crash, and broker-wide incident. For each, identify the positions most exposed. For example, options spreads with latent assignment risk demand different contingencies than intraday futures scalps. Also verify where your protective orders live. Some stops and OCOs are held server-side by the broker, others only on the local platform. The runbook should state this explicitly because it controls whether you can rely on resting protection when your platform is down.

Detection and triage without rumor or guesswork

Outages invite speculation. Replace guesswork with objective signals. Cross-check quotes on a secondary data app. Confirm broker and exchange status pages. Time-box uncertainty with thresholds: for example, treat any unresponsive order ticket for 15 seconds as a triage event during high volatility. Simple network checks and a secondary network path help distinguish local issues from broker incidents. The goal is to label the failure mode quickly enough to choose a prewritten path instead of troubleshooting ad hoc while exposed.

A concise triage set can live near your screen:

Is this data, orders, or both?
Is the failure local (internet, power, device) or broker-side?
What is the immediate risk posture for current positions?

The default risk stance under uncertainty

When information quality is in doubt, the safest default is to avoid adding exposure and to preserve protection that already exists. The runbook should state a default posture for different scenarios. If data is stale but server-side stops are confirmed live, maintaining positions may be safer than firing blind market orders. If order status is unknown and protection is not confirmed, the priority becomes flattening or hedging through a verified channel.

Set clear thresholds for action. For example, if a stop cannot be verified within one minute during a fast market, shift to a flatten-or-hedge protocol. This prevents passive hoping while slippage risk grows. Thresholds are a form of precommitment, and precommitment has strong support in behavioral research for reducing impulsive deviation under stress.

Alternate execution paths and hedges

Redundancy should be practical, not theoretical. A small, funded backup account at a different broker provides real optionality. Mobile apps often connect when desktop platforms stall, and many brokers maintain a phone dealing desk for emergency orders. Keep account numbers, passphrases, and instructions printed and sealed. Rehearse placing a simple market order on the phone app so the interface is familiar.

For positions that cannot be directly closed because the primary broker is down, define a temporary hedge. Correlated instruments can neutralize most of the directional risk until access returns. For example, if holding an index future and execution is unavailable, shorting the corresponding ETF or micro future at the backup venue can materially reduce exposure. The runbook should acknowledge basis risk and liquidity differences. A hedge is a bridge, not a replica. Specify the instrument, notional size, and conversion factor in advance so math does not happen in a panic.

Write plain-language order templates to use over the phone or mobile. For example: “Close position: buy 2 contracts at market, day” or “Hedge: short 200 shares of ETF X at market.” Templates reduce ambiguity when signal is poor or when a support representative needs clear instructions.

Technical resilience that actually matters

Local connectivity is a frequent single point of failure. A prepaid mobile hotspot or phone tether, tested in advance, can restore connectivity in seconds. A small UPS can keep the router, modem, and monitor alive long enough to act. For traders who rely on desktop platforms, consider a lightweight cloud-hosted instance or a remote machine that can be reached from a phone. Keep security hygiene strict: unique credentials, MFA tokens accessible in an outage, and a written recovery process for 2FA lockouts.

The runbook should describe the order of switching: power check, router reboot or bypass, hotspot activation, platform restart, backup device login, and only then escalation to broker support. The sequence prevents thrashing and lost time.

Communication templates and documentation

When contacting a broker, be specific. Ask whether server-side stops and OCOs are active, whether order acknowledgments are delayed, and whether there is a known incident. Record the timestamp, representative name, and the guidance given. Keep a one-page log with fields for time, symptom, action, and outcome. Documentation is not busywork. It forms the primary dataset for post-incident learning and helps resolve disputes about order state.

A simple communication protocol also covers counterparties. If trading in a prop context or sharing risk with a partner, predefine how and when an outage notice is sent. The notice can be one sentence stating the failure mode, current exposure, and the next action. Brevity reduces misinterpretation while attention is scarce.

Self-regulation during an outage

The best checklist fails if physiology is out of control. A short breathing protocol lowers reactivity without stealing time. Slow nasal breathing at a fixed cadence for 60 to 90 seconds is sufficient to reduce sympathetic arousal for most people. Time-boxing decisions also helps: confirm the label of the failure, act according to the runbook, then do not tinker for a fixed interval unless new information arrives. Evidence from implementation intentions and stress-inoculation work shows that prewritten scripts paired with brief calming techniques improve adherence and reduce error rates.

Avoid social feeds during the event. Rumor increases noise and widens the action space. The runbook is designed to narrow options when it matters most.

A Saturday drill to consolidate the habit

Weekends are ideal for rehearsal when markets are closed. Set a 20-minute drill today. Simulate a platform freeze with a recorded chart or paper scenario. Practice your triage questions out loud. Switch to a hotspot and log in to the backup broker on a phone. Place a small test order in a simulator or with a minimal size if markets are open in another venue. Update the runbook with any friction discovered, such as missing passwords, unclear order templates, or slow device unlocks.

Drills convert a document into a behavior. In training research, even brief, repeated simulations improve reaction time and working memory under pressure. The benefit compounds when paired with debrief notes.

Post-incident journaling and metrics

After any outage, write a short postmortem the same day. Capture a timeline: detection time, the label assigned to the failure, decisions taken, and the time to restore control. Include position details and the difference between the planned and actual outcome. Note any near misses, such as protective orders that were assumed live but were not.

Add simple metrics: mean time to detect, mean time to mitigate, and realized slippage relative to expected. Trends matter more than a single incident. A declining detection time suggests that drills are working. If slippage worsens, the hedge mapping or thresholds may need revision. Keep the tone analytical rather than emotional to reduce hindsight bias.

Integrate the runbook into daily routine

A runbook only helps if it is visible and refreshed. Store a printed copy within reach, with a dated revision on the cover. Add a quick morning check: verify backup internet readiness, confirm where stops are hosted for the day’s instruments, and keep the phone app logged in with MFA tokens available. End the week by reviewing one scenario and adjusting templates or thresholds based on new platform features or broker notices.

Putting it together

A trading outage is a test of design, not just nerves. A concise runbook transforms a chaotic event into a bounded protocol: fast detection, a default risk stance, clear alternate pathways, and disciplined review. The psychological edge comes from removing improvisation at the worst possible moment. Build it once, rehearse briefly, and let the script carry the weight when attention is scarce.