Dual SIM and WAN failover in industrial routers – when they truly save connectivity

- Why Dual SIM in an industrial router
- What WAN failover is and how it works
- Failback and health-check – the details that matter
- When it’s essential vs “nice to have”
- How to choose carriers and SIMs for real redundancy
- Common deployment mistakes
- Testing and monitoring: how to verify failover works
- FAQ
- Summary
In automation and BMS, the question is rarely “does LTE/5G work,” but whether it works predictably. Coverage can fluctuate, base stations can hand off unexpectedly, and carriers introduce policies that look like random outages from the outside. That’s why two concepts keep appearing in OT projects: Dual SIM and WAN failover.
This article answers one practical question: when does link redundancy truly save a project, and when is it just a “nice feature on a datasheet”? We’ll focus on the mechanics, real failure scenarios, and common configuration mistakes that leave failover working on paper but failing in a real incident.
Why Dual SIM in an industrial router
Dual SIM in an industrial router is the simplest way to reduce carrier-related risk. In an ideal world, LTE/5G outages don’t exist. In the real world you get: BTS congestion, network upgrades, backhaul failures, “coverage holes” at certain times and locations, and sometimes degraded signal quality due to environmental changes (e.g., new buildings).
A second SIM from a different carrier works like insurance: when one network has trouble, the router switches to the other. In practice, the key point is that redundancy should not mean “two SIMs on the same network”, but genuinely independent paths: different carriers, different radio infrastructure, and often different backhaul routes.
What WAN failover is and how it works
WAN failover is the logic that shifts traffic to a backup link when the primary link no longer meets operating conditions. Important: “the link is down” does not always mean no coverage. In automation, you more often see cases where the modem is attached to the network, but IP traffic does not pass correctly (routing issues, congestion, carrier core problems, DNS failures, APN issues).
Proper failover is based on measuring link health rather than “do we have signal bars.” The router runs periodic checks (health-check) and decides whether to stay on the primary link or switch to the backup. Depending on the design, the backup can be: the second SIM, a second modem, a second carrier, or even a wired WAN (Ethernet).
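The idea of judging a link by whether traffic actually passes, rather than by signal bars, can be sketched in a few lines of Python. This is a minimal illustration, not any router's firmware logic: the names `tcp_probe` and `link_is_healthy`, the `min_ok` parameter, and the placeholder addresses (taken from documentation ranges) are all assumptions. A real router would also need to send the probes over the specific WAN interface under test, which this sketch does not do.

```python
import socket
from typing import Callable, Iterable, Tuple

Endpoint = Tuple[str, int]  # (IP address, TCP port); hypothetical check targets


def tcp_probe(endpoint: Endpoint, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


def link_is_healthy(
    endpoints: Iterable[Endpoint],
    min_ok: int = 1,
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> bool:
    """Treat the link as healthy if at least `min_ok` endpoints answer.

    Probing several independent endpoints by IP address avoids declaring
    the link dead because one host (or DNS) is having a bad moment.
    """
    ok = sum(1 for ep in endpoints if probe(ep))
    return ok >= min_ok


if __name__ == "__main__":
    # Placeholder targets; replace with hosts that matter to the installation.
    targets = [("192.0.2.10", 443), ("198.51.100.20", 443)]
    print("healthy" if link_is_healthy(targets, min_ok=1) else "unhealthy")
```

Passing the probe as a parameter makes it easy to swap in an ICMP or application-level check, or to test the decision logic without touching the network.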
Failback and health-check – the details that matter
Two things most commonly break failover in practice: a badly designed health check and no sensible failback. The health check must detect a real loss of IP connectivity, not just “interface up/down.” Failback is the logic that returns traffic to the primary link once it is stable again, with hysteresis, so the router doesn’t “bounce” between links every minute.
- Health-check: test more than one endpoint (e.g., multiple hosts) and don’t rely on DNS alone.
- Thresholds and timers: set sensible delays to avoid switching on short fluctuations.
- Failback: return to the primary link only after a stability window, not “immediately after the first successful check”.
- VPN: verify that after a path change the tunnel re-establishes automatically and routing policies remain consistent.
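The thresholds and the stability window from the list above can be expressed as a small state machine. The sketch below is an assumption-laden illustration: the class name `FailoverController` and the defaults for `fail_threshold` and `recover_threshold` are invented for this example, and it presumes the router keeps probing the primary link while the backup carries traffic.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Decides which link is active based on primary-link health checks."""

    fail_threshold: int = 3      # consecutive failed checks before switching to backup
    recover_threshold: int = 10  # consecutive good checks before failing back (hysteresis)
    active: str = "primary"
    _fails: int = 0
    _passes: int = 0

    def update(self, primary_ok: bool) -> str:
        """Feed one health-check result for the primary link; return the active link."""
        if primary_ok:
            self._fails = 0
            self._passes += 1
        else:
            self._passes = 0
            self._fails += 1

        if self.active == "primary" and self._fails >= self.fail_threshold:
            self.active = "backup"
        elif self.active == "backup" and self._passes >= self.recover_threshold:
            self.active = "primary"
        return self.active


if __name__ == "__main__":
    ctl = FailoverController(fail_threshold=3, recover_threshold=5)
    # Three failures trigger failover; a single failure during recovery
    # resets the stability window before failback can happen.
    results = [False, False, False, True, True, False, True, True, True, True, True]
    for ok in results:
        print(ok, "->", ctl.update(ok))
```

Making `recover_threshold` larger than `fail_threshold` is what gives the hysteresis: the router leaves the primary link quickly but returns to it only after a sustained run of good checks.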
When it’s essential vs “nice to have”
Dual SIM and WAN failover are essential where loss of connectivity has real costs: downtime, missing alarms, missing billing/settlement data, risk of process parameters drifting out of range, or the need for an immediate service trip. They’re “nice to have” when connectivity is only auxiliary and an outage doesn’t create time pressure.
How to choose carriers and SIMs for real redundancy
Redundancy only makes sense when paths are truly independent. In practice that means two different carriers, ideally with their behavior validated at the specific location (different times of day, different conditions). If radio conditions at the site are difficult, an external antenna and proper cable routing can be a better investment than “another SIM card.”
- Choose SIMs with stability in mind, not only data volume.
- Watch out for CGNAT limitations and remote-access requirements (VPN).
- If possible, test link behavior under load and during peak hours.
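One quick way to spot the CGNAT limitation is to look at the address the carrier assigns to the router's WAN interface. The sketch below classifies an address against the shared address space reserved for carrier-grade NAT (100.64.0.0/10, RFC 6598) and the RFC 1918 private ranges. The function name `classify_wan_address` and its return labels are assumptions made for this example. Note that a "public" result is only a hint: a carrier may still filter inbound traffic, so remote access should be verified in practice.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")

# Private IPv4 ranges (RFC 1918); also not reachable from the internet.
PRIVATE_RANGES = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify_wan_address(addr: str) -> str:
    """Return 'cgnat', 'private', 'public', or 'ipv6' for a WAN address."""
    ip = ipaddress.ip_address(addr)
    if ip.version != 4:
        return "ipv6"
    if ip in CGNAT_RANGE:
        return "cgnat"
    if any(ip in net for net in PRIVATE_RANGES):
        return "private"
    return "public"


if __name__ == "__main__":
    for wan_ip in ("100.72.13.5", "10.20.30.40", "8.8.8.8"):
        print(wan_ip, "->", classify_wan_address(wan_ip))
```

If both SIMs land behind CGNAT or private addressing, inbound connections to the router will not work on either link, which is one reason remote access in such setups is usually built on a VPN that the router initiates outbound.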
Common deployment mistakes that make failover fail
- “Dry” testing – switching works, but VPN, routing, and OT services weren’t verified after the switch.
- DNS-only health-check – a DNS hiccup triggers an unnecessary switch, or the check misses a real IP problem.
- Overly aggressive timers – the router bounces between links during short fluctuations.
- No failback or no hysteresis – no return to primary, or “ping-pong” between WAN links.
- Two SIMs with one carrier – “redundancy” without real independence.
- Ignoring antennas – router inside a metal cabinet without a proper external antenna.
Testing and monitoring: how to verify failover really works
The simplest test is “pull the SIM and see.” That’s a good start, but it’s not enough. Real outages don’t look like a removed SIM. More often the modem stays attached to LTE, but traffic is congested or the VPN becomes unstable. That’s why testing should include:
- Health-check validation – does it detect IP problems, not only interface status.
- VPN tunnel test – does the tunnel come back automatically and keep correct routes after switching.
- OT service test – after switching, do PLC/HMI/SCADA paths work, not just “internet access.”
- Monitoring – switch logs, link-quality alerts, event history.
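To turn a failover test into a number, one option is to run a continuous end-to-end probe against an OT service during the test and then measure how long that service was unreachable. The sketch below assumes the probe results have already been logged as `(timestamp_seconds, probe_ok)` pairs. The function name `longest_outage` and the sample data are hypothetical, not taken from any router's log format.

```python
from typing import Iterable, Optional, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest continuous outage in seconds.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open at the end of the data is
    measured up to the last sample.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None

    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts  # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)  # outage ends
            outage_start = None
        last_ts = ts

    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)
    return longest


if __name__ == "__main__":
    # Hypothetical probe log: one sample every 5 s during a failover test.
    log = [
        (0.0, True), (5.0, True), (10.0, False), (15.0, False),
        (20.0, False), (25.0, True), (30.0, True),
    ]
    print(f"Longest outage: {longest_outage(log):.0f} s")
```

Comparing this figure for a pulled-SIM test and for a "link attached but no traffic" test shows whether the health check reacts to real IP problems as quickly as it does to an interface going down.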
FAQ – Dual SIM and WAN failover in practice
Summary: link redundancy is a process, not a checkbox in a datasheet
Dual SIM and WAN failover can save a project, but only when they’re designed and tested end-to-end: health-check detects real IP issues, failback has hysteresis, and VPN plus routing are ready for a path change. In critical sites, this is one of the most cost-effective ways to increase availability and reduce service trips.
In the next article we’ll go one step further: “Industrial router – how to choose (LTE/5G, VPN, Dual SIM, failover)” and break selection down into practical criteria that matter in automation.
