10-wancarp — OPNsense CARP Helper
When a CARP event occurs on an OPNsense firewall in an HA (High Availability) pair, this script runs and enables or disables interfaces based on the node's current CARP role:
- MASTER firewall → enables the configured interface(s).
- BACKUP firewall → disables the configured interface(s).
This keeps the uplink UP on exactly one node at a time. Typical use is a WAN link that must not be active on both firewalls simultaneously (e.g. a single ISP handoff, a modem in bridge mode, or any upstream that gets confused by two devices claiming the same MAC/IP).
You can manage one or many interfaces. The common case is WAN only.
This script only runs on transitions. After you first set up HA, all interfaces are UP. To force a quick event for testing, go to Interfaces → Virtual IPs → Status → Enter Persistent CARP Maintenance Mode on the MASTER (then leave maintenance mode to flip back).
AI Disclosure
- I used Claude Code (Opus 4.8) to write this script in its entirety including the README.
- I have reviewed the README for accuracy based on my experience using the script.
- I have fully tested the script in my homelab to verify it works as intended.
- I have only read the script header and have not performed a full source review.
- I had Claude Code use Context7 while developing this script so that the latest relevant documentation would be used.
How it works
OPNsense executes every file in /usr/local/etc/rc.syshook.d/carp/ on a CARP
transition and passes two arguments:
| Arg | Meaning | Example |
|---|---|---|
| $argv[1] | subsystem | 1@vtnet1 (<vhid>@<interface>) |
| $argv[2] | type | MASTER, BACKUP, or INIT |
So the system effectively runs:
/usr/local/etc/rc.syshook.d/carp/10-wancarp 1@vtnet1 MASTER
On MASTER: brings the real device link up and reconfigures it. On
BACKUP: resets each interface, forces the link down, and (optionally)
re-points the default route. INIT is logged but takes no action.
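As a rough illustration of the argument contract above, the following shell sketch splits the subsystem string and validates the type. It is not taken from 10-wancarp (which is PHP); the function name parse_carp_event is hypothetical, and the bad-source / invalid-type strings simply mirror the abort reasons listed under "Verifying / troubleshooting".

```shell
#!/bin/sh
# Hypothetical helper (not part of 10-wancarp): split "<vhid>@<interface>"
# and check that the event type is one of the three documented values.
parse_carp_event() {
    subsystem="$1"
    type="$2"

    # The subsystem must contain an '@' separating vhid and interface.
    case "$subsystem" in
        *@*) ;;
        *) echo "bad-source"; return 1 ;;
    esac

    # Only MASTER, BACKUP, and INIT are valid event types.
    case "$type" in
        MASTER|BACKUP|INIT) ;;
        *) echo "invalid-type"; return 1 ;;
    esac

    vhid="${subsystem%%@*}"   # text before the first '@'
    iface="${subsystem#*@}"   # text after the first '@'
    echo "vhid=$vhid iface=$iface type=$type"
}

parse_carp_event "1@vtnet1" "MASTER"
```

Run as written, this prints vhid=1 iface=vtnet1 type=MASTER. Any executable placed in the carp hook directory receives the same two arguments.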
Runtime-only — does not write config
The script makes interface changes at runtime only; it does not call
write_config() and never modifies config.xml. This is deliberate:
- Persisting the per-node enable flag during a CARP event raced OPNsense's own config writes and could clobber the persistent CARP maintenance-mode flag, stranding a node demoted with no GUI button to recover.
- With HA config synchronization (XMLRPC) enabled, the synced enable value is authoritative anyway, so per-node persistence is pointless and conflicting.
Because nothing is persisted, the idempotency guard checks the live UP
flag (not the config enable flag, which reads 1 on both nodes under config
sync). Note status: is not used — on virtio (vtnet) NICs it stays
active even when the interface is down; only the UP flag is honest.
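To make the "UP flag, not status:" point concrete, here is a minimal sketch of checking the UP flag in the first line of ifconfig output. It is an assumption about how such a check could look from the shell, not the script's actual PHP logic; iface_is_up is a hypothetical name and the sample line is illustrative.

```shell
#!/bin/sh
# Hypothetical helper (not part of 10-wancarp): succeed only if the flags
# line from `ifconfig <dev>` lists UP as a whole flag, so that a flag such
# as LOWER_UP on its own does not count.
iface_is_up() {
    flags_line="$1"
    flags="${flags_line#*<}"   # drop everything up to the first '<'
    flags="${flags%%>*}"       # drop everything from the first '>'
    case ",$flags," in
        *,UP,*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the firewall you would use: line="$(ifconfig vtnet1 | head -n 1)"
# Here an illustrative sample line stands in for live output.
line='vtnet1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500'
if iface_is_up "$line"; then
    echo "up"
else
    echo "down"
fi
```

Matching on the comma-delimited flag list rather than grepping for the substring "UP" avoids false positives from other flags that merely contain those letters.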
Boot behaviour: since state isn't persisted, a rebooting node reaches the correct interface state from the CARP MASTER/BACKUP event fired as it rejoins (no action is taken on INIT/boot itself). Confirmed by hard-rebooting both the MASTER and the BACKUP: each hands off and reclaims cleanly, WAN ending up live on exactly one node.
Prerequisite: interfaces must be enabled in config on BOTH firewalls. Because the script is runtime-only and never persists the enable flag, every alias in $iface_aliases must be enabled in config on both nodes (Interfaces → [WAN] → Enable interface → Save/Apply; config sync carries it to the peer). If an interface is disabled in config, OPNsense's gateway/routing subsystem skips it and won't install its default route even when the script brings the link up. The promoted MASTER then gets a DHCP IP but no default route and no Internet (symptom: WAN can ping its own gateway, but netstat -rn shows no default and external pings say "No route to host").
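The missing-default-route symptom can be checked quickly from the shell. The sketch below is a hypothetical helper, not part of 10-wancarp; it reads netstat -rn style output on stdin and reports whether a default destination is listed. The sample table is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of 10-wancarp): reads `netstat -rn` style
# output on stdin and succeeds only if a "default" destination is listed.
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}

# On the firewall you would pipe the live table in:
#   netstat -rn | has_default_route && echo "default route present"
# Here a made-up sample table (with no default entry) stands in for it.
sample='Destination        Gateway            Flags     Netif Expire
192.168.0.0/24     link#2             U        vtnet1
127.0.0.1          link#3             UH          lo0'

if printf '%s\n' "$sample" | has_default_route; then
    echo "default route present"
else
    echo "no default route"
fi
```

Note that plain netstat -rn lists IPv4 and IPv6 tables together, so an IPv6 default would also match; on FreeBSD, netstat -rn -f inet should restrict the output to IPv4.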
Configuration
Edit the USER CONFIGURATION block near the top of 10-wancarp:
$iface_aliases = ['wan']; // OPNsense interface ALIASES
$lan_vip = '192.168.0.1'; // default route target on BACKUP (optional)
$remove_backup_route = false; // set true to repoint default route on BACKUP
- $iface_aliases: use OPNsense aliases (the internal names like wan, lan, opt1, opt2, i.e. the keys under <interfaces> in config.xml). Do not use device names (igc0, em0) or descriptions; the script resolves the real device automatically via get_real_interface().
  - Single uplink: ['wan']
  - Multiple: ['wan', 'opt2']
- $remove_backup_route / $lan_vip: leave false for almost everyone. See "Giving the BACKUP node Internet access" below for what this is for and why a failover gateway is the better answer.
Giving the BACKUP node Internet access
When a firewall becomes BACKUP, this script disables its WAN — so that box now has no path to the Internet of its own. That's usually a problem: a firewall still wants outbound access while it's the backup (package updates, DNS, NTP, gateway monitoring, etc.). It needs an alternate default route — typically out through the other firewall (the current MASTER) via the LAN.
There are two ways to provide that, and they solve the same problem:
- A failover gateway (recommended). Configure an OPNsense gateway group with WAN as Tier 1 and a LAN-side gateway (pointing at the peer/MASTER) as Tier 2, monitored by dpinger. When WAN goes down, OPNsense automatically and dynamically shifts the default route to the LAN gateway, and shifts it back when WAN returns. This is native, monitored, and self-healing. If you have this, leave $remove_backup_route = false and ignore $lan_vip; the script does not need to touch routing at all.
- $remove_backup_route (the blunt fallback). If you have no failover gateway, set $remove_backup_route = true and $lan_vip to a LAN-side gateway (often a shared LAN CARP VIP). On every BACKUP transition the script simply deletes the default route and re-adds it pointing at $lan_vip. There's no monitoring and no automatic restore beyond the next CARP event. It is a static repoint, which is why a failover gateway is preferred.

Do not use both. If you configure a failover gateway and set $remove_backup_route = true, the two will fight over the default route on the BACKUP node. Pick one, preferably the gateway group.
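For orientation, the static repoint described for $remove_backup_route corresponds roughly to two FreeBSD route(8) commands. The sketch below is an assumption about the shape of that operation, not an excerpt from 10-wancarp; repoint_default_route and the APPLY switch are hypothetical names. By default it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical sketch of a static default-route repoint on FreeBSD.
# Prints the commands by default; set APPLY=1 to execute them (root only).
repoint_default_route() {
    lan_vip="$1"
    for cmd in "route delete default" "route add default $lan_vip"; do
        if [ "${APPLY:-0}" = "1" ]; then
            $cmd            # deliberately unquoted so the string splits into words
        else
            echo "would run: $cmd"
        fi
    done
}

# 192.168.0.1 matches the example $lan_vip in the configuration block.
repoint_default_route 192.168.0.1
```

Nothing here monitors the new gateway or restores the old route, which is the same limitation the fallback option has and the reason the gateway group is the better choice.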
Installation
Run on both firewalls:
- SSH into each firewall and run:
  curl https://git.pickysysadmin.ca/eric/10-wancarp/raw/branch/main/10-wancarp -o /usr/local/etc/rc.syshook.d/carp/10-wancarp
- Make it executable:
  chmod a+x /usr/local/etc/rc.syshook.d/carp/10-wancarp
Dry-run (test before going live)
Pass --dryrun (or -n) to run the script from the CLI without changing
anything — no interface_configure/interface_reset, no ifconfig/route.
Output is echoed to the terminal as well as the system log.
./10-wancarp --dryrun 1@vtnet1 MASTER
./10-wancarp --dryrun 1@vtnet1 BACKUP
(The flag may go in any position; the <vhid>@<interface> source and the role
are still required. A source string without @ only warns in dry-run instead
of aborting.)
Before printing what each role would do, dry-run runs a preflight that verifies the things most likely to make a live failover silently fail:
- the legacy global $config is actually populated (if not, a live run resolves nothing and no-ops),
- every OPNsense PHP API the script calls exists on this build, including interface_reset(), which has varied between releases,
- each configured alias resolves to a real device, with its current enable flag and live link state (flags/status),
- the current default route (when $remove_backup_route is set).
It is safe to run on a live firewall; it makes no changes.
Verifying / troubleshooting
Every log line is prefixed with wancarp, so one grep pulls the whole story of
any event:
tail -f /var/log/system/latest.log | grep wancarp
(GUI: System → Log Files → General, filter on wancarp.)
For each event you get:
- a START banner with the raw arguments (proves the hook fired, and with what),
- one line per interface action — or a skip line with the reason,
- explicit errors (return code + captured output) for any failed shell command,
- explicit errors for an unknown/typo'd interface alias,
- any PHP exception caught by the top-level handler,
- an END banner: a normal run reports result=ok|exception with changed=yes|no (dry-run uses would-change=yes|no); a rejected event reports aborted=invalid-type|bad-source changed=no.

If you see a START banner but no END banner, the script died mid-run on an uncaught fatal. Check /var/log/php_errors.log (if PHP error logging is on).
Manually simulate an event (testing only — prefer --dryrun for a no-op run):
/usr/local/etc/rc.syshook.d/carp/10-wancarp 1@vtnet1 BACKUP
Tested on
OPNsense 26.1.x (PHP 8), validated on a live two-node HA pair (virtio NICs,
DHCP WAN, config sync enabled): maintenance-mode and hard-reboot failover in
both directions, WAN moving with its IP + default route, idempotent across the
duplicate per-VIP CARP events. Uses only stable OPNsense PHP APIs
(get_real_interface, legacy_interface_flags, interface_configure,
interface_reset, log_msg / log_error).
HA note: CARP-aware services
This script handles the WAN, but it's worth auditing anything else that runs
on both nodes. A multicast reflector running on both firewalls at once is a
classic trap — e.g. the mDNS Repeater (Avahi) with two nodes reflecting
between the same VLANs forms a reflection loop (224.0.0.251 flood) that looks
like a CARP/L2 storm but isn't. Enable its "Enable CARP Failover" option (or
otherwise make it run on the MASTER only). Rule of thumb: any service that
should be active on just one node must be CARP-aware or single-node.