Eric Schewe 235d2be930 docs: align script header install steps with README (curl + chmod)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:43:39 -07:00
10-wancarp docs: align script header install steps with README (curl + chmod) 2026-06-07 19:43:39 -07:00
LICENSE Updated README with AI Disclosure, added missing repo content 2026-06-07 18:26:36 -07:00
README.md docs: fix stale/inconsistent doc claims in script + README 2026-06-07 19:42:28 -07:00
SECURITY.md Updated README with AI Disclosure, added missing repo content 2026-06-07 18:26:36 -07:00

10-wancarp — OPNsense CARP Helper

When a CARP event occurs on an OPNsense firewall in an HA (High Availability) pair, this script runs and enables or disables interfaces based on the node's current CARP role:

  • MASTER firewall → enables the configured interface(s).
  • BACKUP firewall → disables the configured interface(s).

This keeps the uplink UP on exactly one node at a time. Typical use is a WAN link that must not be active on both firewalls simultaneously (e.g. a single ISP handoff, a modem in bridge mode, or any upstream that gets confused by two devices claiming the same MAC/IP).

You can manage one or many interfaces. The common case is WAN only.

The script runs only on CARP transitions, so right after you first set up HA (before any transition has fired) the managed interfaces are still UP on both nodes. To force an event for testing, go to Interfaces → Virtual IPs → Status → Enter Persistent CARP Maintenance Mode on the MASTER, then leave maintenance mode to flip back.

AI Disclosure

  • I used Claude Code (Opus 4.8) to write this script in its entirety including the README.
  • I have reviewed the README for accuracy based on my experience using the script.
  • I have fully tested the script in my Homelab to verify it works as intended.
  • I have only read the script header and have not performed a full source review.
  • I had Claude Code use Context7 while developing this script so that the latest relevant documentation would be used.

How it works

OPNsense executes every file in /usr/local/etc/rc.syshook.d/carp/ on a CARP transition and passes two arguments:

| Arg | Meaning | Example |
|---|---|---|
| $argv[1] | subsystem | 1@vtnet1 (<vhid>@<interface>) |
| $argv[2] | type | MASTER, BACKUP, or INIT |

So the system effectively runs:

/usr/local/etc/rc.syshook.d/carp/10-wancarp 1@vtnet1 MASTER

On MASTER: brings the real device link up and reconfigures it. On BACKUP: resets each interface, forces the link down, and (optionally) re-points the default route. INIT is logged but takes no action.
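To make the argument contract concrete, here is a minimal shell sketch of how a hook could split the `<vhid>@<interface>` source and map the role to an action. The helper name `parse_carp_event` is hypothetical and is not taken from 10-wancarp (which is PHP); the `bad-source` and `invalid-type` labels simply mirror the abort reasons named in the END banner description.

```sh
#!/bin/sh
# Sketch only: parse_carp_event is a hypothetical helper, not part of 10-wancarp.
# It splits "<vhid>@<interface>" and maps the CARP role to an action label.
parse_carp_event() {
    source=$1
    role=$2

    # Require at least one character on each side of the @ separator.
    case $source in
        ?*@?*) ;;
        *) echo "bad-source"; return 1 ;;
    esac

    vhid=${source%%@*}    # text before the first @
    iface=${source#*@}    # text after the first @

    case $role in
        MASTER) action="enable" ;;    # bring the configured interfaces up
        BACKUP) action="disable" ;;   # force the configured interfaces down
        INIT)   action="none" ;;      # logged only, no action
        *) echo "invalid-type"; return 1 ;;
    esac

    echo "vhid=$vhid iface=$iface role=$role action=$action"
}

parse_carp_event "1@vtnet1" "MASTER"
```

Note that the interface in the source string is the one carrying the CARP VIP; the interfaces that get enabled or disabled are the ones listed in the script's configuration, not necessarily this one.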

Runtime-only — does not write config

The script makes interface changes at runtime only; it does not call write_config() and never modifies config.xml. This is deliberate:

  • Persisting the per-node enable flag during a CARP event raced against OPNsense's own config writes and could clobber the persistent CARP maintenance-mode flag, leaving a node stuck in the demoted state with no GUI button to recover.
  • With HA config synchronization (XMLRPC) enabled, the synced enable value is authoritative anyway, so per-node persistence is pointless and conflicting.

Because nothing is persisted, the idempotency guard checks the live UP flag, not the config enable flag (which reads 1 on both nodes under config sync). Note that the status: line of the ifconfig output is not used: on virtio (vtnet) NICs it stays active even when the interface is down, so only the UP flag is reliable.
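If you want to check the same live UP flag by hand, a rough shell equivalent might look like the sketch below. The helper `iface_is_up` is hypothetical (the script itself is PHP), and the sample line is an illustrative FreeBSD-style flags line rather than output captured from a real firewall.

```sh
#!/bin/sh
# Sketch only: iface_is_up is a hypothetical helper, not taken from 10-wancarp.
# Reads ifconfig output on stdin and succeeds only if UP appears in the
# flags=...<...> list. The status: line is deliberately ignored.
iface_is_up() {
    awk '
        /flags=/ && !seen {
            seen = 1
            start = index($0, "<"); stop = index($0, ">")
            if (start > 0 && stop > start) {
                # Compare each comma-separated flag exactly against "UP".
                n = split(substr($0, start + 1, stop - start - 1), f, ",")
                for (i = 1; i <= n; i++) if (f[i] == "UP") up = 1
            }
        }
        END { exit (up ? 0 : 1) }
    '
}

# Illustrative input; on a firewall you would pipe real output instead:
#   ifconfig vtnet0 | iface_is_up && echo "UP flag set"
printf '%s\n' 'vtnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500' \
    | iface_is_up && echo "UP flag set"
```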

Boot behaviour: since state isn't persisted, a rebooting node reaches the correct interface state from the CARP MASTER/BACKUP event fired as it rejoins (no action is taken on INIT/boot itself). Confirmed by hard-rebooting both the MASTER and the BACKUP: each hands off and reclaims cleanly, WAN ending up live on exactly one node.

Prerequisite — interfaces must be enabled in config on BOTH firewalls. Because the script is runtime-only and never persists the enable flag, every alias in $iface_aliases must be enabled in config on both nodes (Interfaces → [WAN] → Enable interface → Save/Apply; config sync carries it to the peer). If an interface is disabled in config, OPNsense's gateway/routing subsystem skips it and won't install its default route even when the script brings the link up — the promoted MASTER then gets a DHCP IP but no default route and no Internet (symptom: WAN can ping its own gateway, but netstat -rn shows no default and external pings say "No route to host").
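The "no default route" symptom described above can be checked quickly from the shell. The following is a small sketch, assuming FreeBSD-style `netstat -rn` output in which the default route appears as a row whose first column is `default`; the helper name `has_default_route` is hypothetical.

```sh
#!/bin/sh
# Sketch only: has_default_route is a hypothetical helper.
# Reads routing-table output on stdin and succeeds if any row's
# destination column is exactly "default".
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage on the promoted MASTER (IPv4 table only):
#   netstat -rn -f inet | has_default_route || echo "no IPv4 default route"
```

If this reports no default route while the WAN has an address, check that the interface is enabled in config on that node.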

Configuration

Edit the USER CONFIGURATION block near the top of 10-wancarp:

$iface_aliases       = ['wan'];        // OPNsense interface ALIASES
$lan_vip             = '192.168.0.1';  // default route target on BACKUP (optional)
$remove_backup_route = false;          // set true to repoint default route on BACKUP
  • $iface_aliases — use OPNsense aliases (the internal names like wan, lan, opt1, opt2 — the keys under <interfaces> in config.xml). Do not use device names (igc0, em0) or descriptions; the script resolves the real device automatically via get_real_interface().
    • Single uplink: ['wan']
    • Multiple: ['wan', 'opt2']
  • $remove_backup_route / $lan_vip — leave false for almost everyone. See "Giving the BACKUP node Internet access" below for what this is for and why a failover gateway is the better answer.

Giving the BACKUP node Internet access

When a firewall becomes BACKUP, this script disables its WAN — so that box now has no path to the Internet of its own. That's usually a problem: a firewall still wants outbound access while it's the backup (package updates, DNS, NTP, gateway monitoring, etc.). It needs an alternate default route — typically out through the other firewall (the current MASTER) via the LAN.

There are two ways to provide that, and they solve the same problem:

  • A failover gateway (recommended). Configure an OPNsense gateway group with WAN as Tier 1 and a LAN-side gateway (pointing at the peer/MASTER) as Tier 2, monitored by dpinger. When WAN goes down, OPNsense automatically and dynamically shifts the default route to the LAN gateway, and shifts it back when WAN returns. This is native, monitored, and self-healing. If you have this, leave $remove_backup_route = false and ignore $lan_vip; the script does not need to touch routing at all.

  • $remove_backup_route (the blunt fallback). If you have no failover gateway, set $remove_backup_route = true and $lan_vip to a LAN-side gateway (often a shared LAN CARP VIP). On every BACKUP transition the script simply deletes the default route and re-adds it pointing at $lan_vip. There's no monitoring and no automatic restore beyond the next CARP event — it's a static repoint, which is why a failover gateway is preferred.

Do not use both. If you configure a failover gateway and set $remove_backup_route = true, the two will fight over the default route on the BACKUP node. Pick one — the gateway group if you can.
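For illustration, the static repoint amounts to roughly the two FreeBSD `route` commands in the sketch below. This is an assumption about the general shape only: the README does not show the exact commands the script runs, and `repoint_default_route` is a hypothetical helper. It defaults to printing the commands rather than running them.

```sh
#!/bin/sh
# Sketch only: repoint_default_route is a hypothetical helper that mimics
# a "delete default, re-add via LAN VIP" repoint. Dry-run is the default.
repoint_default_route() {
    lan_vip=$1
    dryrun=${2:-yes}

    if [ -z "$lan_vip" ]; then
        echo "lan_vip is required" >&2
        return 1
    fi

    for cmd in "route delete default" "route add default $lan_vip"; do
        if [ "$dryrun" = "yes" ]; then
            echo "would run: $cmd"
        else
            $cmd    # intentionally unquoted so the command string is word-split
        fi
    done
}

repoint_default_route 192.168.0.1
```

Because nothing monitors the new gateway, the route stays where it was put until the next CARP event, which is the limitation described above.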

Installation

Run on both firewalls:

  1. SSH into each firewall and run: curl https://git.pickysysadmin.ca/eric/10-wancarp/raw/branch/main/10-wancarp -o /usr/local/etc/rc.syshook.d/carp/10-wancarp
  2. Make it executable: chmod a+x /usr/local/etc/rc.syshook.d/carp/10-wancarp
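After installing, it may be worth confirming on each node that the download produced a non-empty, executable file. A minimal sketch follows; the helper name `check_hook` is hypothetical.

```sh
#!/bin/sh
# Sketch only: check_hook is a hypothetical post-install sanity check.
check_hook() {
    hook=$1
    if [ ! -f "$hook" ]; then
        echo "missing: $hook"; return 1
    fi
    if [ ! -s "$hook" ]; then
        echo "empty: $hook"; return 1      # e.g. a failed download
    fi
    if [ ! -x "$hook" ]; then
        echo "not executable: $hook"; return 1
    fi
    echo "ok: $hook"
}

# Usage on each firewall:
#   check_hook /usr/local/etc/rc.syshook.d/carp/10-wancarp
```

This only checks file presence and permissions; it does not verify the file's contents.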

Dry-run (test before going live)

Pass --dryrun (or -n) to run the script from the CLI without changing anything — no interface_configure/interface_reset, no ifconfig/route. Output is echoed to the terminal as well as the system log.

./10-wancarp --dryrun 1@vtnet1 MASTER
./10-wancarp --dryrun 1@vtnet1 BACKUP

(The flag may go in any position; the <vhid>@<interface> source and the role are still required. A source string without @ only warns in dry-run instead of aborting.)
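The "flag in any position" behaviour can be pictured with a short shell sketch that separates the dry-run flag from the positional arguments. This is illustrative only: `split_dryrun` is a hypothetical helper, and the real script does its own parsing in PHP.

```sh
#!/bin/sh
# Sketch only: split_dryrun is a hypothetical helper showing how a dry-run
# flag can be accepted anywhere among the arguments.
split_dryrun() {
    dryrun=no
    positional=""
    for arg in "$@"; do
        case $arg in
            --dryrun|-n) dryrun=yes ;;
            # Append with a single space between positional arguments.
            *) positional="$positional${positional:+ }$arg" ;;
        esac
    done
    echo "dryrun=$dryrun args=$positional"
}

split_dryrun 1@vtnet1 --dryrun BACKUP
# prints: dryrun=yes args=1@vtnet1 BACKUP
```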

Before printing what each role would do, dry-run runs a preflight that verifies the things most likely to make a live failover silently fail:

  • the legacy global $config is actually populated (if not, a live run resolves nothing and no-ops),
  • every OPNsense PHP API the script calls exists on this build — including interface_reset(), which has varied between releases,
  • each configured alias resolves to a real device, with its current enable flag and live link state (flags/status),
  • the current default route (when $remove_backup_route is set).

It is safe to run on a live firewall; it makes no changes.

Verifying / troubleshooting

Every log line is prefixed with wancarp, so one grep pulls the whole story of any event:

tail -f /var/log/system/latest.log | grep wancarp

(GUI: System → Log Files → General, filter on wancarp.)

For each event you get:

  • a START banner with the raw arguments (proves the hook fired, and with what),
  • one line per interface action — or a skip line with the reason,
  • explicit errors (return code + captured output) for any failed shell command,
  • explicit errors for an unknown/typo'd interface alias,
  • any PHP exception caught by the top-level handler,
  • an END banner: a normal run reports result=ok|exception with changed=yes|no (dry-run uses would-change=yes|no); a rejected event reports aborted=invalid-type|bad-source changed=no.

If you see a START banner but no END banner, the script died mid-run on an uncaught fatal — check /var/log/php_errors.log (if PHP error logging is on).
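A quick way to spot a run that started but never finished is to count START banners that have no matching END banner. The sketch below assumes, without having seen real log lines, that the banner lines contain the literal words START and END alongside the wancarp prefix; adjust the patterns if your log lines differ. The helper name `unfinished_runs` is hypothetical.

```sh
#!/bin/sh
# Sketch only: unfinished_runs is a hypothetical helper.
# ASSUMPTION: banner lines contain the standalone words START and END.
# Reads log lines on stdin and prints how many START banners lack an END.
unfinished_runs() {
    awk '
        !/wancarp/ { next }    # ignore unrelated log lines
        /(^|[^A-Za-z])START([^A-Za-z]|$)/ { pending++ }
        /(^|[^A-Za-z])END([^A-Za-z]|$)/   { if (pending > 0) pending-- }
        END { print pending + 0 }
    '
}

# Usage:
#   grep wancarp /var/log/system/latest.log | unfinished_runs
```

A non-zero count suggests at least one run died before reaching its END banner.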

Manually simulate an event (testing only — prefer --dryrun for a no-op run):

/usr/local/etc/rc.syshook.d/carp/10-wancarp 1@vtnet1 BACKUP

Tested on

OPNsense 26.1.x (PHP 8), validated on a live two-node HA pair (virtio NICs, DHCP WAN, config sync enabled): maintenance-mode and hard-reboot failover in both directions, WAN moving with its IP + default route, idempotent across the duplicate per-VIP CARP events. Uses only stable OPNsense PHP APIs (get_real_interface, legacy_interface_flags, interface_configure, interface_reset, log_msg / log_error).

HA note: CARP-aware services

This script handles the WAN, but it's worth auditing anything else that runs on both nodes. A multicast reflector running on both firewalls at once is a classic trap — e.g. the mDNS Repeater (Avahi) with two nodes reflecting between the same VLANs forms a reflection loop (224.0.0.251 flood) that looks like a CARP/L2 storm but isn't. Enable its "Enable CARP Failover" option (or otherwise make it run on the MASTER only). Rule of thumb: any service that should be active on just one node must be CARP-aware or single-node.

References

Original script concept — kronenpj on GitHub