Duplicate MAC Address Madness
Introduction
This is the story of what happened when I ended up with duplicate MAC addresses on a network, and the particular way the chaos unfolded.
The Topology
The topology is a common enterprise scenario.
| Site Alpha | Site Beta |
|---|---|
| Edge FWs 1 | Edge FWs 2 |
| Core FWs 1 | Core FWs 2 |
| Core SWs 1 | Core SWs 2 |
There are two sites with identical topologies of paired firewalls in HA, connected together at layer 2 via the Core switches.
How the Issue Appeared
It appeared as though traffic could not span sites at layer 2, which I knew to be a logical impossibility. When a gateway was active at Site Alpha, everything worked as normal, but when it was active at Site Beta, it stopped working for Site Alpha.
Troubleshooting
I ran various pings, and there was a dividing line between where things worked and where they did not. This dividing line seemed to be distance based, perhaps on the number of layer-2 hops. At this point I ran some show MAC address commands on the switches involved, and everything seemed “ok” at first glance: the MAC address in question was being seen. In some cases, though, it wasn’t on the port I was expecting; it would be on the uplink ports to the site’s FW stack instead of the intersite link.
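The check I was doing by eye can be sketched in a few lines of Python. This is only an illustration: the switch names, port names, VLAN, and MAC address below are hypothetical placeholders, not values from the real network, and the entries are typed in by hand rather than parsed from any particular vendor's output.

```python
from collections import defaultdict

def ports_by_mac(entries):
    """Group (switch, port) locations by (vlan, mac)."""
    seen = defaultdict(set)
    for switch, vlan, mac, port in entries:
        seen[(vlan, mac.lower())].add((switch, port))
    return seen

def unexpected_locations(entries, vlan, mac, expected):
    """Return (switch, port) pairs where the MAC was learned but not expected."""
    learned = ports_by_mac(entries)[(vlan, mac.lower())]
    return sorted(learned - set(expected))

# Hypothetical MAC table entries: (switch, vlan, mac, port).
observed = [
    ("core-sw-alpha", 10, "02:00:00:00:00:0a", "uplink-to-fw"),
    ("core-sw-beta", 10, "02:00:00:00:00:0a", "uplink-to-fw"),
]

# With the gateway active at Site Beta, Site Alpha should learn the MAC
# over the intersite link, not on its own firewall uplink.
expected = [
    ("core-sw-alpha", "intersite-link"),
    ("core-sw-beta", "uplink-to-fw"),
]

result = unexpected_locations(observed, 10, "02:00:00:00:00:0a", expected)
print(result)
```

Anything this returns is a port worth a second look, which is exactly how the firewall uplink at the "wrong" site stood out.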
First Hop Redundancy
We take a small detour at this point to mention something important to this setup. Because paired firewalls are used for high availability, there is a configured “high availability group ID”. I noticed that the MAC address in question included the HA Group ID, and the lights started to come on really fast. All of the First Hop Redundancy Protocols I’m aware of (VRRP, CARP, etc.) use their own distinct MAC address ranges, with an ID number making them unique per segment. In this case, however, the ID numbers were not unique: they were representative of hierarchy, so the same IDs, and therefore the same MAC addresses, were in use at both sites on a shared layer-2 segment.
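To make the ID-to-MAC relationship concrete, here is a minimal sketch using VRRP for IPv4 as the example, where the virtual MAC is the fixed prefix 00-00-5E-00-01 followed by the virtual router ID. I am using VRRP purely for illustration; the exact scheme on the firewalls in this story may differ, and the group ID of 1 below is a made-up value.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Return the IPv4 VRRP virtual MAC for a virtual router ID (1-255)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    # Fixed VRRP prefix for IPv4; the last octet is the VRID itself.
    return f"00:00:5e:00:01:{vrid:02x}"

# Hypothetical: both sites configured with the same group ID.
site_alpha_mac = vrrp_virtual_mac(1)
site_beta_mac = vrrp_virtual_mac(1)

# Same ID in, same MAC out. On a shared layer-2 segment that is a duplicate.
print(site_alpha_mac, site_beta_mac, site_alpha_mac == site_beta_mac)
```

The point is that the MAC is a pure function of the ID, so two clusters with the same ID on the same segment will always collide.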
Resolution
Changing the firewalls’ HA Group ID at one of the sites, so that every FW cluster in this setup had a unique ID rather than one that was only unique per cluster within a site, resolved the issue instantly.
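A check like this could catch the problem before it reaches production. It is a sketch under assumptions: the cluster names, the layer-2 domain label, and the ID values are all hypothetical, and it assumes you can list which clusters share a layer-2 domain.

```python
from collections import defaultdict

def find_group_id_collisions(clusters):
    """Return {(l2_domain, group_id): [cluster names]} for IDs used more than once.

    clusters is an iterable of (cluster_name, l2_domain, group_id) tuples.
    """
    users = defaultdict(list)
    for name, domain, group_id in clusters:
        users[(domain, group_id)].append(name)
    return {key: names for key, names in users.items() if len(names) > 1}

# Hypothetical "before" plan: IDs follow the hierarchy, so both sites match.
before = [
    ("core-fw-alpha", "stretched-l2", 2),
    ("core-fw-beta", "stretched-l2", 2),
]

# Hypothetical "after" plan: one site moved to an ID nobody else uses.
after = [
    ("core-fw-alpha", "stretched-l2", 2),
    ("core-fw-beta", "stretched-l2", 12),
]

print(find_group_id_collisions(before))
print(find_group_id_collisions(after))
```

An empty result means every cluster in each shared layer-2 domain has its own ID.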
