Data Center Multicast
Multicast is an efficient method of sending the same traffic to multiple hosts. Traffic from a source can be delivered to a distributed group of interested listeners, allowing a single traffic flow to reach multiple recipients. HPE Aruba Networking supports IPv4 and IPv6 multicast; however, this document focuses on IPv4.
Overview
HPE Aruba Networking data centers support multicast traffic between hosts within a data center and between data center hosts and external hosts. Protocol Independent Multicast–Sparse Mode (PIM-SM), PIM–Bidirectional (PIM-BIDIR), Source-Specific Multicast (SSM), and the Internet Group Management Protocol (IGMP) support multicast routing in EVPN-VXLAN and traditional Two-Tier data center architectures.
Multicast traffic is identified by a destination IP address in the 224.0.0.0/4 IPv4 address block, and it is forwarded between routers based on a dedicated multicast routing table. Each unique address in the reserved block is referred to as a multicast group. Unicast and multicast traffic use the same IP headers, but routing table maintenance and traffic forwarding decisions have key differences.
A unicast routing table associates unicast IP prefixes with interfaces on the router. When a unicast packet is received by a router, it performs a lookup in the local forwarding table to find the longest prefix match for the destination address in the IP header. An exact prefix match for the destination address is not required, and is likely not present, except in the case of EVPN-VXLAN overlays that share host routes. If the destination IP address does not fall within the range of a prefix installed in the forwarding table, the packet is dropped. A default route that matches all addresses can be assigned to provide a default forwarding path for traffic that does not match a more specific prefix. Only the interface associated with the longest prefix match is selected to forward the received packet. When more than one interface is associated with the same longest prefix, the router selects only one of the interfaces to forward the packet based on a load balancing algorithm.
A multicast routing table associates IP multicast groups with one or more outgoing router interfaces. When a multicast packet is received by a router, it performs a lookup for an exact match of the destination IP address, which is the multicast group. If an exact match is not found, the packet is dropped. If a match is found, the packet is forwarded on all interfaces in a list associated with the multicast group.
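The contrast between the two lookup models can be summarized in a short conceptual sketch. The following Python is illustrative only, not switch code, and the route entries, groups, and interface names are hypothetical.

```python
import ipaddress

# Hypothetical unicast routing table: prefix -> forwarding interface (longest match wins)
unicast_table = {
    ipaddress.ip_network("10.1.0.0/16"): "1/1/1",
    ipaddress.ip_network("10.1.2.0/24"): "1/1/2",
    ipaddress.ip_network("0.0.0.0/0"):   "1/1/49",   # default route
}

# Hypothetical multicast routing table: group -> outgoing interface list (exact match only)
multicast_table = {
    ipaddress.ip_address("239.1.1.10"): ["1/1/3", "1/1/4"],
}

def unicast_forward(dst):
    """Select the single interface associated with the longest prefix match."""
    dst = ipaddress.ip_address(dst)
    matches = [(net, intf) for net, intf in unicast_table.items() if dst in net]
    if not matches:
        return None                                   # no match: drop the packet
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def multicast_forward(group):
    """Replicate to every interface in the OIL for an exact group match."""
    return multicast_table.get(ipaddress.ip_address(group), [])  # empty list: drop

print(unicast_forward("10.1.2.7"))      # 1/1/2 (longest prefix match)
print(multicast_forward("239.1.1.10"))  # ['1/1/3', '1/1/4'] (replicate to all)
```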
Multicast enables sending a single data flow to multiple recipients along multiple network paths. The sender does not need to maintain a traffic flow for each interested listener, but can simply send a single traffic stream with a multicast group destination IP, and all interested hosts can receive the same data. This reduces the CPU burden on the host sourcing data and minimizes traffic utilization in the network for applications that can take advantage of this delivery method. Multicast traffic is commonly used for service discovery, host imaging, telephony, and video applications.
Multicast Components
Multicast routers require a method to efficiently establish forwarding state between sources and interested receivers. Typically, only a small number of hosts in a network are interested in receiving traffic for a multicast group, and those receivers are not aware of source IP addresses.
Similar to building unicast routing tables with protocols like OSPF and BGP, multicast routing combines several control plane protocols and strategies to ensure that multicast flows are delivered to all interested receivers, while only consuming capacity on appropriate links and minimizing router resource consumption.
PIM-SM
PIM-SM is the recommended multicast routing protocol in an HPE Aruba Networking data center, and it is the primary protocol used to build multicast route tables.
In general, PIM-SM routing is scoped to a PIM-SM domain, which is roughly defined as a contiguous set of PIM-SM-speaking routers that agree on which member of the domain performs the rendezvous point (RP) function. A PIM domain typically includes the full set of campus and data center switches performing routing functions. PIM-SM operates in both traditional and overlay network architectures.
PIM-SM maintains neighbor adjacencies with directly connected PIM peers. PIM speakers primarily add and remove multicast route table entries after receiving PIM join and prune messages from their neighbors, which communicate interest in receiving traffic for a specific multicast group. Multicast route entries contain two main components for a given multicast group: an incoming interface (IIF) and an outgoing interface list (OIL). When a multicast packet is received on a PIM router, the packet is replicated and forwarded on all interfaces included in the OIL for the group.
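The following conceptual Python sketch illustrates how PIM join and prune messages maintain the OIL of a multicast route entry. It is a simplified model, not AOS-CX internals, and the group and interface names are hypothetical.

```python
# Conceptual multicast route table: group -> {"iif": incoming interface, "oil": outgoing interfaces}
mroutes = {}

def on_pim_message(group, interface, msg_type):
    entry = mroutes.setdefault(group, {"iif": None, "oil": set()})
    if msg_type == "join":
        entry["oil"].add(interface)       # downstream neighbor wants this group
    elif msg_type == "prune":
        entry["oil"].discard(interface)   # downstream neighbor no longer interested
        if not entry["oil"]:
            del mroutes[group]            # no remaining receivers; remove state

on_pim_message("239.1.1.10", "1/1/3", "join")
on_pim_message("239.1.1.10", "1/1/4", "join")
on_pim_message("239.1.1.10", "1/1/3", "prune")
print(mroutes)   # {'239.1.1.10': {'iif': None, 'oil': {'1/1/4'}}}
```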
PIM routers directly connected to Layer 2 LAN and VLAN segments that contain sources and receivers are referred to as designated routers (DRs).
Rendezvous Point (RP)
In a PIM-SM network, hosts interested in receiving traffic for a multicast group are generally not aware of source IP addresses, so this information is maintained on their behalf by the network infrastructure. One or more PIM-SM routers are assigned the RP role and manage the mapping of multicast groups to their source IP addresses. It is best practice to assign a router loopback IP for the RP function to ensure that the interface is always up.
PIM-SM only permits the assignment of a single IP address to function as the RP for a given range of multicast groups. Typically, the full range of multicast IP addresses (224.0.0.0/4) is assigned to one RP address. In order to provide RP redundancy, an anycast strategy is used, where the same RP IP address is configured on two different physical switches.
Each source for a multicast group is registered with the RP by the source’s DR, typically the source’s IP gateway, using a unicast PIM register message. The RP sends a Register-Stop message after the information has been added to the RP’s source address table. Periodic null register messages from the source’s DR maintain state for the multicast group’s source address at the RP.
When using an anycast RP for redundancy, it is important to note that unicast PIM register messages are delivered to only one of the anycast RPs. Both switches require a complete set of source addresses to ensure a fully functional environment. The Multicast Source Discovery Protocol (MSDP) solves this problem by sharing multicast group source address entries between the two anycast RPs, which ensures that both switches have a complete database of multicast group to source IPs.
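The effect of MSDP can be pictured as a merge of the two RPs’ source tables. The following Python sketch is purely conceptual; the group and source addresses are hypothetical, and no MSDP message formats are modeled.

```python
# Each anycast RP learns sources from the PIM register messages it receives.
# MSDP source-active (SA) exchange gives both RPs the full group-to-source database.
rp1_sources = {"239.1.1.10": {"10.1.2.50"}}            # registered at RP1
rp2_sources = {"239.1.1.10": {"10.1.3.80"},            # registered at RP2
               "239.2.2.20": {"10.1.4.90"}}

def msdp_sync(local, peer_sa):
    """Merge the peer's source-active entries into the local table."""
    for group, sources in peer_sa.items():
        local.setdefault(group, set()).update(sources)

msdp_sync(rp1_sources, rp2_sources)
msdp_sync(rp2_sources, rp1_sources)
# Both RPs now hold identical source lists for every group.
print(rp1_sources == rp2_sources)   # True
```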
Bootstrap Router (BSR)
All PIM-SM routers in a PIM domain must be configured with the RP’s IP address, either manually or through a dynamic mechanism.
The Bootstrap Router (BSR) mechanism, built into the PIM-SM protocol, provides a dynamic method of selecting the RP and distributing the RP’s address throughout the PIM domain. The BSR process reduces the administrative burden of configuring RP information manually on all CX switches that perform multicast routing.
The BSR process occurs in two phases. First, an election selects one of the PIM-SM routers configured as a candidate BSR to serve as the BSR for the PIM domain. The identity of the selected BSR is distributed throughout the PIM domain. Second, PIM-SM routers configured as candidate RPs advertise a specific interface IP to the BSR as a possible RP, typically a loopback IP. The BSR selects one of the candidate RPs as the active RP for the domain, and then informs all other members in the PIM domain of the selected RP’s address. Candidate RPs should be assigned priority values to influence the RP selection process.
When using an anycast RP, two candidate RPs advertise the same IP loopback address with the same priority value. It is best practice to configure the same two PIM-SM routers as both candidate BSRs and candidate RPs, where both advertise the same candidate RP address and priority value. This minimizes configuration, provides RP redundancy, and simplifies troubleshooting.
Internet Group Management Protocol (IGMP)
IGMP identifies the hosts on a local LAN or VLAN segment that are interested in receiving multicast traffic, using two methods.
When a host is interested in receiving traffic from a multicast group, it informs IGMP speaking PIM-SM routers of its interest by sending an unsolicited IGMP membership report to a well-known multicast address. The PIM DR for the local network starts building multicast state that allows source traffic to reach the interested listener as described in the following Routed Multicast Flow Setup section.
One or more PIM-SM routers are configured as IGMP queriers for a local LAN or VLAN segment. One of the configured IGMP routers is selected to operate as the active IGMP querier for the local network. The IGMP querier periodically sends solicitations for interest in multicast groups with current state on the router and a general query for all groups. Hosts interested in receiving multicast traffic respond with an IGMP membership report.
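As a conceptual illustration of querier selection, IGMPv2 and IGMPv3 elect the candidate with the lowest IP address on the segment as the active querier. The Python sketch below assumes two hypothetical candidate SVI addresses; it is not switch code.

```python
import ipaddress

# Hypothetical candidate querier addresses on the same VLAN segment.
candidate_queriers = ["10.1.2.2", "10.1.2.3"]

# The candidate with the lowest IP address becomes the active querier;
# the other suppresses its queries while the active querier is heard.
active_querier = min(candidate_queriers, key=ipaddress.ip_address)
print(active_querier)   # 10.1.2.2 sends the periodic general and group-specific queries
```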
In an HPE Aruba Networking data center, the VLAN SVIs of the VSX switches providing IP gateway services are configured as IGMP queriers.
IGMPv3 is the recommended protocol version and is used by default on AOS-CX switches.
IGMP Snooping
IGMP snooping is enabled on Layer 2 switches to monitor IGMP communications between hosts and multicast routers. By listening to IGMP messages, switches discover the local ports with downstream receivers interested in specific multicast groups. Each IP multicast group has a corresponding MAC address. Based on IGMP snooping data, the switch installs multicast group MAC addresses into the Layer 2 forwarding table. This ensures that multicast traffic is forwarded only on the Layer 2 links required to reach interested receivers, rather than being flooded to all ports in the same VLAN.
Note: There is not a one-to-one mapping of MAC addresses to IP multicast groups. The last 23 bits of a multicast group’s IP address, which correspond to the range 00:00:00–7F:FF:FF, are appended to the 01:00:5E Organizationally Unique Identifier (OUI) to create a MAC address that represents the group and can be installed in the Layer 2 forwarding table. This strategy results in MAC address oversubscription, where each MAC address potentially represents 32 IP multicast groups. This should be considered when selecting multicast group addresses for applications to ensure Layer 2 forwarding optimization.
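The mapping described in the note can be expressed as a short function. The sketch below is illustrative Python, not switch code, and the example group addresses are hypothetical.

```python
import ipaddress

def group_mac(group):
    """Map an IPv4 multicast group to its 01:00:5E MAC address.

    Only the low 23 bits of the group address are carried in the MAC,
    so 32 different groups share each MAC address."""
    low23 = int(ipaddress.ip_address(group)) & 0x7FFFFF
    return "01:00:5e:%02x:%02x:%02x" % (
        (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF)

# 239.1.1.10 and 224.129.1.10 differ only in the discarded high-order bits,
# so they collide on the same Layer 2 forwarding entry.
print(group_mac("239.1.1.10"))    # 01:00:5e:01:01:0a
print(group_mac("224.129.1.10"))  # 01:00:5e:01:01:0a
```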
IGMP snooping is recommended on all AOS-CX switches performing Layer 2 operations when multicast is in use. This enhances network performance by not consuming capacity on links without downstream receivers, and it spares hosts that are not interested in the traffic from processing unwanted packets.
Routed Multicast Flow Setup
The network path between multicast receivers and sources is referred to as a distribution tree, where the receivers are the leaves of the tree, PIM router links are branches, and the root of the tree is the source.
Multicast route tables specify two types of entries: (*,G) and (S,G). For both types of entries, the “G” represents the multicast group. In a (*,G) route entry, the “*” represents all possible sources for a specific multicast group. In an (S,G) entry, the “S” represents an individual source address.
Note: (*,G) is pronounced “star comma gee” and (S,G) is pronounced “ess comma gee.”
Both multicast route types help build the multicast forwarding table and specify two components per entry: an outgoing interface list (OIL) and an incoming interface (IIF). When routing state exists for a group, multicast traffic received from an upstream source is forwarded downstream toward all interested receivers on every interface included in the combined (*,G) and (S,G) OILs.
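A simplified model of this forwarding decision is shown below. The Python sketch is conceptual only; it merges the (*,G) and (S,G) OILs as described above, and the entries and interface names are hypothetical.

```python
# Conceptual multicast forwarding table keyed by (source, group).
mroutes = {
    ("*", "239.1.1.10"):         {"iif": "1/1/49", "oil": {"vlan10"}},
    ("10.1.3.80", "239.1.1.10"): {"iif": "1/1/50", "oil": {"vlan10", "vlan20"}},
}

def forward(source, group, in_interface):
    """Return the interfaces a packet is replicated onto: the union of the
    matching (*,G) and (S,G) OILs, excluding the incoming interface."""
    star_g = mroutes.get(("*", group), {"oil": set()})
    s_g    = mroutes.get((source, group), {"oil": set()})
    oil = (star_g["oil"] | s_g["oil"]) - {in_interface}
    return sorted(oil)

print(forward("10.1.3.80", "239.1.1.10", "1/1/50"))   # ['vlan10', 'vlan20']
```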
The receiver’s DR is responsible for starting the process of building distribution trees and routing state.
Rendezvous Point Tree
When the first interested listener sends an IGMP join for a multicast group, the receiver’s DR has no knowledge of the group’s sources. Since the RP is aware of all multicast sources in a PIM domain, the receiver’s DR begins building a distribution tree toward the RP. This path is referred to as the rendezvous point tree (RPT).
The receiver’s DR adds the interface for the receiver’s network segment to the outgoing interface list (OIL) in a (*,G) route entry. The DR then sends a (*,G) PIM join toward the RP based on reverse path forwarding (RPF), which determines the interface with the closest unicast routing distance to the RP’s IP address. The same interface on which the (*,G) join was sent is added as the incoming interface of the (*,G) route entry.
The upstream PIM-SM neighbor receiving the (*,G) PIM join creates a (*,G) route entry and adds the interface on which the join was received to the OIL for the multicast group. It then sends a (*,G) join toward the RP using RPF to select the nearest interface, again adding the interface as the IIF for the (*,G) route entry. This process is repeated until the RP receives the (*,G) PIM join.
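The following conceptual sketch models the receiver DR’s side of this process: an RPF lookup selects the interface toward the RP, which becomes the IIF of a new (*,G) entry, while the receiver’s interface is placed in the OIL. Routes, addresses, and interface names are hypothetical, and the code is not vendor software.

```python
import ipaddress

# Hypothetical unicast routing table used for the RPF lookup toward the RP.
unicast_routes = {
    ipaddress.ip_network("10.250.0.0/24"): "1/1/49",
    ipaddress.ip_network("0.0.0.0/0"):     "1/1/52",
}

def rpf_interface(address):
    """Return the interface with the longest unicast match toward an address."""
    addr = ipaddress.ip_address(address)
    matches = [(net, intf) for net, intf in unicast_routes.items() if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None

def build_star_g(group, receiver_interface, rp_address):
    """Build a (*,G) entry on the receiver's DR: the receiver port goes into
    the OIL, and the RPF interface toward the RP becomes the IIF (the same
    interface the (*,G) join is sent on)."""
    return {"group": group,
            "oil": {receiver_interface},
            "iif": rpf_interface(rp_address)}

print(build_star_g("239.1.1.10", "vlan10", "10.250.0.1"))
# {'group': '239.1.1.10', 'oil': {'vlan10'}, 'iif': '1/1/49'}
```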
Shortest-Path Tree (SPT)
The RP contains a list of all known sources in the PIM domain, so it is used to facilitate the initial traffic flow between sources and interested receivers over a routed network.
When an RP receives a (*,G) PIM join from a PIM neighbor, it creates a (*,G) route entry for the group and adds the interface for the PIM join to the OIL. It then consults its list of known sources for the requested multicast group. For each known source, the RP builds a shortest-path tree (SPT).
The following process is repeated for each source address known by the RP for the group. The RP adds an (S,G) route entry. The RPF process consults the unicast routing table to determine the interface with the shortest routed distance towards a source. This interface is added as the IIF for the (S,G) route entry, and a PIM (S,G) join is sent to the PIM neighbor on that interface toward the source. The upstream PIM-SM neighbor receiving the (S,G) PIM join creates an (S,G) route entry and adds the interface it was received on to the OIL for the multicast group, and it then sends an (S,G) join toward the source based on RPF, adding this interface as the IIF for the (S,G) route entry. This process is repeated until the DR for the multicast source receives the (S,G) PIM join.
The following diagram provides a simple example of establishing the initial RPT, and the SPT from the RP to the source, with only a single router in the path for each. After distribution trees are built, traffic flows in the reverse direction of building the PIM multicast routing state.
Routed Multicast Path Optimization
After the SPT is built from the RP to the source, multicast traffic can flow from the source to the interested receiver. Multicast traffic is forwarded from the source to the RP using the SPT. The RP then forwards multicast group traffic toward interested listeners using the RPT.
However, the combined distribution trees through the RP may not be the shortest path between the source and receiver. After multicast traffic from a source arrives at the receiver’s DR, a source IP is known for building a more optimized path. The receiver’s DR adds an (S,G) route entry for each source observed for a multicast group. It then initiates building an SPT toward an individual source using the same process described above. After the receiver’s DR observes traffic on the SPT, the source address is pruned from the RPT by sending a special PIM prune message to the RP that removes only the specified source from the RPT. This permits additional sources to come online and deliver traffic to multicast group receivers using the RPT. After the RP receives the PIM prune message on the RPT, it may prune the SPT for the same source.
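The sequence of events on the receiver’s DR can be sketched as follows. This is a conceptual simplification, assuming the source-specific prune toward the RP is the (S,G,rpt) prune described above; the addresses are hypothetical.

```python
def on_rpt_traffic(dr_state, source, group):
    """First packets arrive via the RPT and reveal the source address S."""
    if (source, group) not in dr_state["sg_entries"]:
        dr_state["sg_entries"].add((source, group))
        dr_state["actions"].append(f"send ({source},{group}) join toward the source")  # build SPT
    return dr_state

def on_spt_traffic(dr_state, source, group):
    """Traffic now arrives on the SPT, so this source is pruned from the RPT."""
    dr_state["actions"].append(f"send ({source},{group},rpt) prune toward the RP")
    return dr_state

state = {"sg_entries": set(), "actions": []}
on_rpt_traffic(state, "10.1.3.80", "239.1.1.10")
on_spt_traffic(state, "10.1.3.80", "239.1.1.10")
print(state["actions"])
```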
Building the SPT from the interested receiver’s DR to the source’s DR ensures optimal use of network capacity and reduces resource consumption on the RP.
The following illustration depicts the optimized traffic flow after cutover to the SPT between the receiver’s DR and the source, with the source pruned from both the RPT and the SPT path from the RP to the source.
VSX Considerations
When implementing PIM-SM and IGMP functions on a VSX pair, each switch operates as a separate router, and both routers share the same LAN segments.
By default, only one member of the VSX pair functions as the PIM DR for downstream hosts. The DR is responsible for building the RPTs and SPTs for multicast traffic flows. When timely multicast recovery is required following a VSX member failure, a VSX pair can enable PIM active-active mode. In this mode, one member of the pair is designated the DR and the second member is designated the proxy-DR, which is in a backup role. Both the DR and proxy-DR initiate building RPTs and SPTs, but only the DR forwards traffic to interested receivers. This allows for fast recovery in case of DR failure, since multicast traffic is already streaming to the proxy DR. As soon as a DR failure is detected, the proxy DR can begin forwarding multicast traffic without the delay of building new multicast routing state. When implementing a Two-Tier topology, the data center core switches should be configured with active-active PIM.
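A minimal sketch of the forwarding roles in PIM active-active mode is shown below, assuming hypothetical member names; it models only the DR/proxy-DR handoff described above.

```python
# Both VSX members join the distribution trees, but only the DR forwards to
# receivers until it fails; the proxy-DR then takes over without rebuilding state.
vsx_pair = {"core-1": "dr", "core-2": "proxy-dr"}

def forwards_multicast(member, dr_failed=False):
    role = vsx_pair[member]
    if role == "dr":
        return not dr_failed                 # normal forwarder
    if role == "proxy-dr":
        return dr_failed                     # trees already built; takes over immediately
    return False

print(forwards_multicast("core-1"))                  # True
print(forwards_multicast("core-2"))                  # False
print(forwards_multicast("core-2", dr_failed=True))  # True: no re-convergence delay
```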
When using an EVPN-VXLAN overlay with Integrated Routing and Bridging (IRB), host-facing SVI interfaces are configured to operate with the behavior of a DR on both VSX members, irrespective of DR role assignment, by configuring the ip pim-sparse vsx-virtual-neighbor command on the interface. IGMP and PIM joins are processed in the same manner by both VSX members, and both members actively forward received multicast traffic to downstream hosts. In an EVPN-VXLAN environment, both VSX members are assigned the same logical VTEP address. Unlike the active-active configuration above, only one member of the VSX pair receives any individual packet as part of a multicast stream from a remote VTEP in the fabric, based on the load balancing algorithm of the underlay network. Therefore, both VSX members must actively forward multicast traffic to ensure delivery to the receiver, and duplicate delivery of multicast packets to downstream hosts does not occur in this design.
Two-Tier Multicast Operation
In a Two-Tier data center, the core layer switches operate as PIM-SM routers. They support multicast routing between data center hosts and between the data center and external networks. Typically, the data center core switches learn the campus RPs using the BSR mechanism. Active-active PIM provides fast failover of multicast traffic forwarding in case of a core switch failure. The core switch selected as the PIM DR reports directly-connected multicast sources to the campus RP. The core switches also learn about multicast receivers using IGMP.
IGMP snooping optimizes Layer 2 multicast forwarding on both core and access layer switches, so that core switches only forward traffic to downstream access switches with interested receivers, and access switches only forward traffic to ports with directly attached receivers.
The diagram below identifies the multicast features enabled for different roles related to the Two-Tier data center.
RP Placement
When a data center is attached to a campus network, the campus RP can be used by data center switches and learned via the BSR mechanism.
When an external RP is unavailable, an anycast RP can be configured on the Two-Tier routed core to support routed multicast.
EVPN-VXLAN Multicast Operations
HPE Aruba Networking’s EVPN-VXLAN implementation enables efficient routing of multicast traffic for single or multiple data center fabrics using PIM-SM. IPv4 multicast forwarding for both Layer 2 and Layer 3 is supported, and additional IGMP and PIM optimizations are implemented to accommodate the overlay network topology.
The diagram below identifies the multicast features enabled for different roles in the EVPN-VXLAN data center. PIM is enabled per overlay VRF and configured on overlay interfaces. IGMP is configured on host-facing overlay interfaces. A separate PIM adjacency is formed between the border leaf switches and firewalls for each overlay VRF.
Note: When hosts are positioned at the border leaf, IGMP and IGMP snooping are also required on the border leaf switches.
Native support for Layer 2 and Layer 3 multicast protocols over EVPN-VXLAN tunnels simplifies multicast overlay configuration and troubleshooting. IGMP and PIM-SM optimize IP multicast forwarding, constraining multicast group traffic only to VTEPs with interested listeners. This reduces network traffic and possible congestion.
VXLAN Bridged Multicast
In a VXLAN overlay, Layer 2 multicast traffic is bridged logically between sources and receivers in the same Layer 2 VNI (VLAN) across VTEPs.
By default, when a VTEP receives multicast traffic from an attached source, it replicates and forwards the traffic to all other VTEPs configured with the same Layer 2 VNI. Remote VTEPs receiving the multicast traffic flood it out all ports configured for the VLAN associated with the Layer 2 VNI. All fabric hosts in the same VLAN receive the multicast traffic, irrespective of their interest in the multicast group.
IGMP snooping optimizes Layer 2 multicast forwarding in a VXLAN overlay. When enabled, VTEPs forward IGMP join and leave messages to peer VTEPs configured with the same Layer 2 VNI on which the IGMP message was received. Each VTEP updates its local Layer 2 multicast forwarding table based on the shared IGMP messages. This enables the source-connected VTEP to forward multicast traffic only to VTEPs with receivers interested in a specific multicast group.
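The following conceptual sketch contrasts default head-end replication with IGMP-snooping-optimized replication between VTEPs. The VTEP names, VNI, and group are hypothetical, and the code is illustrative only.

```python
# All VTEPs configured with the same hypothetical Layer 2 VNI.
l2vni_flood_list = {"leaf1", "leaf2", "leaf3", "leaf4"}

# Per-group interest learned from IGMP joins forwarded between VTEPs.
interested_vteps = {"239.1.1.10": {"leaf3"}}

def replicate(source_vtep, group, snooping=True):
    """Return the peer VTEPs that receive a copy of the multicast packet."""
    if snooping:
        peers = interested_vteps.get(group, set())   # only VTEPs with interested receivers
    else:
        peers = l2vni_flood_list                     # default: replicate to every peer VTEP
    return sorted(peers - {source_vtep})

print(replicate("leaf1", "239.1.1.10", snooping=False))  # ['leaf2', 'leaf3', 'leaf4']
print(replicate("leaf1", "239.1.1.10"))                   # ['leaf3']
```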
The diagram below illustrates multicast forwarding with IGMP snooping optimizations. One host sends an IGMP join to express interest in a multicast group. The IGMP join is forwarded to all other VTEPs. The source-attached VTEP forwards multicast traffic only to VTEPs with an interested receiver. The receiver’s VTEP forwards traffic only to the individual host that sent the IGMP join.
IGMP Querier Positioning
In an HPE Aruba Networking VXLAN overlay, IGMP joins, leaves, queries, and responses are flooded to all VTEPs in a fabric. When using symmetric IRB for unicast traffic, IGMP queriers are configured on all host-facing VLAN SVIs.
When using a centralized gateway, the IGMP querier should be configured on the centralized gateway, which also provides routed multicast functions.
VXLAN Routed Multicast
An IRB deployment with IP gateways local to each VTEP provides better scalability than a centralized gateway. In this model, each leaf switch establishes a PIM adjacency with each remote VTEP in the fabric using a virtual interface associated with the Layer 3 VNI. The result is a full mesh of PIM adjacencies in the data center fabric for each overlay VRF.
When multicast senders and receivers are in different subnets in a VXLAN overlay, Layer 3 multicast traffic is routed between subnets in the same VRF. When the receiver’s Layer 2 VNI is present on the source VTEP, AOS-CX multicast uses asymmetric IRB by routing the traffic locally at the source VTEP and sending the traffic to the receiver’s VTEP using the appropriate Layer 2 VNI.
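A simplified model of this forwarding choice at the source VTEP is shown below, assuming hypothetical VLAN and VNI values; it is conceptual only, not AOS-CX behavior in detail.

```python
# Layer 2 VNIs configured on the hypothetical source VTEP.
local_l2_vnis = {"vlan20": 10020, "vlan30": 10030}

def route_multicast(receiver_vlan):
    """Asymmetric IRB decision: if the receiver's VLAN (L2 VNI) exists locally,
    route the traffic here and bridge it to the remote VTEP in that VNI."""
    if receiver_vlan in local_l2_vnis:
        return f"route locally, forward in L2 VNI {local_l2_vnis[receiver_vlan]}"
    return "receiver VLAN not present locally"

print(route_multicast("vlan20"))   # route locally, forward in L2 VNI 10020
```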
The following diagram illustrates routing multicast traffic from a source in VLAN 30 to a host in VLAN 20 in a VXLAN overlay. Layer 2 IGMP snooping optimizations (not depicted) still constrain forwarding at the target VTEP only to hosts that expressed interest in the multicast group.
When a multicast source begins sending traffic for a multicast group in an EVPN-VXLAN environment, its locally attached VTEP registers the multicast source’s unicast IP address for the multicast group with the PIM-SM domain’s RP. When unicast routing uses symmetric IRB, the VLAN SVI IP address connected to the source is not unique, so a unique loopback IP in the VTEP’s overlay VRF is configured to originate the PIM Register message. When the RP is located external to the EVPN-VXLAN fabric, a route must be advertised to the external network that allows RP communication back to these loopback addresses, so the RP can send PIM Register-Stop messages and build an SPT to the source’s DR.
RP Placement
When a data center is attached to a campus network, the campus RP can be used by EVPN-VXLAN leaf switches and learned via the BSR mechanism.
When an external RP is unavailable, an anycast RP can be configured on a redundant pair of leaf switches, typically the border leaf. When configuring the RP on a VSX leaf pair, it is common to configure an RP per overlay VRF.