
Data Center Connectivity Design

The HPE Aruba Networking data center provides flexible and highly reliable network designs to ensure efficient, reliable access to applications and data for all authorized users while simplifying operations and accelerating service delivery.

HPE Aruba Networking data centers are built on the following switch models:

  • CX 8xxx Ethernet switches
  • CX 9300 Ethernet switches
  • CX 10000 Ethernet switches with Pensando
  • CX 63xx Ethernet switches for out-of-band (OOB) network management.

Data Center Topologies

HPE Aruba Networking data centers support centralized and distributed workloads anywhere within an organization. Each design supports host uplink bundling to provide high throughput and resiliency for mission-critical workloads. Layer 2 domains can be deployed flexibly to meet application requirements and provide virtual host mobility.

CX switches provide a robust platform for Layer 3 services in the data center. HPE Aruba Networking data center designs are primarily implemented using CX 8xxx, CX 9300, and CX 10000 series Ethernet switches that provide low latency and high bandwidth on a fault-tolerant platform designed to carry data center traffic in a 1U form-factor.

**Aruba Data Center Designs**

Spine-and-Leaf with VXLAN Fabric Overview

The most modern and flexible data center design is an EVPN-VXLAN overlay built on a spine-and-leaf routed underlay, which benefits enterprises with growing on-premises workloads and workloads spread across multiple data centers.

The spine-and-leaf underlay design ensures high reliability and horizontal scaling using redundant Layer 3 links between leaf and spine switches. This Clos-based topology provides equal-cost multipath (ECMP) routing to load balance traffic and support fast failover if a link or switch goes down. The fully meshed architecture enables capacity growth simply by adding another spine switch as needed.

An EVPN-VXLAN overlay allows ubiquitous Layer 2 adjacency across the entire data center using VXLAN tunneling. This enables customers to modernize their network while preserving legacy service requirements by connecting physically dispersed Layer 2 segments in the overlay. Physically distant data centers also can extend both Layer 2 and Layer 3 segments logically.

EVPN-VXLAN natively enables segmenting groups of resources within the data center to support multi-tenancy and separation of hosts by role such as production, development, tenant, and those requiring strict regulatory compliance.

Two-Tier Overview

Enterprises with significant, existing on-premises workloads in a single location can benefit from a two-tier data center design. The two-tier approach ensures sufficient capacity and reliability using standards-based protocols such as Link Aggregation Control Protocol (LACP), Spanning-Tree Protocol (STP), and Open Shortest Path First (OSPF). Hosts are dual-homed to two top-of-rack (ToR) switches using a Virtual Switching Extension (VSX) multi-chassis link aggregation group (MC-LAG). Each ToR switch is dual-homed to a data center core using Layer 2 VSX/MC-LAG links. Using Layer 2 between the core and server access layers supports VLAN ubiquity, and loops are prevented primarily using LACP-based MC-LAGs. The core provides Layer 3 services to data center hosts and routing to external networks.

The physical structure of a two-tier data center enables a migration path to an EVPN-VXLAN spine-and-leaf data center in the future.

Edge Data Center Overview

Enterprises that have migrated most of their workloads to the cloud and no longer require a large on-premises data center can use their existing campus network wiring closets or small server rooms to deploy small on-premises workloads.

The same AOS-CX switches that provide wired connectivity to users and Internet of Things (IoT) devices can be leveraged to provide server access.

An edge data center supports high-bandwidth and low-latency access to computing and storage resources for distributed workloads that may not be well suited to cloud deployments.

General Design Considerations

Out-of-Band Management

HPE Aruba Networking data center designs use a dedicated management LAN connecting to switch management ports and to host lights-out management (LOM) ports. Typically, a single management switch is deployed at every rack for OOB management. A dedicated management switch ensures reliable connectivity to data center infrastructure for automation, orchestration, and management without risk of disrupting management access when making changes to the data center’s data plane configuration.

Top-of-Rack Design

Deploying switches in the ToR position enables shorter cable runs between hosts and switches. The result is a more modular solution with host-to-switch cabling contained inside a rack enclosure and only switch uplinks exiting the enclosure. This approach helps reduce complexity when adding racks to the data center.

In typical data centers, each rack is serviced by a redundant pair of switches. This enables connection of dual-homed hosts to two physical switches using an MC-LAG bundle for fault tolerance and increased capacity. CX switches use two different strategies to support MC-LAGs: VSX switch pairing and Virtual Switching Framework (VSF) switch stacking.

VSX enables a distributed and redundant architecture that is highly available during upgrades. It virtualizes the control plane of two switches to function as one device at Layer 2 and as independent devices at Layer 3. From a data-path perspective, each device performs an independent forwarding lookup to decide how to handle traffic. Some of the forwarding databases, such as the MAC and ARP tables, are synchronized between the two devices via the VSX control plane over a dedicated inter-switch link (ISL) trunk. Each switch builds Layer 3 forwarding databases independently.

When deploying a pair of switches in VSX mode, at least two ports must be members of the LAG assigned as the ISL, which supports control plane functions and serves as a data path between the switch pair. The ISL ports should be the same speed as the uplink ports.

VSX requires a keepalive between members to detect a split brain condition, which occurs when communication over the ISL is no longer functional. Best practice is to configure the keepalive to use the OOBM port when using a dedicated management network. Alternatively, a loopback IP address or a dedicated low-speed physical port can be used for keepalive traffic. Loopback-based communication is supported over redundant routed paths for increased resiliency.
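The following AOS-CX style sketch illustrates the ISL and OOBM keepalive configuration described above. It is an illustrative example only, not a validated configuration: the LAG ID, member interfaces, and keepalive addresses are placeholder assumptions, and syntax should be verified against the AOS-CX documentation for the deployed platform and software release.

```
! Illustrative VSX sketch: LAG ID, interfaces, and addresses are placeholders
interface lag 256
    no shutdown
    no routing
    vlan trunk native 1
    vlan trunk allowed all
    lacp mode active
interface 1/1/31
    no shutdown
    lag 256
interface 1/1/32
    no shutdown
    lag 256
vsx
    inter-switch-link lag 256
    role primary
    ! Keepalive carried over the dedicated OOB management network
    keepalive peer 172.16.100.2 source 172.16.100.1 vrf mgmt
```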

VSF combines a set of two to ten CX 6300 switches into a high-availability switch stack using a ring topology. Data centers use VSF stacks to connect racks of 1 Gbps connected hosts to upstream leaf switches. A VSF stack operates with a single Layer 2 and Layer 3 control plane. One switch member of the stack operates in the Conductor role, which manages all other switch stack members. A second member of the stack operates in the Standby role. The Conductor synchronizes state and configuration information with the Standby switch, so it can assume the Conductor role in case of failure. Monitoring for a split-brain condition is achieved using the OOBM port of each stack member connected to a common management network.

For the most common connection speeds and backward compatibility, choose a ToR switch that supports access connectivity rates of 1, 10, and 25 Gbps on each port. These connection speeds can be increased simply by upgrading transceivers, DACs, or AOCs.

For high-throughput compute racks, CX 9300 and CX 9300S switches support 100 and 200 Gbps host connectivity with 400 Gbps switch uplinks. Breakout cabling and AOCs support connecting four QSFP56-based 100 Gbps host NICs, two QSFP56-based 200 Gbps host NICs, or two QSFP28-based 100 Gbps host NICs to one physical CX 9300-32D switch port. The 9300S can also be optimized to support 25 Gbps hosts.

Keep the following in mind when selecting ToR switches:

  • Number and type of server connections: Typical ToR switch configurations support 48 host-facing ports, but lower-density ToR options are available in the CX 8360 series. A rack of 1 Gbps hosts can be connected using the CX 6300 series.
  • Host connectivity speed: To simplify management, consolidate hosts connecting at the same speeds to the same racks and switches. Adapting the port speed settings of a particular interface may impact a group of adjacent interfaces. Consider interface group size when planning for a rack requiring multiple connection speeds. High speed storage and compute hosts connecting at 100 Gbps or 200 Gbps require the CX 9300-32D switch.
  • ToR-to-spine/core connectivity: ToR switch models support a range of uplink port densities. The number and port speed of the uplinks define the oversubscription rate from the hosts to the data center fabric or data center core. For example, in a four-spine fabric deployment at 100 Gbps, a non-oversubscribed fabric can be implemented for racks of 40 servers connected at 10 Gbps (see the worked calculation after this list).
  • VSX uplink consumption: When using VSX for redundancy, two uplink ports are consumed for ISLs providing data-path redundancy and cannot be used for spine or data center core connectivity.
  • DSS feature requirements: The CX 10000 is required in a data center design that implements inline stateful firewall inspection using the AMD Pensando programmable DPU.
  • Cooling design: Different ToR models are available for port-to-power and power-to-port cooling. In power-to-port configurations, an optional air duct kit can isolate hot air from servers inside the rack. Cabling can absorb heat and restrict airflow. Short cable routes and good cable management improve airflow efficiency.
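To illustrate the oversubscription example in the ToR-to-spine/core connectivity item above, the arithmetic under those assumptions (four 100 Gbps uplinks per ToR switch and 40 hosts connected at 10 Gbps) works out as follows:

$$4 \times 100\ \text{Gbps} = 400\ \text{Gbps uplink capacity} \qquad 40 \times 10\ \text{Gbps} = 400\ \text{Gbps host-facing capacity}$$

With uplink capacity equal to host-facing capacity, the oversubscription ratio is 400:400, or 1:1.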

Host Connectivity

A critical step in designing a data center is identifying the types of connectivity required by the computing hosts. Server hardware typically has an Ethernet RJ45 port for a lights-out management device such as HPE iLO. The lights-out port is typically connected using a Cat5e or Cat6 copper patch cable to a switch on the management LAN.

Host connections are usually 10 Gb or 25 Gb using SFP+/SFP28 fiber modules, copper direct-attach cables (DACs), or active optical cables (AOCs). DACs have limited distance support and can be harder to manage than optical cables due to their thicker wire gauge. AOCs support longer distances than DACs and are thinner and easier to manage. Both DACs and AOCs cost less than separate optical transceivers with fiber patch cables.

High-speed host connectivity is supported using QSFP-DD transceivers and AOCs with the CX 9300 switch. Both optics and AOCs can break out a single high-speed 400 Gbps switch port into multiple lower-speed 200 Gbps and 100 Gbps connections. The switch can support QSFP56- and QSFP28-based host NICs. The AOS-S and AOS-CX Transceiver Guide provides detailed information on using breakout cables and AOCs.

It is important to verify that both the host’s network interface controller (NIC) and the ToR switch are compatible with the same DAC or AOC. When separate transceivers and optical cables are used, verify transceiver compatibility with the host NIC, ToR switch, and optical cable type. The supported transceiver on the host is often different from the supported transceiver on the switch. Always consult with a structured cabling professional when planning a new or upgraded data center.

When deploying a converged network for IP storage traffic, look for NICs that support offload of storage protocols. This minimizes the latency of storage traffic by reducing the load on the host CPU.

Applications can be hosted directly on a server using a single operating system, commonly referred to as a “bare-metal” server. Multiple hosts can be virtualized on a single physical server using a hypervisor software layer. Examples include VMware ESXi or Microsoft Hyper-V.

Hypervisors contain a virtual switch that provides connectivity to each virtual machine (VM) using Layer 2 VLANs. A successful data center design should support Layer 2 and Layer 3 connectivity using untagged and VLAN-tagged ports to match the required connectivity to the server and/or virtual switch inside the server. HPE Aruba Networking Fabric Composer provides visibility and orchestration of the configuration between the server and ToR switches to ensure that connectivity is established properly.

Host mobility refers to the ability to move physical or virtual hosts in a data center network without changing the host network configuration. Especially powerful for virtualized hosts, this ensures optimized computing resources, high availability of applications, and efficient connectivity for distributed workloads. EVPN-VXLAN fabrics and two-tier data centers support flexible host mobility, allowing all data center VLANs to be present on all ToR switches. An EVPN-VXLAN design provides tunneled Layer 2 adjacency over a routed underlay, and it can logically extend Layer 2 adjacency between data center locations.

Spine-and-Leaf with EVPN-VXLAN Fabric

An EVPN-VXLAN fabric provides a virtual Layer 2 network overlay that is abstracted from the physical network underlay supporting it. This allows hosts to operate within the same VLAN network segment, even when the hosts are separated by a Layer 3 boundary, by encapsulating traffic within a tunnel. Symmetric Integrated Routing and Bridging (IRB) in EVPN-VXLAN enables Layer 3 routing between overlay network segments.

Physical Topology

The diagram below illustrates the physical connectivity for the complete set of roles in an EVPN-VXLAN solution.

**Physical Topology**

Underlay Network Design

The underlay of an EVPN-VXLAN data center network provides IP connectivity between switches. The network underlay ensures that VXLAN tunneled traffic (the overlay network) can be forwarded between tunnel endpoints on leaf switches.

The underlay network is implemented using a 3-stage Clos-based spine-and-leaf fabric topology. It is deployed as a Layer 3 routed network, with each leaf connected to each spine using a routed-only port. The spine-and-leaf underlay topology optimizes performance, provides high availability, and reduces latency because each leaf is never more than one hop across multiple load-balanced paths to all other leaf switches.

**Underlay Network**

The spine-and-leaf topology provides a flexible, scalable network design that can accommodate growth without disrupting the existing network. It is easy to begin with a small one- or two-rack fabric that can increase capacity without requiring replacement of existing hardware. Leaf switches are added to new racks to increase computing and network attached storage (NAS) capacity. Spine switches are added to increase east-west fabric capacity between leaf switches.

This topology is roughly analogous to the architecture of a chassis-based switch, where leaf switches are comparable to interface line cards and spines are comparable to the chassis fabric providing data capacity between line cards.

HPE Aruba Networking data centers typically use OSPF as the underlay routing protocol to distribute underlay IP reachability information between all fabric switches. OSPF is a widely used, well understood Interior Gateway Protocol (IGP) that provides straightforward configuration and fast convergence. When adding an EVPN-VXLAN overlay, the underlay route table is small, consisting primarily of loopback IP addresses to establish overlay routing protocol adjacencies and VXLAN tunnel endpoint reachability. OSPF routing in the underlay also enables the selection of appropriate overlay routing protocols to support a multifabric environment. A single OSPF area and point-to-point interfaces are recommended to minimize complexity.

Setting a Layer 2 and IP maximum transmission unit (MTU) of 9198 bytes on underlay interfaces connecting spine and leaf switches avoids fragmenting the jumbo sized frames created when adding VXLAN encapsulation.
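A minimal sketch of an underlay leaf configuration under these recommendations is shown below, assuming a single OSPF area, a /31 point-to-point link toward a spine, and a loopback interface later used for overlay peerings and VTEP termination. The interface numbers, OSPF process ID, and IP addresses are illustrative placeholders rather than values from this guide.

```
! Illustrative underlay sketch: interface IDs and addresses are placeholders
router ospf 1
    router-id 10.250.1.1
    area 0.0.0.0
interface loopback 0
    ip address 10.250.1.1/32
    ip ospf 1 area 0.0.0.0
interface 1/1/49
    no shutdown
    description to-spine-1
    ! Jumbo MTU avoids fragmenting VXLAN-encapsulated frames
    mtu 9198
    ip mtu 9198
    ip address 10.255.1.0/31
    ip ospf 1 area 0.0.0.0
    ip ospf network point-to-point
```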

Server access switches do not participate in the routed underlay. They are connected to upstream leaf switches using Layer 2 only links.

Spine Design

The spine layer provides high-speed, routed connectivity between leaf switches. In a spine-and-leaf architecture, each leaf switch is connected to each spine switch. Each leaf-to-spine connection should use the same link speed to ensure multiple equal-cost paths within the fabric. This enables routed ECMP-based load balancing and ensures connectivity if a link goes down.

All spine switches must be the same switch model. The port capacity of the spine switch model defines the maximum number of leaf switches in a single spine-and-leaf instance. For a redundant ToR design, the maximum number of leaf racks is half the port count on the spine switch model.

A typical spine-and-leaf network begins with two spines for high availability. Spine switches are added to increase fabric capacity and fault tolerance. The impact of a spine failure is reduced for each additional spine added to the underlay. The loss of a single spine reduces overall fabric capacity in the following manner:

  • 2 spines: 50% capacity reduction
  • 3 spines: 33% capacity reduction
  • 4 spines: 25% capacity reduction

The maximum number of spines is determined by the leaf switch model with the fewest number of uplinks. In a redundant ToR design, the maximum number of spines is the uplink port count of the leaf switch with the fewest uplinks minus two, as two of the uplinks are consumed for a VSX inter-switch link (ISL) between the ToRs. In a single ToR design, the maximum number of spines is equal to the number of uplink ports of the leaf switch with the fewest uplinks. Each ToR switch must connect to each spine for ECMP to work effectively.

A CX 9300-based spine offers high-density rack support when using breakout cabling. A CX 9300 can break out a single 400 Gbps spine port to four 100 Gbps connected leaf switch ports over single-mode fiber. It also can support two 100 Gbps connected leaf switches per spine port over multimode fiber and AOCs. This allows the CX 9300 to double or quadruple the number of racks supported over its physical port count.

Using the CX 9300 in a leaf role supports extreme horizontal CX 9300 spine scaling. When dedicating half of a 9300-32D leaf switch’s available ports to host connectivity, a 5.6 Tbps fabric comprising 14 spines can be implemented to support a redundant ToR design. A 6.4 Tbps fabric comprising 16 spines can be implemented in a single ToR design. A CX 9300 spine and leaf combination can support connecting multiple links to each spine, if the required number of racks permits. This deployment model supports very high-throughput racks containing 100 Gbps connected storage and compute hosts.

Consider the following when selecting switches:

  • Determine rack media and bandwidth requirements.
  • Determine if single or redundant ToR switches will be installed.
  • Determine how many racks are needed for current computing and storage requirements.
  • Determine the spine switches required to support the planned racks.
  • Design the data center network for no more than 50% capacity to leave room for growth.

When deciding where to physically place the spine switches, consider their distance from the leaf switches and the media type used to connect them. Spine-to-leaf connections are generally 40 Gb or 100 Gb fiber using quad SFP (QSFP) transceivers or AOCs, in which the cable and transceiver are integrated, similar to DACs. The CX 9300-32D can support up to 400 Gbps spine-to-leaf connections for higher speed data center applications.

Overlay Data Plane Network

An overlay network is implemented using VXLAN tunnels that provide both Layer 2 and Layer 3 virtualized network services to workloads directly attached to leaf switches. VXLAN Network Identifiers (VNIs) identify distributed Layer 2 and Layer 3 segments in a VXLAN overlay topology. Symmetric IRB enables overlay networks to support contiguous Layer 2 forwarding and Layer 3 routing across all leaf nodes.

A VXLAN Tunnel End Point (VTEP) is the function within leaf switches that handles the origination and termination of point-to-point tunnels forming an overlay network. A VTEP encapsulates and decapsulates Layer 2 Ethernet frames inside a VXLAN header in a UDP datagram. The source VTEP assigns a VNI to inform the destination VTEP of the VLAN or route table associated with the encapsulated frame. A single logical VTEP is implemented when VSX redundant leaf switches are deployed in a rack. VTEP IP addresses are distributed in the underlay network using OSPF. Spine switches provide IP transport for the overlay tunnels but do not participate in the encapsulation/decapsulation of VXLAN traffic.

**VXLAN Frame**

The diagram below illustrates the VXLAN data plane network, a full mesh of VXLAN tunnels between leaf switch VTEPs.

**Overlay Network**

Note: Server access switches do not contain VTEPs or participate in VXLAN forwarding. They support data center host attachment to the overlay by extending VLANs from fabric leaf switches.

A Layer 2 VNI represents a single broadcast domain, similar to a traditional VLAN ID. A Layer 2 VNI is associated with a VLAN on individual switches in the fabric and collectively stitches the set of distributed VLANs into a single Layer 2 broadcast domain. When a VXLAN encapsulated frame arrives at a VTEP termination with a Layer 2 VNI, the switch unencapsulates the frame and forwards it natively based on the MAC address table of the VLAN associated with the VNI.

**L2 VNI Broadcast Domain**

A switch supports multiple routing domains by implementing virtual routing and forwarding instances (VRFs). Each VRF consists of a unique route table, member interfaces that forward traffic based on the route table, and routing protocols that build the route table. Different VRFs may contain overlapping IP address ranges because the individual route tables are discrete. An EVPN-VXLAN overlay must consist of at least one non-default VRF as a container for overlay VLAN SVIs and routed interfaces. Multiple VRFs can be used to provide segmentation for multi-tenancy and policy enforcement.

A Layer 3 VNI associates VXLAN encapsulated traffic with a VRF. When a VXLAN encapsulated packet arrives at a VTEP with a Layer 3 VNI, the packet is unencapsulated and forwarded using the associated VRF’s route table.
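The hedged sketch below shows how a leaf VTEP might map VLAN and VRF segments to VNIs on AOS-CX. The VNI numbers, VLAN, VRF name, and loopback source address are placeholder assumptions, and the exact command for binding the Layer 3 VNI to a VRF varies by AOS-CX release, so it is noted in a comment rather than shown.

```
! Illustrative sketch: VNI values, VLAN, VRF name, and source IP are placeholders
vlan 20
interface vxlan 1
    source ip 10.250.1.1
    no shutdown
    ! Layer 2 VNI stitching VLAN 20 into a fabric-wide broadcast domain
    vni 10020
        vlan 20
    ! A Layer 3 VNI (for example, 100001) is also defined here and bound to the
    ! overlay VRF; confirm the exact binding syntax for the AOS-CX release in use
```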

Symmetric IRB uses a distributed IP gateway model. The gateways for fabric host VLANs are anycast IP addresses. The same virtual IP address is assigned to the same VLAN on each switch in the fabric using HPE Aruba Networking’s Active Gateway feature. A distributed virtual MAC address is also associated with the virtual IP. This strategy supports moving active VM guests between hypervisors, which are attached to switches in different racks.

Note: When using Active Gateway to create a distributed overlay IP gateway address across all leaf switches, the Active Gateway IP address also is typically assigned to the VLAN SVI on each switch to conserve IP addresses. The Active Gateway IP is not supported as a source address when using the ping command, and a unique VLAN SVI address is not available as a source IP. When testing host reachability, the ping command must specify a unique source IP address, such as a loopback IP assigned to the same VRF, to verify reachability.
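As an illustration of the distributed gateway and the IP-conservation pattern described in the note above, the sketch below assigns the same address as both the SVI IP and the Active Gateway IP. The VLAN, VRF name, subnet, and virtual MAC are placeholder assumptions.

```
! Illustrative sketch: VLAN, VRF, subnet, and virtual MAC are placeholders
interface vlan 20
    vrf attach prod
    ! Same address used for the SVI and the anycast Active Gateway to conserve IPs
    ip address 10.20.0.1/24
    active-gateway ip mac 12:00:00:00:01:00
    active-gateway ip 10.20.0.1
```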

Traffic from a host that requires routing is sent to the virtual gateway IP address on its directly attached leaf switch. Both the source and destination VTEPs perform routing functions, and the source VTEP assigns an L3 VNI to inform the destination VTEP of the appropriate VRF for forwarding.

**L3 VNI Routing Domain**

Border leaf switches provide connectivity between EVPN fabric hosts and external networks such as the campus, WAN, or DMZ. A firewall is typically placed between the border leaf and external networks to enforce north/south security policy.

Server Access Layer

Server access switches extend overlay VLANs from an EVPN-VXLAN leaf to a high density set of lower-speed connected hosts using an economical Layer 2 switch model. They do not participate directly in underlay routing or overlay virtualization.

Server access switches provide Layer 2 redundancy to attached hosts using VSX or Virtual Switching Framework (VSF). VSF enabled switches operate as a single logical stack across one or more racks adjacent to their uplink leaf. VSF supports redundant MC-LAGs to downstream hosts and protection against split-brain conditions using the out-of-band management port to monitor the status of stack members.

When making the transition to an EVPN-VXLAN overlay in an established data center, existing ToR switches can be Layer 2-connected as server access switches to an EVPN-VXLAN leaf as part of the transition strategy. This permits moving existing rack infrastructure into the new EVPN-VXLAN overlay on a flexible timeline without the requirement to replace existing ToR switches.

**Server Access VLAN Extension**

Overlay Control Plane

The VXLAN control plane distributes information for sharing host reachability and dynamically building VXLAN tunnels. Reachability between endpoints in a VXLAN network requires associating fabric connected endpoints with their respective VTEP and VNIs across all fabric switch members. This reachability information is used by a source VTEP to assign a VXLAN VNI in the VXLAN header and the destination VTEP IP in the IP header.

Attached hosts are learned at their uplink leaf switch using Ethernet link layer protocols. Overlay reachability information across the VXLAN fabric is distributed using Multiprotocol Border Gateway Protocol (MP-BGP) as the control plane protocol using the EVPN address family. BGP advertises both host IP and MAC prefixes. This approach minimizes flooding while enabling efficient, dynamic discovery of all hosts within the fabric.

Using a distributed control plane that dynamically populates endpoint information provides the following benefits:

  • It avoids flood-and-learn techniques that can consume large amounts of bandwidth due to the replication of traffic in a large spine-and-leaf environment.
  • Network configuration is simplified as fabric leaf VTEP switches automatically discover peer VTEP switches inside the fabric, building dynamic VXLAN tunnels.
  • A distributed control plane provides redundancy and a consistent topology state across the data center fabric switches.
  • A distributed control plane allows optimal forwarding using distributed gateways at the ToR switches. This enables default gateway addresses to remain the same across the fabric.

The use of MP-BGP with the EVPN address family provides a standards-based, highly scalable control plane for sharing endpoint reachability information with native support for multi-tenancy. For many years, service providers have used MP-BGP to offer secure Layer 2 and Layer 3 VPN services on a very large scale. Network operations are simplified by using an iBGP design with route reflectors so that peering is required only between leaf switches and two spines; a peering sketch follows the list of terms below. iBGP is required for an individual fabric's control plane when establishing a multifabric environment. Some of the more notable BGP control plane terms include:

  • Address Family (AF): MP-BGP supports exchanging network layer reachability information (NLRI) for multiple address types by categorizing them into address families (IPv4, IPv6, L3VPN, etc.). The Layer 2 VPN address family (AFI=25) and the EVPN subsequent address family (SAFI=70) are used to advertise IP and MAC address information between MP-BGP speakers. The EVPN address family contains reachability information for establishing VXLAN tunnels between VTEPs.
  • Route Distinguisher (RD): A route distinguisher enables MP-BGP to carry overlapping Layer 3 and Layer 2 addresses within the same address family by prepending a unique value to the original address. The RD is only a number with no inherently meaningful properties. It does not associate an address with a route or bridge table. The RD value allows support for multi-tenancy by ensuring that a route announced for the same address range in two different VRFs can be advertised in the same MP-BGP address family.
  • Route Target (RT): Route targets are MP-BGP extended communities used to associate an address with a route or bridge table. In an EVPN-VXLAN network, importing and exporting a common VRF route target into the MP-BGP EVPN address family establishes Layer 3 reachability for a set of VRFs defined across a number of VTEPs. Layer 2 reachability is shared across a distributed set of L2 VNIs by importing and exporting a common route target in the L2 VNI definition. Additionally, Layer 3 routes can be leaked between VRFs using the IPv4 address family by exporting route targets from one VRF that are then imported by other VRFs.
  • Route Reflector (RR): To optimize the process of sharing reachability information between VTEPs, use route reflectors on the spines to simplify iBGP peering. This design enables all VTEPs to have the same iBGP peering configuration and eliminates the need for a full mesh of iBGP neighbors.
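A minimal sketch of the spine-side route-reflector peering referenced above follows, assuming a single AS (65001) and loopback-based iBGP sessions; the AS number and addresses are placeholders. Each leaf would mirror the same configuration toward two spines, without the route-reflector-client statement.

```
! Illustrative spine (route reflector) sketch: AS number and IPs are placeholders
router bgp 65001
    bgp router-id 10.250.0.1
    neighbor 10.250.1.1 remote-as 65001
    neighbor 10.250.1.1 update-source loopback 0
    address-family l2vpn evpn
        neighbor 10.250.1.1 activate
        neighbor 10.250.1.1 send-community extended
        neighbor 10.250.1.1 route-reflector-client
```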

The MP-BGP EVPN address family consists of several route types.

  • Route type 2 shares MAC address and host IP reachability information.
  • Route type 5 shares IP prefixes that are reachable by a subset of fabric switches, which is most commonly used to share a default route and external prefixes from the border leaf to other leaf switches.
  • Route type 3 shares VTEP IP and VNI values to establish VXLAN tunnels dynamically within a fabric.

Route type 2 MAC advertisements are associated with a VLAN based on a route-target value. The same route-target value should be imported and exported for the same VLAN ID on all switches in the fabric. This ensures complete propagation of Layer 2 reachability across the fabric. VLAN route targets can be automatically derived when using an iBGP control plane to simplify configuration and ensure consistency throughout the fabric.
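As a hedged example of the per-VLAN route-target behavior, the sketch below defines an EVPN instance for VLAN 20 with automatically derived RD and route targets on an iBGP fabric; the VLAN ID is a placeholder.

```
! Illustrative sketch: VLAN ID is a placeholder; auto values assume an iBGP fabric
evpn
    vlan 20
        rd auto
        route-target export auto
        route-target import auto
```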

The diagram below illustrates an example of sharing EVPN route-type 2 MAC address reachability using the iBGP control plane.

**iBGP control plane route-type 2 advertisement**

The following screenshot shows an example of an EVPN learned MAC address installed in the MAC address table with its VTEP association.

**MAC address table with VXLAN target**

Route type 5 IP prefixes are associated with a VRF based on a route-target value. A consistent route-target value should be imported and exported for the same VRF on all switches in the fabric. This ensures complete propagation of Layer 3 reachability across the fabric.
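For the VRF-level (route type 5) case, a hedged sketch follows. The VRF name, RD, and route-target values are placeholder assumptions; the same route target would be imported and exported for this VRF on every fabric switch.

```
! Illustrative sketch: VRF name, RD, and route-target values are placeholders
vrf prod
    rd 10.250.1.1:1
    route-target export 65001:100001 evpn
    route-target import 65001:100001 evpn
```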

Multifabric Underlay

MP-BGP EVPN control plane peerings and VXLAN tunnel termination require establishing IP reachability to loopback interfaces between locations in a multifabric topology. External BGP (eBGP) typically shares loopback/VTEP reachability between sites.

MP-BGP uses an AS number to identify an administrative relationship between BGP speakers. BGP peers with the same AS number are members of the same administrative domain and are considered internal peers (iBGP). BGP peers with different AS numbers are considered external peers (eBGP). Internal and external BGP peers have different default behaviors and requirements. eBGP is often employed between different network segments within the same organization, because default eBGP peering behaviors are useful to the network design.

The diagram below illustrates a set of eBGP IPv4 address family peerings between border leaf switches in a two-fabric topology. Layer 2 connectivity between sites is provided by a metro Ethernet circuit. Routed interfaces on each border leaf switch establish a peering relationship with each border leaf switch in the remote fabric. Loopback IP addresses are shared to establish MP-BGP EVPN peerings and VTEP tunnel terminations.

**eBGP IPv4 simple underlay peering**
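A hedged sketch of one border leaf's side of such a peering appears below. The AS numbers, link address, and redistribution approach are illustrative assumptions; production designs typically use a route map to control exactly which loopbacks are advertised.

```
! Illustrative border leaf sketch: AS numbers and addresses are placeholders
router bgp 65001
    bgp router-id 10.250.1.10
    neighbor 198.51.100.2 remote-as 65002
    address-family ipv4 unicast
        neighbor 198.51.100.2 activate
        ! Advertise local loopback/VTEP addresses toward the remote fabric
        redistribute connected
```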

Underlay eBGP peerings typically follow the physical links available between network locations. These links may not align directly with the control plane EVPN peerings. Dark fiber and metro Ethernet circuits are common connectivity options between sites.

As the number of interconnected fabrics increases, the number of high-speed circuits required at the primary site may exceed the number of ports available on the border leaf switches. Available high-speed ports on a spine switch can be used as part of a multifabric underlay. The WAN paths and MP-BGP IPv4 peerings vary based on each environment’s variables and design preferences.

Multifabric Overlay Control Plane

MP-BGP EVPN is used as the control plane in a multifabric overlay, just as with a single fabric overlay.

iBGP is used internally within each fabric. Each leaf switch within a fabric establishes an MP-BGP EVPN address family peering with a pair of route reflectors located on two of the spine switches.

eBGP is used between fabrics to permit VXLAN traffic to be re-encapsulated and forwarded in a second tunnel and to take advantage of useful default behaviors that assist in a multifabric environment.

When more than one fabric is present in a single location, typically only one set of border leaf switches is used to establish an MP-BGP EVPN address family peering with external fabrics over the available WAN path. Any border leaf that peers between sites is called a border leader.

The diagram below illustrates MP-BGP EVPN peerings in a two fabric topology.

**Multifabric BGP control plane peerings**

Additional route-target values are defined to control the installation of reachability information between fabrics. Each VLAN and VRF is assigned an intra-fabric route-target during initial creation. An administrator configures an additional global route-target that is shared between fabrics for extended VLAN and VRF network segments. This strategy allows network segments that should not be extended across all fabrics to exist independently and remain a part of a local-only fabric overlay.

For example, if three fabrics have VLAN 20 in their respective overlays, EVPN route targets (RTs) can be assigned to share VLAN 20 host reachability between two of the fabrics but not the third. The following route-target assignments accomplish this goal, and a configuration sketch follows the list.

  • Fabric 1, VLAN 20 — Local RT: 65001:20, Global RT: 1:20
  • Fabric 2, VLAN 20 — Local RT: 65002:20, Global RT: 1:20
  • Fabric 3, VLAN 20 — Local RT: 65003:20
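A configuration sketch of the Fabric 1 assignment follows, applying both the local and global route targets to VLAN 20. The values mirror the example above, and the syntax is illustrative rather than taken from a validated configuration.

```
! Illustrative Fabric 1 sketch: route-target values follow the example above
evpn
    vlan 20
        rd auto
        route-target export 65001:20
        route-target import 65001:20
        route-target export 1:20
        route-target import 1:20
```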

Global route targets also are assigned to VRFs when extending routed IP prefixes between fabrics.

Multifabric Data Plane Network

VXLAN tunneling extends Layer 2 and Layer 3 domains across multiple EVPN-VXLAN fabrics. The fabrics can be in different pods of the same data center, in different data centers in the same campus, or in more physically distant data centers. The connection between data center fabrics is referred to as a data center interconnect (DCI).

Layer 2 and Layer 3 network segments are extended between fabrics using the same VNI values across all fabrics. For example, the Layer 2 VNI value for VLAN 20 and the Layer 3 VNI value for VRF 1 must be the same across all fabrics.

VXLAN tunnels between fabrics are set up only between border leaf switches to maximize both local and multifabric scalability. Establishing a full mesh of tunnels only between border leaf switches eliminates the need to establish VXLAN tunnels between all VTEPs in all fabrics.

The following diagram illustrates inter-fabric and internal fabric VXLAN tunnels in a two fabric topology.

**Inter-fabric VXLAN tunnels**

VXLAN tunneled traffic between hosts within a local fabric is encapsulated at a single source VTEP and unencapsulated at a single destination VTEP. There is one logical tunnel between any two hosts in a single fabric. A full mesh of VXLAN tunnels between all VTEPs enables this forwarding model.

In a multifabric topology, traffic between hosts in different fabrics can traverse up to three VXLAN tunnels. By default, CX switches do not permit traffic received in a VXLAN tunnel to be forwarded out another VXLAN tunnel, so this restriction must be relaxed to allow multifabric host reachability. To protect an individual fabric from forwarding loops, re-encapsulation is permitted only between iBGP and eBGP dynamically learned tunnels. Within a fabric, iBGP is used to discover VXLAN VTEPs and dynamically build VXLAN tunnels. eBGP is used to discover VTEPs and establish VXLAN tunnels between fabrics.

When overlay hosts communicate between fabrics, traffic is encapsulated at the source host’s directly connected leaf switch with the destination VTEP set as the same fabric’s border leaf. The border leaf in the source fabric re-encapsulates the traffic with a destination VTEP of the border leaf in the destination fabric. The border leaf in the destination fabric re-encapsulates the traffic with a VTEP destination of the leaf switch directly connected to the destination host.

**Inter-fabric VXLAN Host Communication**

A full mesh of VXLAN tunnels between border leaf switches is established between fabrics in a multifabric topology consisting of three or more fabrics.

Two-Tier Data Center

A Two-Tier design uses traditional protocols, making it simple to deploy, operate, and troubleshoot without the need for specialized knowledge in overlay protocols or design. This architecture is appropriate for medium and small data centers, but can be implemented on a per data center pod basis in a larger environment.

Note: The VSG uses Two-Tier to refer to a topology consisting of Layer 2 multi-chassis LAGs between a collapsed routed/Layer 2 core layer and a Layer 2-only set of access switches, in contrast to a spine-and-leaf network that uses routed links between the spine and leaf layers.

Host information in a two-tier data center is populated using traditional bridge learning and ARP methods.

Topology Overview

A Two-Tier data center network implements aggregation and Layer 3 services in a data center core layer, and endpoint connectivity in a Layer 2 access layer. All access switches are Layer 2 connected to both core switches using MC-LAG for load sharing and fault-tolerance.

The physical layout of an L2 two-tier design is consistent with a two-spine spine-and-leaf architecture, which provides a future migration path to an EVPN-VXLAN overlay using a Layer 3 spine-and-leaf underlay, while protecting the investment made in Two-Tier networking equipment.

**L2 Two-Tier Network Overview Diagram**

Core Design

The core layer is deployed as a VSX pair of switches with high-density, high-bandwidth ports. This requires that both core switches are the same switch model running the same firmware version.

The port capacity of the core switches defines the maximum number of racks supported in a Two-Tier architecture. For a redundant ToR design, the maximum number of racks is half the difference of the total port count of the core switch model minus VSX and campus links (ignoring any remainder). For example, a 32-port switch using two VSX links and two campus uplinks can support 14 redundant ToR racks: (32 - 4) / 2 = 14. In a non-redundant, single-switch ToR design, the number of racks supported is equal to the port count of the core switch model minus VSX and campus links.

The core switch model also defines the maximum capacity of the data center backbone. One advantage a routed Layer 3 spine-and-leaf architecture has over a Layer 2-based Two-Tier architecture is incremental expansion of east-west throughput capacity by adding spine switches. For example, adding a spine to a two-spine fabric increases its capacity by 50%, and adding two spines will double its capacity. In an L2 two-tier design, there is a single pair of VSX switches at the core to support rack-to-rack communication. Capacity planning for an L2 two-tier data center is critical, as large capacity upgrades generally require hardware replacement.

Access-to-core connections are generally 40 Gbps or 100 Gbps fiber using quad SFP (QSFP) transceivers or AOCs. When using the CX 9300 in both core and access roles, 400 Gbps access-to-core interconnects are supported for higher speed data center applications.

Increasing initial capacity between the core and access layers can be accomplished by upgrading to higher speed transceivers or by bundling additional links in the MC-LAGs between the core and access layers. However, increasing the links in each LAG significantly reduces the number of racks supported due to increased core port consumption.

Occasionally a subset of racks requires higher capacity links to the core in order to provide high-bandwidth, centralized services. Note that inconsistent uplink capacity between the core and access switches impacts host mobility, because VMs requiring increased bandwidth should be attached only to the subset of switches with higher capacity uplinks.

The core layer provides a Layer 2 aggregation point for access switches. Traffic between hosts on the same VLAN in different racks will traverse the core layer in VLANs configured on the MC-LAG trunks between the core and access switches. Ubiquitous Layer 2 host mobility within a Two-Tier instance can be achieved by assigning all data center VLANs to all MC-LAG links between the core and access switches using 802.1Q VLAN tagging.
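On the core VSX pair, each downstream access switch might be attached with a multi-chassis LAG trunk carrying all data center VLANs, as in the hedged sketch below; the LAG ID and member interface are placeholders.

```
! Illustrative core-side sketch: LAG ID and member interface are placeholders
interface lag 101 multi-chassis
    no shutdown
    no routing
    vlan trunk native 1
    vlan trunk allowed all
    lacp mode active
interface 1/1/1
    no shutdown
    lag 101
```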

Active Gateway over VSX supports using the same IP address on both core switches and eliminates the need for redundant gateway protocols such as VRRP.

The core layer provides all Layer 3 functions for data center hosts, and it provides the connection point to external networks and services.

Access Switch Design

In a Two-Tier architecture, each ToR access switch is connected to both core switches using MC-LAG to provide link load-balancing and fault tolerance.

Redundant top-of-rack pairs using VSX are recommended to add fault tolerance for downstream hosts using MC-LAG. Although the Layer 2 connectivity between the access and core switches is loop-free through the implementation of MC-LAG and LACP, spanning tree (STP) is configured as a backup loop-avoidance strategy to block accidental loops created within a rack by a data center administrator. The core VSX pair is configured with an STP priority that ensures its selection as the STP root.
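As a brief illustration of the backup STP behavior, the core VSX pair can be assigned the lowest bridge priority so that it is always elected root. The sketch below is an assumption-level example; on AOS-CX the configured priority value is a small multiplier (typically of 4096) rather than the raw bridge priority.

```
! Illustrative core sketch: lowest priority ensures the core VSX pair is STP root
spanning-tree
spanning-tree priority 0
```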

MC-LAG provides better link utilization between the core and access switches than Multiple Spanning Tree (MST) instances, because all redundant links remain active for sending traffic. Traffic is forwarded on an individual LAG member link selected by a hash-based algorithm applied on a granular per-flow basis. While MST instances can also allow using multiple links between the access and core layers to provide fault tolerance and balance traffic, the load-balancing strategy requires static configuration and limits active forwarding to a single redundant link on a per-VLAN basis.