Troubleshooting

Overview

In data center networking, the last resort in troubleshooting often involves directly accessing the physical device closest to the issue at hand. This approach makes it possible to diagnose problems with the connected endpoints. Administrators can use management tools to avoid accessing the physical device, but in some scenarios the tools are not used: they may be absent, administrators may be unfamiliar with them, or some administrators simply prefer to interact with the device. Rather than treating device-based approaches and management tools as conflicting options, it is important to embrace these variations and aim for better integration between the two.

The dynamically changing set of endpoints in the data center multiplies the complexity of troubleshooting these devices. Before they begin, admins might need to research endpoint metadata extensively. Because customers use different methods to store inventory data about their endpoints, admins can end up searching for incorrect or outdated information, which causes confusion.

Additionally, admins may struggle to identify the correct network device to start with or make mistakes in the process.

Understanding why admins end up in the "troubleshooting of last resort" scenario is essential for developing more effective solutions. This scenario usually arises either from reported connectivity problems or when admins try to establish connectivity for a new device or service. Interestingly, the first step after resorting to this last option is to ask a series of questions about MAC addresses, IP addresses, switches, ports, and more. Users frequently do not have this information and must look it up in spreadsheets, paper records, or outdated inventory systems.

Troubleshooting Tools

Admins can use MAC tables, ARP tables, and ping and trace tools to troubleshoot these issues. For example, you can run the show mac-address-table command:

    cedar-sw-01# show mac-address-table
    MAC age-time            : 300 seconds
    Number of MAC addresses : 15

    MAC Address          VLAN   Type      Port
    f4:03:43:d3:be:b8    1      dynamic   1/1/4
    f4:03:43:c0:e1:38    1      dynamic   lag256
    f4:03:43:c0:e1:30    1      dynamic   lag2
    00:50:56:5e:7a:69    1      dynamic   lag1
    00:50:56:5b:a0:40    1      dynamic   lag1
    04:90:81:00:08:a2    10     dynamic   lag256

The MAC address table shows whether the switch has seen the endpoint's MAC address within the MAC age-time, which is 300 seconds (5 minutes) in this example. If the user reporting the problem does not provide the MAC address, you must first find out which MAC address the endpoint uses. However, if the endpoint has many virtual network components, such as virtual switches, it can be unclear which MAC address it is using.
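Checking a MAC address against a long table by eye is error-prone. The following minimal Python sketch parses show mac-address-table text like the output above into rows and looks up one MAC. It assumes the column order of the example (MAC Address, VLAN, Type, Port); `parse_mac_table` and `find_mac` are hypothetical helpers, not part of the switch software.

```python
import re
from typing import NamedTuple

# One MAC-table row: MAC address, VLAN ID, entry type, port.
# Assumes the column order of the example output (MAC Address, VLAN, Type, Port).
ROW_RE = re.compile(
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+"
    r"(?P<vlan>\d+)\s+(?P<entry_type>\w+)\s+(?P<port>\S+)",
    re.IGNORECASE,
)


class MacEntry(NamedTuple):
    mac: str
    vlan: int
    entry_type: str
    port: str  # physical interface (e.g. 1/1/4) or LAG (e.g. lag256)


def parse_mac_table(output: str) -> list[MacEntry]:
    """Extract every MAC-table row from raw CLI text, aligned or flattened."""
    return [
        MacEntry(m["mac"].lower(), int(m["vlan"]), m["entry_type"], m["port"])
        for m in ROW_RE.finditer(output)
    ]


def find_mac(entries: list[MacEntry], mac: str) -> list[MacEntry]:
    """Return all rows for `mac` (one per VLAN it was learned on).

    An empty list means this switch has not learned the MAC within the
    MAC age-time (300 seconds in the example output).
    """
    mac = mac.lower()
    return [e for e in entries if e.mac == mac]
```

Applied to the example output, `find_mac(parse_mac_table(output), "04:90:81:00:08:A2")` returns a single row on VLAN 10, port lag256. The lookup returns a list rather than one row because the same MAC can be learned on more than one VLAN.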

You can also run the show arp command to see which IP addresses map to which MAC addresses:

    cedar-sw-01# show arp all-vrfs
    IPv4 Address   MAC                 Port     Physical Port   State       VRF
    10.10.3.23     00:50:56:ac:3d:c6   vlan30   lag1            reachable   default
    1.1.1.12       04:90:81:00:08:a2   vlan99   lag256          reachable   default
    1.1.1.1        04:90:81:00:08:a2   1/1/48   1/1/48          reachable   default

For the MAC address 04:90:81:00:08:a2, the output shows two ARP entries, one for IPv4 address 1.1.1.12 and one for 1.1.1.1. The device with IPv4 address 1.1.1.1 (on physical port 1/1/48) is the VSX keepalive interface on the peer switch.
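Because one MAC address can answer for several IP addresses, as 04:90:81:00:08:a2 does here, it helps to group ARP entries by MAC. The following minimal Python sketch parses show arp all-vrfs text, assuming the column order of the example output; `parse_arp` and `ips_by_mac` are hypothetical helpers, not part of the switch software.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# One ARP row: IPv4 address, MAC, port, physical port, state, VRF.
# Assumes the column order of the example "show arp all-vrfs" output.
ARP_RE = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+"
    r"(?P<port>\S+)\s+(?P<physical_port>\S+)\s+(?P<state>\S+)\s+(?P<vrf>\S+)",
    re.IGNORECASE,
)


class ArpEntry(NamedTuple):
    ip: str
    mac: str
    port: str           # L3 interface, e.g. vlan30 or 1/1/48
    physical_port: str  # where the MAC was learned, e.g. lag1
    state: str
    vrf: str


def parse_arp(output: str) -> list[ArpEntry]:
    """Extract every ARP row from raw CLI text, aligned or flattened."""
    return [
        ArpEntry(m["ip"], m["mac"].lower(), m["port"], m["physical_port"],
                 m["state"], m["vrf"])
        for m in ARP_RE.finditer(output)
    ]


def ips_by_mac(entries: list[ArpEntry]) -> dict[str, list[str]]:
    """Group IPv4 addresses by MAC to spot one MAC answering for several IPs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for e in entries:
        groups[e.mac].append(e.ip)
    return dict(groups)
```

Applied to the example output, `ips_by_mac` maps 04:90:81:00:08:a2 to `["1.1.1.12", "1.1.1.1"]`, which is the pattern described above.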

 

Endpoint metadata

Solving this problem involves bringing endpoint metadata all the way down to the switch. When users run specific commands, the switch can use this metadata to add extra details to the responses. For instance, if we know that the IP address 1.1.1.1 corresponds to Simon's VM, named "simons-vm", we should include that name in the output:

    cedar-sw-01# show arp all-vrfs
    IPv4 Address          MAC                 Port     Physical Port   State       VRF
    10.10.3.23            00:50:56:ac:3d:c6   vlan30   lag1            reachable   default
    1.1.1.12              04:90:81:00:08:a2   vlan99   lag256          reachable   default
    1.1.1.1 (simons-vm)   04:90:81:00:08:a2   1/1/48   1/1/48          reachable   default
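The decoration step itself can be sketched in a few lines. This minimal Python example assumes the metadata is available as a plain mapping from IP address to endpoint name (for example, exported from an inventory system); `decorate_arp_line`, `decorate_output`, and the mapping format are hypothetical and only illustrate appending a label after a known IP address.

```python
import re

# Leading IPv4 address of an ARP row, which must be followed by whitespace.
IP_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})(?=\s)")


def decorate_arp_line(line: str, names: dict[str, str]) -> str:
    """Append '(name)' after the row's IPv4 address when metadata exists for it."""
    m = IP_RE.match(line)
    if not m or m.group(1) not in names:
        return line  # header line or no metadata for this IP: leave unchanged
    ip = m.group(1)
    return f"{ip} ({names[ip]}){line[m.end():]}"


def decorate_output(output: str, names: dict[str, str]) -> str:
    """Decorate every row of a multi-line show arp output."""
    return "\n".join(decorate_arp_line(row, names) for row in output.splitlines())
```

With `names = {"1.1.1.1": "simons-vm"}`, only the 1.1.1.1 row changes; the 1.1.1.12 row is untouched because the match must end at whitespace. The sketch does not realign the columns, which a real implementation would also need to do.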

The previous example is generic; it shows the value of adding metadata to standard switch output. In the vSphere context, we have to research further to get more useful results.

 

VMware Use Cases

If you focus only on physical devices, you must figure out the specific switch to which your endpoint is connected. Other switches in the network might not have complete data about the endpoint, so they might not provide any information. In this scenario, richer data and a broader perspective become valuable for administrators. By using VMware vSphere data as the foundation for your metadata, you gain information that goes beyond individual switches: a comprehensive set of data about the topology of the fabric and its endpoints. Operators can use this information without needing to know the precise location of the endpoints.

The following are queries that users might perform to gather more information about an endpoint:

Where can I find this endpoint (whether it is a virtual machine, VMkernel, or host)?

Depending on their situation, users might have different sets of data about the endpoint, and they want to know where it resides based on the limited information they have.

The query should accept a flexible set of data:

  • VM: VM name, DNS name (can differ from the VM name), VM IP addresses (a VM can have more than one), VM NIC name (the VM's own NIC name, not the ESXi name), VM MAC address
  • VMkernel: looks like a VM in some ways; it is a virtual interface used for important VMware control functions such as vMotion
  • ESXi host: ESXi name, DNS name (often the same as the ESXi name), IP address, vmnic, vmnic MAC address
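The flexible query described above can be sketched as a match over whichever identifier the user happens to have. This Python sketch assumes a hypothetical in-memory inventory built from vSphere data; the `Endpoint` fields and `find_endpoints` are illustrative, not a VMware or switch API. Names are compared case-insensitively, and MAC addresses are normalized so that 04-90-81-00-08-A2 and 04:90:81:00:08:a2 match.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    kind: str      # "vm", "vmkernel", or "esxi" (labels chosen for this sketch)
    name: str      # VM name, VMkernel name, or ESXi host name
    dns_name: str = ""
    ips: list[str] = field(default_factory=list)
    nic_names: list[str] = field(default_factory=list)  # VM NIC names or host vmnics
    macs: list[str] = field(default_factory=list)


def normalize_mac(text: str) -> str:
    """Keep only hex digits and regroup them as colon-separated pairs."""
    digits = "".join(c for c in text.lower() if c in "0123456789abcdef")
    return ":".join(digits[i:i + 2] for i in range(0, len(digits), 2))


def find_endpoints(inventory: list[Endpoint], term: str) -> list[Endpoint]:
    """Return endpoints whose name, DNS name, IP, NIC name, or MAC matches `term`."""
    t = term.strip().lower()
    t_mac = normalize_mac(t)
    is_mac = len(t_mac) == 17  # exactly 12 hex digits, e.g. 00:50:56:ac:3d:c6
    hits = []
    for ep in inventory:
        # Skip empty fields so that an empty search term matches nothing.
        keys = {k.lower() for k in (ep.name, ep.dns_name, *ep.ips, *ep.nic_names) if k}
        macs = {normalize_mac(m) for m in ep.macs}
        if t in keys or (is_mac and t_mac in macs):
            hits.append(ep)
    return hits
```

Returning a list rather than a single endpoint reflects the ambiguity the text describes: a short or partial identifier can legitimately match several VMs, VMkernel interfaces, or hosts.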

What does the user want in response to their query?

  • The switch, LAG, or interface where the endpoint was last seen
  • When the endpoint was last seen
  • The previous location (useful for detecting loops); the location changes after a vMotion or depending on VMware load balancing
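The response fields above (last-seen location, last-seen time, and previous location) suggest a small per-endpoint history record. The following Python sketch is a hypothetical data structure, not an existing switch feature: each observation of an endpoint updates its last-seen location and time, keeps the prior location when the endpoint moves (for example, after a vMotion), and records move times so that an endpoint bouncing between ports, which can indicate a loop, stands out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Sighting:
    switch: str
    port: str          # interface or LAG, e.g. "1/1/4" or "lag256"
    seen_at: datetime


@dataclass
class LocationHistory:
    last: Optional[Sighting] = None
    previous: Optional[Sighting] = None
    move_times: list[datetime] = field(default_factory=list)

    def observe(self, switch: str, port: str, when: datetime) -> None:
        """Record a sighting; the old location becomes `previous` only on a move."""
        if self.last is not None and (self.last.switch, self.last.port) != (switch, port):
            self.previous = self.last
            self.move_times.append(when)
        self.last = Sighting(switch, port, when)

    def moves_within(self, window: timedelta, now: datetime) -> int:
        """Count moves in the trailing window; frequent moves may hint at a loop."""
        return sum(1 for t in self.move_times if now - t <= window)
```

Seeing the endpoint again on the same port only refreshes the last-seen time, so `previous` always holds a genuinely different location.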

What additional data is available for this endpoint?

vSphere holds additional metadata about the endpoint that can be useful to the customer. For complete integration, full visualization, and automation, this data is used and applied to the switch CLI.

  • For VMs
    • Basic inventory data on the VM
      • VM name, DNS name (often different from the VM name), VM IP addresses (a VM can have more than one), VM NIC name (the VM's own NIC name, not the ESXi name), VM MAC address
    • The VM's association with the virtual network in ESXi or vSphere
      • Portgroup (VLAN, VLAN type), vSwitch, ESXi host, ESXi host NIC (name, IP, and MAC)
  • For VMkernel interfaces
    • Basic inventory data on the VMkernel interface
      • Name, IP, MAC
    • The VMkernel interface's association with the virtual network in ESXi or vSphere
      • Portgroup (VLAN, VLAN type), vSwitch, ESXi host, ESXi host NIC (name, IP, and MAC)
  • For ESXi hosts
    • Basic inventory data on the host
      • ESXi name, DNS name, ESXi IP, ESXi NICs, and their MACs
    • For each ESXi NIC, its association with a vSwitch
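The associations listed above form a chain from a VM or VMkernel NIC through its portgroup and vSwitch to the ESXi host and its uplinks (vmnics). The following minimal Python sketch models that chain with hypothetical types, not the vSphere API object model, to show how it answers "which VLAN, host, and physical uplinks can this virtual NIC use?".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostNic:
    name: str   # e.g. "vmnic0"
    mac: str


@dataclass(frozen=True)
class VSwitch:
    name: str
    host: str                      # ESXi host name
    uplinks: tuple[HostNic, ...]   # host NICs attached to this vSwitch


@dataclass(frozen=True)
class Portgroup:
    name: str
    vlan: int
    vlan_type: str   # VLAN type as reported by vSphere, e.g. "VLAN trunking"
    vswitch: VSwitch


@dataclass(frozen=True)
class VirtualNic:
    owner: str       # VM name or VMkernel interface name
    nic_name: str
    mac: str
    ips: tuple[str, ...]
    portgroup: Portgroup


def describe(vnic: VirtualNic) -> dict:
    """Resolve the chain vNIC -> portgroup -> vSwitch -> host -> uplinks."""
    pg = vnic.portgroup
    return {
        "owner": vnic.owner,
        "mac": vnic.mac,
        "portgroup": pg.name,
        "vlan": pg.vlan,
        "vswitch": pg.vswitch.name,
        "host": pg.vswitch.host,
        "uplinks": [u.name for u in pg.vswitch.uplinks],
    }
```

Because VMs and VMkernel interfaces attach to the network the same way, one `VirtualNic` type covers both in this sketch; ESXi hosts appear only as the owners of vSwitches and uplinks.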

What devices are connected to this port?

You can run the show mac-address-table interface x/t/z command to display the active MACs on a port.

The advantage of using VM metadata is that you can find out the location of a VM even when it has no active MAC on the port. LLDP information from the host shows which switch ports the host NICs behind a portgroup or vSwitch connect to, and the VM's association with that portgroup or vSwitch then identifies the switch port the VM can use.

For this purpose, a show VMs interface x/t/z command could respond with a mix of active VMs (whose MACs appear in the MAC table) and inactive VMs (known only from vSphere metadata).
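The join between LLDP data and VM associations can be sketched as follows. The inputs are assumptions for this sketch: LLDP data from each host mapping (host, vmnic) to a switch port, vSphere data mapping each VM to the (host, vmnic) uplinks of its vSwitch, each VM's MAC, and the set of MACs currently in the switch MAC table (lower-case). `vms_on_port` is a hypothetical helper, not an existing CLI command; it lists every VM that can reach the port and flags it active if its MAC is currently learned.

```python
def vms_on_port(
    port: str,
    lldp: dict[tuple[str, str], str],              # (host, vmnic) -> switch port, from LLDP
    vm_uplinks: dict[str, list[tuple[str, str]]],  # VM -> (host, vmnic) uplinks of its vSwitch
    vm_macs: dict[str, str],                       # VM -> VM NIC MAC
    active_macs: set[str],                         # lower-case MACs in the switch MAC table
) -> list[tuple[str, bool]]:
    """List (VM name, active?) for every VM whose vSwitch uplinks land on `port`."""
    result = []
    for vm, uplinks in sorted(vm_uplinks.items()):
        if any(lldp.get(uplink) == port for uplink in uplinks):
            # Active: the MAC is currently learned; inactive: known only from metadata.
            result.append((vm, vm_macs.get(vm, "").lower() in active_macs))
    return result
```

A VM whose MAC has aged out of the table still appears in the result, flagged inactive, which is exactly the case that the MAC table alone cannot answer.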