Troubleshooting Slow Network Issues


Case 1: The switchports had an extremely high collision count

Looked at the Cisco switches connecting the guardhouse and the main building. The output of show interface showed that the switchports had an extremely high collision count. Looked at the configuration of the ports and found they were set to half duplex. Forced both ends to full duplex. After the change, traffic flowed without errors and the connection was faster.
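The check described above can be sketched in Python as a small parser over saved show interface output. This is only an illustration: the sample text, the interface name, the counter values and the threshold of 1000 collisions are all made up for the example, and the exact output layout varies by platform and IOS version.

```python
import re

# Illustrative excerpt in the style of Cisco IOS `show interfaces` output.
# The interface name and counter values are invented for this example.
SAMPLE_OUTPUT = """\
FastEthernet0/1 is up, line protocol is up (connected)
  Half-duplex, 100Mb/s, media type is 10/100BaseTX
     0 output errors, 48213 collisions, 2 interface resets
"""


def summarize_interface(text):
    """Return the duplex mode and collision counter found in the text."""
    duplex = re.search(r"\b(Half|Full|Auto)-duplex", text)
    collisions = re.search(r"(\d+) collisions", text)
    return {
        "duplex": duplex.group(1).lower() if duplex else None,
        "collisions": int(collisions.group(1)) if collisions else 0,
    }


def needs_attention(summary, threshold=1000):
    """Flag a half-duplex port whose collision count exceeds the threshold.

    The threshold is an arbitrary assumption; pick one that suits the link.
    """
    return summary["duplex"] == "half" and summary["collisions"] > threshold


summary = summarize_interface(SAMPLE_OUTPUT)
print(summary)
print("check duplex settings:", needs_attention(summary))
```

Running the same parser against the output from both ends of the link makes it easier to confirm that the duplex settings match after the change.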

Case 2: Cisco switch logs MAC flapping messages

Symptoms

1. Clients keep receiving timeout messages when they access the Microsoft SQL Server.

2. The TM Vista program freezes or gets the error: Error testing for active program connectivity failure.

3. Pings time out randomly.

Troubleshooting steps

1.    Telnet to the switch that the server is connected to.

2.    Use show logging to check for errors.

3.    We found numerous errors:

008276: *May 22 13:21:19: %SW_MATM-4-MACFLAP_NOTIF: Host 0008.02a2.414d in vlan 1 is flapping between port Po3 and port Gi0/9

008277: *May 22 13:22:12: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7dea.fe76 in vlan 1 is flapping between port Po4 and port Po3

008279: *May 22 13:22:43: %SW_MATM-4-MACFLAP_NOTIF: Host 000d.939d.d87c in vlan 1 is flapping between port Po3 and port Gi0/18
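When the log contains many of these messages, it can help to tally which ports appear most often. The following is a minimal Python sketch, assuming the log has been saved as text; it uses the three messages above as input, and the function name count_flapping_ports is just a name chosen for this example.

```python
import re
from collections import Counter

# The three log messages quoted above.
LOG_LINES = """\
008276: *May 22 13:21:19: %SW_MATM-4-MACFLAP_NOTIF: Host 0008.02a2.414d in vlan 1 is flapping between port Po3 and port Gi0/9
008277: *May 22 13:22:12: %SW_MATM-4-MACFLAP_NOTIF: Host 0023.7dea.fe76 in vlan 1 is flapping between port Po4 and port Po3
008279: *May 22 13:22:43: %SW_MATM-4-MACFLAP_NOTIF: Host 000d.939d.d87c in vlan 1 is flapping between port Po3 and port Gi0/18
""".splitlines()

MACFLAP = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<port_a>\S+) and port (?P<port_b>\S+)"
)


def count_flapping_ports(lines):
    """Count how often each port is named in MACFLAP_NOTIF messages."""
    ports = Counter()
    for line in lines:
        match = MACFLAP.search(line)
        if match:
            ports.update([match["port_a"], match["port_b"]])
    return ports


port_counts = count_flapping_ports(LOG_LINES)
for port, count in port_counts.most_common():
    print(port, count)
```

For these three messages, Po3 is the only port named in every line, which suggests starting the investigation with what is reachable through that port channel.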

Investigated and found that the servers had two NICs connected to two separate switches that were not joined via port aggregation. Removed the secondary teamed NIC and half the errors went away. The hosts in the remaining errors had only a single NIC in the network. Continued troubleshooting but was unable to find anything unusual.

Decided to get Cisco TAC involved and opened a case. Spoke with multiple techs about the situation. Took debugs of spanning tree and found that multiple convergences were occurring on the network at the same time. We were unable to see this through the CLI without the debug because it was happening so fast. Traced it down to the switches and found that switch 3560_7 had a device plugged into gi0/1 that was sending BPDUs and rebroadcasting MAC addresses on the network, causing the MAC address table to update with the wrong port information.

BPDU stands for bridge protocol data unit. BPDUs are data messages exchanged between the switches in an extended LAN that uses a spanning tree protocol topology. They contain information on ports, addresses, priorities and costs, and help ensure that data ends up where it was intended to go. Bridges exchange BPDUs to detect loops in the network topology. The loops are then removed by shutting down selected bridge interfaces and placing redundant switch ports in a backup, or blocked, state.

Traced down what was in port gi0/1 on switch 3560_7: a Crestron system was plugged into the port. As soon as the port was disabled, the network became stable again and the errors and timeouts went away. When the port was enabled, the issues started occurring again.

Tried to turn bpduguard on, but it kept disabling the port. Enabled bpdufilter, but the MAC address table was still learning addresses from the device's traffic outside of BPDUs. Finally, we moved the system to a VLAN of its own so it doesn't interfere with the rest of the network. Access to SQL and the Vista program now works as normal, and ping shows 0 packets lost.

Case 3: Pano server broadcasts a lot of ARP requests

The client complained that the network was too slow. We captured traffic with Wireshark and found that over 60% of the packets in both capture files were from the same source, a Pano server. It appears that the host is broadcasting ARP requests for everything in the local data subnet. This generally doesn't happen unless the machine is trying to communicate with everyone on the local subnet (not typical behavior). There are 4 Pano devices (VMware VMs) in the Library and they are turned off every night. When they are off, the server tries to wake them up by sending packets. Keeping those devices on reduces the broadcasts.
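The "share of packets per source" figure can also be computed outside Wireshark. Below is a rough Python sketch, assuming the capture was exported from Wireshark as CSV with the default columns (No., Time, Source, Destination, Protocol, Length, Info). The sample rows, including the source names and the 192.0.2.x addresses, are hypothetical and not taken from the real capture.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the shape of a Wireshark CSV export (default columns).
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000","Vmware_aa:bb:cc","Broadcast","ARP","60","Who has 192.0.2.11? Tell 192.0.2.5"
"2","0.001","Vmware_aa:bb:cc","Broadcast","ARP","60","Who has 192.0.2.12? Tell 192.0.2.5"
"3","0.002","192.0.2.20","192.0.2.30","TCP","66","Example TCP segment"
"4","0.003","Vmware_aa:bb:cc","Broadcast","ARP","60","Who has 192.0.2.13? Tell 192.0.2.5"
"5","0.004","192.0.2.30","192.0.2.20","TCP","66","Example TCP segment"
"6","0.005","Vmware_aa:bb:cc","Broadcast","ARP","60","Who has 192.0.2.14? Tell 192.0.2.5"
'''


def top_source(csv_text):
    """Return (source, share) for the source that sent the most packets."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return None, 0.0
    counts = Counter(row["Source"] for row in rows)
    source, packets = counts.most_common(1)[0]
    return source, packets / len(rows)


source, share = top_source(SAMPLE_CSV)
print(f"{source} sent {share:.0%} of the captured packets")
```

A single source that dominates the capture, especially with ARP broadcasts, is a good candidate for closer inspection.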

Note: Machines infected with viruses or worms often have a scanning mechanism that looks for other machines to infect, which causes a high volume of ARP requests.

Case 4: It could be a bad cable. Please check this page for more details: Server uses over 75% bandwidth of the switch