
Wireless 9800 WLC KPI Blog – Part 3

Part 3 of the 3-part Wireless Catalyst 9800 WLC KPIs series

In the previous blogs, Wireless Catalyst 9800 WLC KPIs, Part 1 and Wireless Catalyst 9800 WLC KPIs, Part 2, we shared how to check the WLC and its connections to other devices, as well as how to check AP and RF health status.

In this blog, we will focus on Key Performance Indicators for client analysis, WLC packet drops, and packets punted to the WLC CPU. I will share methodical steps and the outputs we can collect from the WLC to measure the health of client connectivity and WLC forwarding performance.

The KPIs are grouped into different buckets or areas:

  • WLC checks
  • Connection to other devices
  • AP checks
  • RF checks
  • Client checks
  • Packet drops

Client Checks

After we have verified AP and RF health, we can focus on client connectivity. Using “show wireless summary”, we can see the total number of connected clients. In addition, we can find out whether there are any excluded, disabled, and foreign/anchored clients. We can keep monitoring this command periodically to check that the number of clients is within the expected values for our deployment and to identify any drastic changes in any of the values. The command also shows the number of APs, their roles and radios, and their status.

Gladius1#sh wireless summary
Max APs supported          : 2000
Max clients supported      : 32000
Access Point Summary
                      Total    Up    Down
802.11 2.4GHz             1     1       0
802.11 5GHz               4     3       1
802.11 dual-band          2     2       0
802.11 rx-dual-band       0     0       0

Client Serving(2.4GHz)    3     3       0
Client Serving(5GHz)      4     3       1
Monitor                   0     0       0
Sensor                    0     0       0

Client Summary
Total Clients : 6
Excluded      : 0
Disabled      : 0
Foreign       : 0
Anchor        : 0
Local         : 6

Check the total number of clients, excluded clients, and APs with radios down.
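If you collect this output periodically, a small parser makes it easy to compare the values with what you expect for your deployment. The sketch below is a minimal illustration, assuming the output text has already been collected from the controller; the function names and the expected client range are hypothetical, and the regexes only target the line shapes of the real output (for example `Total Clients : 6` and `802.11 5GHz   4   3   1`).

```python
import re

# "Label : number" lines, e.g. "Total Clients : 6" or "Max APs supported : 2000".
KV_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(\d+)\s*$")
# Radio rows, e.g. "802.11 5GHz   4   3   1" (Total, Up, Down).
RADIO_ROW = re.compile(r"^\s*(802\.11 \S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_client_summary(text: str) -> dict[str, int]:
    """Return {label: value} for every 'Label : number' line."""
    counts = {}
    for line in text.splitlines():
        match = KV_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def radios_down(text: str) -> dict[str, int]:
    """Return {radio type: Down count} for radio rows with Down > 0."""
    down = {}
    for line in text.splitlines():
        match = RADIO_ROW.match(line)
        if match and int(match.group(4)) > 0:
            down[match.group(1)] = int(match.group(4))
    return down


def check_client_summary(counts: dict[str, int], min_clients: int, max_clients: int) -> list[str]:
    """Flag a client total outside the expected range and any excluded/disabled clients."""
    alerts = []
    total = counts.get("Total Clients", 0)
    if not min_clients <= total <= max_clients:
        alerts.append(f"Total Clients {total} outside {min_clients}-{max_clients}")
    for label in ("Excluded", "Disabled"):
        if counts.get(label, 0) > 0:
            alerts.append(f"{label} clients: {counts[label]}")
    return alerts
```

An empty alert list means the snapshot looks normal against your (hypothetical) range; any entry is a prompt to dig further, not a diagnosis.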

If we see excluded clients, we need to dig further to identify the reason. Determine whether the excluded clients have any misconfiguration or whether the exclusion could be due to some other reason. Clients can be excluded because of an incorrect password, an IP address that matches another client’s IP address, multiple association failures, etc. We can see the list of client exclusion policies and their status using the command “sh wireless wps summary”.

We can break down the number of connected clients across the different states of the client state machine. This helps us narrow down whether too many clients are stuck in transient states such as Authenticating, IP Learn, Mobility, or Webauth Pending. Use the command: “show wireless stats client detail | i Authenticating         :|Mobility               :|IP Learn               :|Webauth Pending        :|Run                    :|Delete-in-Progress     :”

Gladius1#show wireless stats client detail | i Authenticating         :|Mobility               :|IP Learn               :|Webauth Pending        :|Run                    :|Delete-in-Progress     :
Authenticating         : 0
Mobility               : 0
IP Learn               : 1
Webauth Pending        : 0
Run                    : 5
Delete-in-Progress     : 0

Check for clients in transient states. In this case, we see one client in the IP Learn state.

We will need to investigate further if the number of clients in transient states is not decreasing. The same applies if a large number of clients remain in the same transient state for a long period of time.
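One way to apply the "not decreasing" rule is to keep a few consecutive snapshots of the state counters and flag any transient state whose count is non-zero and never went down. This is a minimal sketch under that assumption; the helper names are hypothetical and the list of transient states is taken from the filter used above.

```python
TRANSIENT_STATES = ("Authenticating", "Mobility", "IP Learn", "Webauth Pending")


def parse_states(text: str) -> dict[str, int]:
    """Parse 'State : count' lines from the filtered client detail output."""
    states = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            states[name.strip()] = int(value)
    return states


def stuck_states(snapshots: list[dict[str, int]]) -> list[str]:
    """Return transient states that are non-zero in the latest snapshot and
    never decreased between consecutive snapshots (oldest first)."""
    if len(snapshots) < 2:
        return []  # a trend needs at least two samples
    stuck = []
    for state in TRANSIENT_STATES:
        counts = [snap.get(state, 0) for snap in snapshots]
        never_dropped = all(later >= earlier for earlier, later in zip(counts, counts[1:]))
        if counts[-1] > 0 and never_dropped:
            stuck.append(state)
    return stuck
```

How many snapshots and how far apart they should be depends on your deployment; a state that is only briefly non-zero between two samples is normal.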

One example would be a high number of clients stuck in “IP Learn”. In that case, we should review the DHCP server status and the connectivity between the WLC and the DHCP server. For scenarios where static IP addresses are allowed, we can review ARP forwarding.

Another example would be a high number of clients stuck in “Webauth”. Several causes can lead to this. One reason could be that web page redirects are not received by, or not reachable for, the clients. Another possibility is authentication failures during web login on guest SSIDs.

The last example would be a large number of clients stuck in “Authenticating”. If clients connected to dot1x SSIDs have authentication issues, we should review the Radius server. We need to determine whether the issue occurs with one specific Radius server or with several servers at the same time. In the sections below, I will describe how to verify Radius server status.

We can also review client delete reasons and identify any unexpected reason whose counter is increasing. “Idle timeout” or “Session timeout” would be expected reasons for clients to disconnect. However, “DOT11 denied data rates” or “MIC validation failed” would be unexpected and may require further analysis. Use the command: “show wireless stats client delete reasons | e :_0”

Gladius1#show wireless stats client delete reasons | e :_0
Total client delete reasons
Controller deletes
Due to mobility failure                                         : 1
DOT11 denied data rates                                         : 5781192
L2-AUTH connection timeout                                      : 2
IP-LEARN connection timeout                                     : 968
Mobility peer delete                                            : 134
Informational Delete Reason
AP down/disjoin                                                 : 690
Session timeout                                                 : 661
Client initiate delete
AP Deletes
AP initiated delete for DHCP timeout                            : 1
AP initiated delete for reassociation timeout                   : 266

Check for unexpected delete reasons with a high count that keeps increasing. In this case, DOT11 denied data rates.
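Since what matters is which unexpected reasons are still growing, comparing two collections of this output is often enough. The sketch below assumes two text snapshots of the command above; the allow-list of expected reasons is a hypothetical starting point built from the two reasons named in the text, and you would adapt it to your own deployment.

```python
# Hypothetical allow-list: reasons considered normal for client disconnects.
EXPECTED_REASONS = {"Idle timeout", "Session timeout"}


def parse_counters(text: str) -> dict[str, int]:
    """Parse 'Reason : count' lines; section headers without a count are skipped."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def growing_unexpected(before: dict[str, int], after: dict[str, int], min_delta: int = 1) -> dict[str, int]:
    """Return {reason: increase} for unexpected reasons that grew by at least
    min_delta between the two snapshots, largest increase first."""
    growth = {}
    for reason, count in after.items():
        delta = count - before.get(reason, 0)
        if reason not in EXPECTED_REASONS and delta >= min_delta:
            growth[reason] = delta
    return dict(sorted(growth.items(), key=lambda item: item[1], reverse=True))
```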

At one of the largest international wireless events, we monitored delete reasons, excluding those showing zero hits. We spotted one delete reason that was consistently increasing over time. Using always-on tracing, we found that the clients deleted for that reason were all connecting to one specific SSID. When reviewing the SSID configuration, we isolated a configuration mistake that was causing the disconnections. After the configuration was corrected, no further client deletes for that unexpected reason were seen. We were able to proactively spot an issue, find the root cause, and fix it, above all without having to wait for end users to complain before starting the troubleshooting process.

The WLC also has a list of predefined possible failures with counters. We can check these counters to identify potential issues and be proactive in issue detection, using the command: “show wireless stats trace-on-failure | ex :_0”

Gladius1#show wireless stats trace-on-failure | ex :_0
Wireless Trace On Failure Statistics
006. Export client MM....................................: 1
018. Capwap configuration status failure.................: 46136
020. Client association failure..........................: 5
021. Client MAB authentication failure...................: 5781677
023. Client stage timeout................................: 1642
025. Client mobility clean up............................: 1
027. DTLS handshake failure..............................: 2
030. DTLS no configuration packet drop...................: 5
032. DTLS invalid hello packet drop......................: 168
034. SANET AUTHC failure.................................: 6

Check for failures with a high count that keeps increasing. In this case, MAB authentication failures.
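Because these counters are cumulative, a rate is more telling than a raw count. Here is a minimal sketch, assuming two snapshots of the output above and the number of seconds between them (for example taken from “show clock”); the function names are hypothetical and the regex only targets the `NNN. Name.....: count` line shape.

```python
import re

# e.g. "021. Client MAB authentication failure...................: 5781677"
FAILURE_LINE = re.compile(r"^\s*(\d{3})\.\s+(.+?)\.*:\s*(\d+)\s*$")


def parse_trace_on_failure(text: str) -> dict[str, int]:
    """Return {failure name: count}, with the dotted leader removed."""
    result = {}
    for line in text.splitlines():
        match = FAILURE_LINE.match(line)
        if match:
            result[match.group(2)] = int(match.group(3))
    return result


def failure_rates_per_hour(before: dict[str, int], after: dict[str, int], seconds_between: float) -> dict[str, float]:
    """Failures per hour for counters that increased, highest rate first."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    hours = seconds_between / 3600
    rates = {}
    for name, count in after.items():
        delta = count - before.get(name, 0)
        if delta > 0:
            rates[name] = delta / hours
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))
```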

If we are using dot1x and Radius servers, we will need to monitor the status of the Radius servers. IOS-XE uses dead-time and dead-criteria to determine the status of a Radius server. These parameters allow the device to identify a Radius server that is not responding to requests and to switch over to a secondary Radius server. The server is declared dead once the dead-criteria are met. The dead-criteria specify the number of tries that must fail and the time without a response from the server; both criteria must be met for the server to be declared dead. The server remains in dead status until the dead-time expires.
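To make the interaction between the two dead-criteria conditions and the dead-time concrete, here is a simplified model. It is a sketch under our own assumptions, not the IOS-XE implementation: we assume the time criterion is measured from the last response received, and the field names (`tries`, `time_s`, `deadtime_s`) are hypothetical, not configuration keywords.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RadiusDeadTracker:
    """Simplified dead-criteria / dead-time model (hypothetical field names)."""
    tries: int                 # consecutive failed tries required
    time_s: float              # seconds without any response required
    deadtime_s: float          # seconds the server stays dead
    failed_tries: int = 0
    last_response_s: float = 0.0
    dead_until_s: Optional[float] = None

    def is_dead(self, now_s: float) -> bool:
        """True while dead-time has not expired; clears the state once it has."""
        if self.dead_until_s is not None and now_s >= self.dead_until_s:
            self.dead_until_s = None
            self.failed_tries = 0
        return self.dead_until_s is not None

    def on_response(self, now_s: float) -> None:
        """Any response resets the failure count and the silence timer."""
        self.failed_tries = 0
        self.last_response_s = now_s

    def on_timeout(self, now_s: float) -> None:
        """Count a failed try; declare dead only when BOTH criteria are met."""
        if self.is_dead(now_s):
            return  # in this model, no requests are sent to a dead server
        self.failed_tries += 1
        silent_for = now_s - self.last_response_s
        if self.failed_tries >= self.tries and silent_for >= self.time_s:
            self.dead_until_s = now_s + self.deadtime_s
```

With `tries=3` and `time_s=10`, three quick timeouts within 7 seconds of the last response are not enough on their own; the server is only marked dead once the silence also reaches 10 seconds.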

We can check whether any server is dead at this moment and how many times each server has been declared dead. This helps us diagnose issues with a specific Radius server caused by lack of connectivity or by misbehavior of the Radius server or the WLC. Use the command: “show aaa servers | i Platform Dead: total|RADIUS: id”

Gladius1#show aaa servers | i Platform Dead: total|RADIUS: id
RADIUS: id 1, priority 1, host, auth-port 1645, acct-port 1646, hostname ISE
SMD Platform Dead: total time 301s, count 2
Platform Dead: total time 179s, count 10UP
RADIUS: id 2, priority 2, host, auth-port 1812, acct-port 1813, hostname ISE3
SMD Platform Dead: total time 0s, count 0
Platform Dead: total time 0s, count 0

Check the platform dead time and count to identify Radius servers that have had issues.
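When several servers are configured, it helps to turn this filtered output into a per-server summary. The sketch below assumes the text of the filtered command above; the function names are hypothetical, and the labels `SMD` and `Platform` simply mirror the two kinds of “Platform Dead” lines in the output. Trailing text after the count (such as the `UP` in `count 10UP`) is ignored.

```python
import re

SERVER_LINE = re.compile(r"RADIUS: id (\d+),.*hostname (\S+)")
DEAD_LINE = re.compile(r"^\s*(SMD )?Platform Dead: total time (\d+)s, count (\d+)")


def parse_dead_stats(text: str) -> dict[str, list[tuple[str, int, int]]]:
    """Map hostname -> [(source, total dead seconds, dead count), ...]."""
    servers: dict[str, list[tuple[str, int, int]]] = {}
    current = None
    for line in text.splitlines():
        server = SERVER_LINE.search(line)
        if server:
            current = server.group(2)
            servers[current] = []
            continue
        dead = DEAD_LINE.match(line)
        if dead and current is not None:
            source = "SMD" if dead.group(1) else "Platform"
            servers[current].append((source, int(dead.group(2)), int(dead.group(3))))
    return servers


def servers_declared_dead(stats: dict[str, list[tuple[str, int, int]]]) -> list[str]:
    """Hostnames that have been declared dead at least once."""
    return [name for name, entries in stats.items() if any(count > 0 for _, _, count in entries)]
```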

Radius status is displayed per WNCD, so the same Radius server can be marked as dead for some WNCDs and alive for others. Each AP belongs to a WNCD, and the command “show wireless load-balance ap affinity WNCD <0-7>” lists the APs assigned to each WNCD. If clients connected to APs in one specific WNCD send Radius requests and those requests get no response, the Radius status for that WNCD will be DEAD. At the same time, the server can remain alive in other WNCDs, either because their clients are not sending any Radius requests or because their requests are being answered.

For a Radius server marked as DEAD, we need to check whether it is reachable and replying to authentication and accounting requests. Radius statistics help us identify whether we are missing responses for authentication or for accounting, as well as the average response time, the number of access rejects and accepts, and the latency distribution. Use the command: “show radius statistics”

Gladius1#show radius statistics
                                 Auth.      Acct.       Both
Maximum inQ length:         NA         NA          1
Maximum waitQ length:         NA         NA         14
Maximum doneQ length:         NA         NA          1
Total responses seen:        279          0        279
Packets with responses:        279          0        279
Packets without responses:          0        396        396
Access Rejects           :          2
Access Accepts           :         20
Average response delay(ms):         10          0         10
Maximum response delay(ms):        173          0        173
Number of Radius timeouts:          0       4542       4542
Duplicate ID detects:          0          0          0
Buffer Allocation Failures:          0          0          0
Maximum Buffer Size (bytes):        764        780        780
Malformed Responses        :          0          0          0
Bad Authenticators         :          0          0          0
Unknown Responses          :          0          0          0
Source Port Range: (2 ports only)
1645 - 1646
Last used Source Port/Identifier:
Elapsed time since counters last cleared: 3w3d20h41m
Radius Latency Distribution:
<= 2ms :        181          0
3-5ms  :         32          0
5-10ms :         13          0
10-20ms:         14          0
20-50ms:         17          0
50-100m:         20          0
100ms :          2          0

Check for requests without a response, timeouts, and high latency.
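Two quick ratios summarize this output well: the share of requests left unanswered per column (Auth. and Acct.), and the share of authentication responses that fall into the slow latency buckets. The sketch below is a minimal illustration, assuming the command text has been collected; the function names are hypothetical, and which buckets count as “slow” is your choice, passed in as bucket labels exactly as they appear in the output.

```python
import re

# Rows with at least Auth. and Acct. numeric columns, e.g.
# "Packets without responses:          0        396        396"
ROW = re.compile(r"^\s*([A-Za-z][^:]*?)\s*:\s*(\d+)\s+(\d+)(?:\s+(\d+))?\s*$")


def parse_radius_rows(text: str) -> dict[str, tuple[int, int]]:
    """Map row label -> (auth, acct) for rows with numeric columns."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return rows


def no_response_ratio(rows: dict[str, tuple[int, int]]) -> tuple[float, ...]:
    """Fraction of requests without a response, per (auth, acct) column."""
    answered = rows["Packets with responses"]
    unanswered = rows["Packets without responses"]
    ratios = []
    for ok, missing in zip(answered, unanswered):
        total = ok + missing
        ratios.append(missing / total if total else 0.0)
    return tuple(ratios)


def latency_share(text: str, slow_buckets: set[str]) -> float:
    """Share of auth responses in the given latency-distribution buckets."""
    _, _, tail = text.partition("Radius Latency Distribution:")
    counts = {}
    for line in tail.splitlines():
        label, sep, values = line.partition(":")
        fields = values.split()
        if sep and fields and fields[0].isdigit():
            counts[label.strip()] = int(fields[0])
    total = sum(counts.values())
    slow = sum(counts.get(bucket, 0) for bucket in slow_buckets)
    return slow / total if total else 0.0
```

Applied to the output above, accounting shows every request unanswered while authentication shows none, which is the same pattern as the customer case described next.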

With one customer, we were troubleshooting connectivity issues for dot1x clients and found that the failures were caused by the Radius server being marked as dead. When reviewing the outputs, we could see that the Radius server was replying to authentication requests but was no longer replying to accounting packets. A workaround to minimize the impact was to disable the accounting list so that the WLC stopped sending accounting packets while the Radius administrators troubleshot the accounting issues on the server.

Packet Drops and Punted to CPU Checks

Now we can check whether there are any scalability issues caused by oversubscription of any of the WLC components. I would start by looking at the amount of traffic received and transmitted by the physical interfaces, and then review the number of broadcast/multicast packets and input or output drops. If we have a baseline, we can compare the traffic volume with it and try to find any discrepancies. Use the command: “show int po1 | i line protocol|put rate|drops|broadcast”. Replace Po1 with the physical or logical interface in your setup.

Gladius1#show int po1 | i line protocol|put rate|drops|broadcast
Port-channel1 is up, line protocol is up
  Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 0
  5 minute input rate 39000 bits/sec, 42 packets/sec
  5 minute output rate 14000 bits/sec, 12 packets/sec
     Received 9389675 broadcasts (34521510 multicasts)
     Output 45735 broadcasts (1075205 multicasts)
     0 unknown protocol drops

Check the amount of input/output traffic, drops, and broadcasts transmitted/received.
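Comparing with a baseline can be as simple as computing the relative change of each 5-minute rate. The sketch below assumes you already store baseline rates in the same shape the parser returns; the function names and the 50% tolerance are hypothetical choices, not recommended values.

```python
import re

RATE = re.compile(r"5 minute (input|output) rate (\d+) bits/sec, (\d+) packets/sec")


def parse_rates(text: str) -> dict[str, dict[str, int]]:
    """Return {'input'|'output': {'bps': ..., 'pps': ...}} from interface output."""
    rates = {}
    for direction, bps, pps in RATE.findall(text):
        rates[direction] = {"bps": int(bps), "pps": int(pps)}
    return rates


def deviations(current: dict, baseline: dict, tolerance: float = 0.5) -> list[tuple[str, str, float]]:
    """List (direction, metric, relative change) where the change exceeds tolerance."""
    out = []
    for direction, metrics in current.items():
        for metric, value in metrics.items():
            base = baseline.get(direction, {}).get(metric)
            if base:  # skip metrics without a non-zero baseline
                change = (value - base) / base
                if abs(change) > tolerance:
                    out.append((direction, metric, round(change, 2)))
    return out
```

Keep in mind that these are 5-minute averages, so short bursts may not show up; the baseline should be taken at a comparable time of day.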

We can review the packets dropped by the WLC and the reasons for those drops. When monitoring drops, it is important to check which reasons account for a high number of packet drops, and then find out how fast those drop counters are increasing. To do this, we need to collect the same output several times with a time reference. Enabling “terminal exec prompt timestamp” or collecting “show clock” gives us those time references. These time-referenced outputs are key to isolating impacting drops. Use the command: “show platform hardware chassis active qfp statistics drop”

Gladius1#show platform hardware chassis active qfp statistics drop
Last clearing of QFP drops statistics : never
Global Drop Stats                         Packets                  Octets 
CGACLDrop                                      31                    7812 
Disabled                                      635                  105934 
InvL2Hdr                                      701                  206223 
IpFormatErr                                    68                    4488 
Ipv4NoAdj                                   67749                 6910538 
Ipv4NoRoute                                     6                     376 
Ipv6NoRoute                                  1096                   61376 
Ipv6mcNoRoute                               77683                 9477326 
SWPortMacConflict                           50316                 5874782 
SwitchL2mLookupMiss                         17568                 6681680 
TailDrop                                    54199                29501684 
UnconfiguredIpv4Fia                             3                     242 
UnconfiguredIpv6Fia                       1564372               186850863 
WlsCapwapError                               1018                  233293 
WlsCapwapReassFragConsume                    1064                 1231968 
WlsClientError                               3116                  112631

Check for drop reasons with a high number of packets, and for fragmentation/reassembly drops.
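With two time-referenced collections of this output, the drop rate per reason falls out directly. The sketch below assumes the timestamps have already been converted to `datetime` objects (parsing “show clock” is not shown); the function names are hypothetical. Reasons whose counter did not grow, or was cleared in between, are left out.

```python
import re
from datetime import datetime

# e.g. "TailDrop                                    54199                29501684"
DROP_ROW = re.compile(r"^\s*([A-Za-z][A-Za-z0-9]*)\s+(\d+)\s+(\d+)\s*$")


def parse_qfp_drops(text: str) -> dict[str, int]:
    """Map drop reason -> packet count (octets are ignored)."""
    drops = {}
    for line in text.splitlines():
        match = DROP_ROW.match(line)
        if match:
            drops[match.group(1)] = int(match.group(2))
    return drops


def drop_rates(before: dict[str, int], t_before: datetime,
               after: dict[str, int], t_after: datetime) -> list[tuple[str, float]]:
    """(reason, packets/sec) for growing counters, fastest first."""
    seconds = (t_after - t_before).total_seconds()
    if seconds <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    rates = {reason: (count - before.get(reason, 0)) / seconds for reason, count in after.items()}
    return sorted(((r, v) for r, v in rates.items() if v > 0), key=lambda item: item[1], reverse=True)
```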

One more check we should do is to analyze the number of packets sent to the WLC control plane (punted) for processing. We can monitor the number of packets punted for each reason and check for an abnormal volume, and we can correlate an increase in punted packets with high CPU utilization events. Use the command: “show platform hardware chassis active qfp feature wireless punt statistics”

Gladius1#show platform hardware chassis active qfp feature wireless punt statistics
CPP Wireless Punt stats:
                                 App Tag     Packet Count
                                 -------     ------------
         CAPWAP_PKT_TYPE_DOT11_PROBE_REQ           986190
              CAPWAP_PKT_TYPE_DOT11_MGMT            10031
              CAPWAP_PKT_TYPE_DOT11_IAPP          2975298
             CAPWAP_PKT_TYPE_DOT11_DOT1X            24901
        CAPWAP_PKT_TYPE_CAPWAP_KEEPALIVE           228099
            CAPWAP_PKT_TYPE_CAPWAP_CNTRL          1628480
         CAPWAP_PKT_TYPE_CAPWAP_DATA_PAT               33
          CAPWAP_PKT_TYPE_MOBILITY_CNTRL            58091
                       SISF_PKT_TYPE_ARP        218545290
                      SISF_PKT_TYPE_DHCP            15455
                     SISF_PKT_TYPE_DHCP6             7772
                   SISF_PKT_TYPE_IPV6_ND           199108
                SISF_PKT_TYPE_DATA_GLEAN                7
             SISF_PKT_TYPE_DATA_GLEAN_V6              100

Check for a high number of punted packets that keeps increasing over time.
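Periodic spikes, like the ARP case described at the end of this section, stand out when the latest increase is compared with the usual increase. The sketch below assumes at least three snapshots taken at a fixed interval; the function names and the 3x factor are hypothetical, and this simple average-based rule is our own choice of spike detector, not something the WLC provides.

```python
import re

# e.g. "SISF_PKT_TYPE_ARP        218545290"
PUNT_ROW = re.compile(r"^\s*([A-Z0-9_]+)\s+(\d+)\s*$")


def parse_punts(text: str) -> dict[str, int]:
    """Map punt App Tag -> packet count."""
    return {m.group(1): int(m.group(2)) for m in map(PUNT_ROW.match, text.splitlines()) if m}


def punt_spikes(snapshots: list[dict[str, int]], factor: float = 3.0) -> dict[str, int]:
    """Flag tags whose latest increase exceeds `factor` times the average of
    their earlier increases (snapshots at a fixed interval, oldest first)."""
    spikes = {}
    if len(snapshots) < 3:
        return spikes  # need a history of at least two increases
    for tag in snapshots[-1]:
        counts = [snap.get(tag, 0) for snap in snapshots]
        deltas = [b - a for a, b in zip(counts, counts[1:])]
        history, latest = deltas[:-1], deltas[-1]
        average = sum(history) / len(history)
        if latest > factor * max(average, 1):
            spikes[tag] = latest
    return spikes
```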

We can also identify whether we are seeing any buffer failures and determine the size of the buffers that are reaching the maximum value. Use the command: “show buffers | i buffers|failures”

Gladius1#show buffers | i buffers|failures
Small buffers, 104 bytes (total 1200, permanent 1200):
     0 failures (0 no memory)
Middle buffers, 600 bytes (total 900, permanent 900):
     35 failures (35 no memory)
Big buffers, 1536 bytes (total 900, permanent 900, peak 901 @ 2w6d):
     0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 100, permanent 100, peak 101 @ 2w6d):
     0 failures (0 no memory)
Large buffers, 5024 bytes (total 100, permanent 100, peak 101 @ 2w6d):
     0 failures (0 no memory)
VeryLarge buffers, 8304 bytes (total 100, permanent 100):
     0 failures (0 no memory)
Huge buffers, 18024 bytes (total 20, permanent 20, peak 21 @ 2w6d):
     0 failures (0 no memory)

Check for buffer failures and identify the buffer size.
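Because each failure line belongs to the pool header just above it, a parser has to carry the current pool forward. This is a minimal sketch assuming the filtered output above; the function name is hypothetical.

```python
import re

POOL = re.compile(r"^\s*(\w+) buffers, (\d+) bytes")
FAILURES = re.compile(r"^\s*(\d+) failures \((\d+) no memory\)")


def buffer_failures(text: str) -> dict[str, tuple[int, int]]:
    """Map pool name -> (buffer size in bytes, failures) for pools with failures."""
    result = {}
    pool = None
    for line in text.splitlines():
        header = POOL.match(line)
        if header:
            pool = (header.group(1), int(header.group(2)))
            continue
        failed = FAILURES.match(line)
        if failed and pool and int(failed.group(1)) > 0:
            result[pool[0]] = (pool[1], int(failed.group(1)))
    return result
```

For the output above this would single out the 600-byte Middle pool, which is the detail to correlate with drops or punts over the same period.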

The last check could be data plane utilization, to find out whether the WLC has data plane performance issues due to traffic volume or to specific features being enabled. Use the command shared in the WLC checks: “show platform hardware chassis active qfp datapath utilization | i Load”

These KPIs were useful for identifying one customer issue. The customer saw a periodic sharp increase in the number of ARP packets punted to the CPU. By monitoring the counter for ARPs punted to the CPU and collecting a packet capture in the control plane, we identified that these ARPs were sent from specific MAC addresses that were performing malicious ARP scanning.

With this final bucket, we finish the Key Performance Indicators (KPIs) for the Catalyst 9800 WLC.

List of commands to use for KPIs and automation scripts

In the document below, there is also a link to a script that automatically collects all the commands. It collects the commands based on platform and release, saves them to a file, and exports the file. The script uses the “Guest-shell” feature, which for now is only available on the physical WLCs 9800-40/80 and 9800-L.

The document also provides an example of an EEM script to collect logs periodically. In conclusion, EEM together with the “Guest-shell” script will help you collect 9800 WLC KPIs and build a baseline for your Catalyst 9800 WLC.
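The referenced script is not reproduced here, but the core idea of a collection run (execute each KPI command, tag it with a timestamp, save everything to one file) can be sketched in a few lines. This is an illustrative sketch only: the caller supplies a `run` function that executes a CLI command and returns its text. Inside the IOS-XE Guest Shell, the `cli` Python module's `cli()` function is commonly used for that, but treat that, the output directory, and the file naming as assumptions to verify against the actual script.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable

# Commands discussed in this blog; extend with the ones from Parts 1 and 2.
KPI_COMMANDS = (
    "show wireless summary",
    "show wireless stats client delete reasons | e :_0",
    "show wireless stats trace-on-failure | ex :_0",
    "show aaa servers | i Platform Dead: total|RADIUS: id",
    "show radius statistics",
    "show platform hardware chassis active qfp statistics drop",
    "show platform hardware chassis active qfp feature wireless punt statistics",
    "show buffers | i buffers|failures",
    "show platform hardware chassis active qfp datapath utilization | i Load",
)


def collect_kpis(run: Callable[[str], str], out_dir: Path,
                 commands: Iterable[str] = KPI_COMMANDS) -> Path:
    """Run each command once and write the timestamped outputs to one file."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    out_file = out_dir / f"wlc-kpi-{stamp}.txt"  # hypothetical naming scheme
    with out_file.open("w") as handle:
        for command in commands:
            taken_at = datetime.now(timezone.utc).isoformat()
            handle.write(f"### {taken_at} {command}\n")
            handle.write(run(command))
            handle.write("\n")
    return out_file
```

Scheduling the run (for example from EEM, as the document suggests) and exporting the file are left out; keeping each output timestamped is what later allows the counter rates discussed above to be computed.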


For the list of commands used to monitor these KPIs


