Monday, July 30, 2007

The Myth of the Self-Monitoring WLAN

Recently, as you all probably know by now, Duke University had a WLAN meltdown. The CIO, Tracy Futhey, and the assistant IT director, Kevin Miller, have put to rest the notion that the Apple iPhone caused it. Cisco has issued an advisory to that effect, and Apple assisted in the effort.

I am not going to go into the details of what happened or why. Suffice it to say that mobile handhelds of all types, not just iPhones, send a lot of ARP traffic, and the Cisco infrastructure was not ready for it. As the write-up at Network World explains, "The advisory finally makes it clear that the iPhone simply triggered the ARP storms that were made possible by the controller vulnerabilities. Any other wireless client device, moving from one subnet to another apparently could have done the same thing."
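The basic idea of spotting a storm like this is not complicated. As a hypothetical illustration (this is my sketch, not Duke's or Cisco's actual detection logic, and the MAC addresses and thresholds are made up), you can flag a client as "stormy" by counting its ARP frames over a sliding time window:

```python
from collections import deque

# Hypothetical sketch: flag an ARP "storm" when a single client exceeds a
# frames-per-window threshold. A real monitor would work on a live capture;
# here packets are simply (timestamp_seconds, src_mac) tuples.
def find_arp_storms(packets, window=1.0, threshold=100):
    storms = set()
    recent = {}  # src_mac -> deque of timestamps within the window
    for ts, src in sorted(packets):
        q = recent.setdefault(src, deque())
        q.append(ts)
        # drop timestamps that have fallen out of the sliding window
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) > threshold:
            storms.add(src)
    return storms

# A well-behaved client vs. one blasting 150 ARP frames in under a second
normal = [(i * 0.5, "aa:aa") for i in range(10)]
noisy = [(i * 0.005, "bb:bb") for i in range(150)]
print(find_arp_storms(normal + noisy))  # only the noisy MAC is flagged
```

The point is that the detection itself is trivial once you can see the frames; the hard part at Duke was that the controllers' own view of the world said everything was fine.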

What I will point out, however, is the problem we in the Wi-Fi community have today with the following simple delusion: "Your WLAN infrastructure as a cohesive, integrated, single-vendor solution is all anybody needs. It is self-monitoring and self-healing." I talk to a lot of people about which WLAN solution they are going to purchase and implement, and I am always surprised by how many believe that the AP and controller vendor has all the answers. Don't get me wrong, I am a huge fan of this type of solution. Central management is critical for even medium-sized organizations of 50 or more APs, much less larger ones that may have a few hundred or even thousands. Manually changing the configuration of each AP is not a viable solution in these cases. The admin needs assistance. And the story sounds so great: "Implement our solution and it will fix itself when it breaks and protect itself when security policies are breached." Who wouldn't want that?

But the truth is a little more complicated. As we have seen from previous posts, sometimes the solution doesn't behave the way your business practices need. Similarly, sometimes there are security problems within the infrastructure itself. So what to do?

This will sound like an advertisement for the company I work for and I apologize ahead of time but there is a very good reason I continue to work there. Mainly, I believe in the message.

When the Duke network went down and the assistant IT director looked at his WLAN infrastructure dashboard, what did he see? I have not spoken with him directly, but my guess would be it said, "Hey man, it ain't me. Everything looks good from my end." So what did he do? He pulled out a sniffer and got to work. With packet traces in hand, and assistance from Cisco and Apple, he solved the problem. Did the infrastructure fix itself? Did it correctly identify the problem and solution? No. A patch is now needed to keep this from happening again.

One should not blame the infrastructure for not getting this right at the outset, nor should one blame Mr. Miller. He was correctly reading what the controllers were telling him. But it shows how important it is to have a separate, third-party solution also available to get down to the bits and bytes, or even spectrum analysis (if the problem should be something other than 802.11 protocol madness).

There are a few great WLAN security vendors out there, and they make third-party, best-of-breed solutions for monitoring the security of your WLAN (one of which recently got snatched up for pennies on the dollar and will probably be rolled into another integrated, self-healing, self-monitoring role, against my better judgment). There is an even smaller number who monitor both your security and your connectivity and performance, with great troubleshooting tools built in (insert shameless plug here). These should be your trusted advisors when things go wrong. I am in no way suggesting that they would have identified the problem and cause and given a solution at Duke either (although I think they at least would have shown alerts for denial of service and strange traffic behavior). What I am suggesting is that with them in place you now have a set of tools to assist in solving the problem: remote packet and/or spectrum analysis; alarm thresholds that can be set by the admin and that will continue surveillance; reports; system-to-system notifications; graphs of speed and traffic type; lists of who is connected to what and how. All the things you would need to get to the bottom of any problem in that invisible Luminiferous Ether.
