Sometimes your website or web app simply stops responding. Your browser shows an hourglass, yet your server load is nowhere near 100%. So what is your server doing?
Here is a short and easy way to find out what your server is doing instead of serving up pages. More precisely, here is a way to find out where exactly your web app gets stuck.
We used this tool recently to find a hideous bug in a portlet. The portlet called a webservice that was fast almost all the time, except in a few cases where it took 60 seconds or more to respond. This left the calling thread waiting indefinitely, and once enough such threads had piled up, the whole thing hung. This usually happened after a few hours, depending on the number of visitors. CPU usage was well under 10% on both the IIS and the SQL server. Whenever we checked the site, everything was OK, but after a while it hung again and stopped responding until an app restart. Naturally, none of this happened on developer workstations: there was no traffic on them, so the number of waiting threads never got high.
The tool showed us exactly this, and rewriting the portlet then took only an hour.
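To see why a nearly idle CPU and a hung site can go together, here is a minimal Python sketch of the failure mode. It is only an analogy, not the portlet's code: the pool size, the function names (call_webservice, probe_is_served) and the timeout values are all made up for illustration. A small worker pool stands in for the web server's request threads, and an event that is never set stands in for the stalled webservice. The timeout shown is one common remedy, not necessarily the change we made to the portlet.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Optional


def call_webservice(stall: threading.Event, timeout: Optional[float]) -> str:
    # Hypothetical stand-in for the slow webservice call. The thread just
    # waits (no CPU used) until the service answers or the timeout expires.
    answered = stall.wait(timeout)
    return "answer" if answered else "gave up"


def probe_is_served(call_timeout: Optional[float],
                    pool_size: int = 4,
                    probe_wait: float = 0.5) -> bool:
    """Fill every worker with a stalled call, then check whether one more
    request still gets served within probe_wait seconds."""
    stall = threading.Event()  # stays unset during the probe: service is stalled
    pool = ThreadPoolExecutor(max_workers=pool_size)
    try:
        for _ in range(pool_size):
            pool.submit(call_webservice, stall, call_timeout)
        probe = pool.submit(lambda: "page served")
        try:
            probe.result(timeout=probe_wait)
            return True
        except FutureTimeout:
            return False
    finally:
        stall.set()               # release any workers that are still waiting
        pool.shutdown(wait=True)


if __name__ == "__main__":
    print("no timeout on the call:  ", probe_is_served(None))
    print("50 ms timeout on the call:", probe_is_served(0.05))
```

Without a timeout, every worker sits in the stalled call and the extra request never gets a thread, even though nothing is burning CPU. With a timeout, the workers give up and become free again, so the extra request is served.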
Here are six easy steps to find out what hangs your server. Have fun!
Install WinDbg on the production server:
32 bit: http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx
64 bit: http://www.microsoft.com/whdc/devtools/debugging/install64bit.mspx
Attach WinDbg to a process, namely the W3WP process in the case of web applications:
File > Attach to a process > w3wp.exe
You are now in the WinDbg console. The website is suspended while the debugger is attached, so act quickly.
Type the following commands in the console input:
.loadby sos mscorwks [press enter]
~*e !clrstack [press enter] This shows which .NET method each worker thread is currently in
Send the content of the screen to your developers for analysis. :-)
As an example, here is what a simple one-button ASPX page does when you make it sleep for 20 seconds in its Button1_Click event handler. As you can see from the screenshot, there are two requests that are “hung”, both of them in the Button1_Click method.
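With two threads you can read the stacks by eye, but on a busy server the output runs to many screens. Below is a rough Python sketch that groups the threads by the first non-framework method on each stack. The SAMPLE text is hand-written to resemble typical !clrstack output for the page above; the addresses and thread IDs are invented, and the exact layout depends on your SOS version, so treat the two regular expressions as assumptions to adjust. The names top_frames and hot_spots are made up for this example.

```python
import re
from collections import Counter

# Assumed layout: each thread starts with an "OS Thread Id: 0x..." line,
# and each frame line is "<stack ptr> <instruction ptr> <method>".
THREAD_HEADER = re.compile(r"^OS Thread Id: (0x[0-9a-fA-F]+)")
FRAME = re.compile(r"^[0-9a-fA-F]{8,16}\s+[0-9a-fA-F]{8,16}\s+(.*)$")

# Hand-written sample, not real debugger output.
SAMPLE = """\
OS Thread Id: 0x1a4 (14)
ESP       EIP
0f1af0c8 7c90e514 [HelperMethodFrame: 0f1af0c8] System.Threading.Thread.SleepInternal(Int32)
0f1af11c 0d2a0a9e System.Threading.Thread.Sleep(Int32)
0f1af120 0d2a09b1 _Default.Button1_Click(System.Object, System.EventArgs)
0f1af130 66f12980 System.Web.UI.WebControls.Button.OnClick(System.EventArgs)
OS Thread Id: 0x9b0 (15)
ESP       EIP
0f5bf0c8 7c90e514 [HelperMethodFrame: 0f5bf0c8] System.Threading.Thread.SleepInternal(Int32)
0f5bf11c 0d2a0a9e System.Threading.Thread.Sleep(Int32)
0f5bf120 0d2a09b1 _Default.Button1_Click(System.Object, System.EventArgs)
OS Thread Id: 0x7c8 (16)
Unable to walk the managed stack. The current thread is likely not a
managed thread.
"""


def top_frames(dump, skip_prefixes=("System.", "[")):
    """Map each thread id to its topmost frame that is not framework code."""
    result = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        frame = FRAME.match(line)
        if frame and current is not None and current not in result:
            method = frame.group(1)
            if not method.startswith(skip_prefixes):
                result[current] = method
    return result


def hot_spots(dump):
    """Return (method, thread count) pairs, most crowded method first."""
    return Counter(top_frames(dump).values()).most_common()


if __name__ == "__main__":
    for method, count in hot_spots(SAMPLE):
        print(count, "thread(s) in", method)
```

Paste the saved console output in place of SAMPLE. A method that many threads share at the top of the list is the first place to look.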