Your Site's Performance
It is finished. After months of working weekends, you are finally ready to go live. The bug list is now short and manageable. You have done your performance testing and you are good to go. You go live with the redesigned site, breathe a sigh of relief and book tickets to the Caribbean. Then something happens. Support calls are spiking. It seems that your customers are complaining about the speed of the new site. The calls are coming mainly from customers in Ontario and in Florida. Your boss has been called in to the CEO’s office and gotten chewed out. He comes to see you with a stressed-out look on his face. He isn’t yelling but …
Our development manager friend has made a classic error: he was not sufficiently paranoid. He trusted a simulation that was not worthy of his trust. Every simulation deviates from reality to some degree. If that deviation is small, then the simulated result will be a good predictor. If that deviation is too great, then the result is garbage. Our manager should have asked himself a set of questions before declaring that the new version of the software was good to go live:
· How many simultaneous customers are we simulating? Is this number correct?
· What hardware are we testing on? Is it representative of the live site?
· What other processes take place on the live site that aren’t present on the test platform?
· What are we testing? Is the ratio of tests that we are running a true indication of what we will see when we go live?
· Where are we testing from? If our test is within our own firewall, then we will not be simulating what the end users will see.
· Are our tests from a single geography? Every region has its own network characteristics and its own set of ISPs.
A good performance test must be accurate enough to cover the risk to your revenue stream and your brand. An experimental eCommerce site that sells no-name closeouts can temporarily afford to give the user a shaky experience. An upscale brand that sells US$1 billion a year cannot. In the second case, rigorous performance testing must be instituted. This rigor involves doing the following:
· Create a set of test cases that accurately reflect the mix of activities that the site will experience on a daily basis, especially during peaks. This requires a careful analysis of the web analytics data and the creation of a transactional load profile. This load profile can then be used to add additional test cases and to set the frequency at which they are run.
· Move the test outside the firewall. Ideally, move the performance test close to where your users are. If you test in Atlanta, how will you know whether you are getting good response times in Canada? Do you sell in Europe? California? The higher the risk, the more elaborate the testing has to be.
· Test on multiple browsers. Which browsers are most of your customers running on? How does that affect your response times? A whole new set of browsers from Microsoft, Google, and others has just been released. They have a new set of performance characteristics that differ from the old browsers. You must test on every browser that can impact your performance.
· Test on multiple connection types. What types of connections do your customers use? If 50% of your revenue comes from dial-up in the Northeast, then you need to know what those customers will experience before rolling the site out to them.
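As a rough illustration of the load-profile step in the list above, here is a minimal Python sketch. The transaction names and peak-hour counts are invented for the example; in practice they would come from your own web analytics data. It converts raw counts into a profile of traffic fractions and then draws test cases so that their frequencies follow that profile.

```python
import random
from collections import Counter

# Hypothetical transaction counts for one peak hour, as might be
# exported from a web analytics tool. Names and numbers are made up.
PEAK_HOUR_COUNTS = {
    "browse_catalog": 6000,
    "search": 2500,
    "add_to_cart": 1000,
    "checkout": 500,
}


def load_profile(counts):
    """Convert raw transaction counts into fractions of total traffic."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def pick_transactions(profile, n, seed=0):
    """Draw n test-case names whose frequencies follow the profile."""
    rng = random.Random(seed)  # fixed seed so a test run is repeatable
    names = list(profile)
    weights = [profile[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


profile = load_profile(PEAK_HOUR_COUNTS)
schedule = pick_transactions(profile, 10_000)
observed = Counter(schedule)

for name in profile:
    print(f"{name:15s} target {profile[name]:.2%}  scheduled {observed[name]}")
```

The resulting schedule is only as good as the counts fed into it, so the profile should be rebuilt whenever the analytics data shows the traffic mix has shifted.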
Andrew Grove chose “Only the Paranoid Survive” as the title of his famous book on business advice. We, as software development professionals, would do well to remember that.

Comments
Good article. But a properly architected site is fast and scalable by design :) The only problem is finding such an experienced site architect.
Anyway, I would also recommend this free web tool to measure site performance - http://Site-Perf.com/
It simulates a browser by loading the page with all its requisites (images, css, js,..) and shows a nice detailed chart, so you can easily spot bottlenecks.
Another useful thing is that this tool can measure the quality of your server's internet link with high precision.
For developers I would also recommend the YSlow plugin for Firefox.
Posted by: zuborg | September 28, 2008 08:56 PM
I think it all boils down to how good the performance model is and how well the model is executed. Besides the parameters you have listed, I think it is worthwhile to consider the following parameters as well:
1. Average Think time: one of the most important parameters. It will be a good idea to run some tests to arrive at the correct think time rather than providing a random figure.
2. Response time targets: We must admit that performance is relative: it depends on the user load and the hardware, and there can be intermittent issues. Performance targets must be established for each transaction separately and in terms of percentiles rather than absolute or average response time. E.g. rather than specifying an average response of less than 3 seconds for the whole site, the response target should be defined for each transaction/page separately and in terms of percentiles, i.e. for a specific transaction, 90% of requests should take less than 2 seconds, 98% should take less than 3 seconds and 99.99% should take less than 5 seconds.
3. The internet is not really under anybody's control. I think it will be suicidal to promise any response times over the internet especially across the globe as some regions are known to have bandwidth problems (China etc.). The response times should be defined at two levels: one at the server level i.e. without involving the internet and secondly, at the internet level considering the regions the site is targeted at.
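To make the percentile targets in point 2 concrete, here is a minimal Python sketch. The thresholds are the ones given in that point; the sample response times are invented, and the nearest-rank percentile definition is an assumption, since load-testing tools differ in how they compute percentiles.

```python
import math

# Targets from point 2: (percentile, limit in seconds).
TARGETS = [(90.0, 2.0), (98.0, 3.0), (99.99, 5.0)]


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def check_targets(samples, targets=TARGETS):
    """Return one (percentile, observed, limit, passed) tuple per target."""
    results = []
    for pct, limit in targets:
        observed = percentile(samples, pct)
        results.append((pct, observed, limit, observed < limit))
    return results


# Hypothetical response times (seconds) for a single transaction.
samples = [0.4] * 90 + [2.5] * 8 + [4.0, 6.0]
report = check_targets(samples)

for pct, observed, limit, passed in report:
    status = "ok" if passed else "MISSED"
    print(f"p{pct}: {observed:.1f}s (limit {limit:.1f}s) {status}")
```

With these made-up samples the average is about 0.66 seconds, which would pass a "3 second average" target, yet the 99.99% target is missed because of a single 6-second request. That is the kind of outlier an average hides.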
I think all these measures will reduce the possibility of cancelled holidays! Unfortunately, the possibility cannot be eliminated altogether :(
Posted by: ravi | October 7, 2008 11:28 AM