Website Scaling, Part 2: The Metrics of Measuring
One critical thing I didn't cover in the first installment of this two-part series is how to measure Web scalability. In a testing environment you do this, of course, with "load testing." But what you might not know about load testing is how it relates to Little's Law.
So in Part 2, I'll go over what Little's Law is and how it relates to load testing, and then look at how to scale the Web and data layers. Finally, I'll share my tips for troubleshooting common scaling bottlenecks.
Applying Little's Law
Following is what Little's Law states:
The average number of things in the system is the product of the average rate at which things leave the system and the average time each one spends in the system. (And if there is a gross flow balance of things entering and leaving, the exit rate is also the entry rate.)
Following is how this can be applied to load testing:
The response-time formula for a multi-user system can be proved using Little's Law and flow balance. Assume n users of average think time z are connected to an arbitrary system with response time r. Each user cycles between thinking and waiting-for-response, so the total number of jobs in the meta-system (consisting of users and the computer system) is fixed at n. If you cut the path from the system's output to the users, you see a meta-system with average load n, average response time z+r, and throughput x (measured in jobs per time unit). Little's Law says n = x*(z+r), and solving for r gives r = n/x - z.
How does that help you? If you do a quick-ramped load test with up to a (small) number of users n and a very small think time z, you will get a measurement of the response time r versus the throughput x. To estimate the number of users n1 for desired response time r1 and a realistic think time z1, determine the throughput x1 at the desired response time r1 from your previous results, and use Little's Law to solve for n1.
Will this be accurate? Probably not. After all, system scalability is never perfect. Nevertheless, this should get you into the right ballpark and serve as a sanity check for measurements with more realistic think times and user profiles.
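The estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and every number (50 users, 40 requests per second, the think times) are made up for the example and are not measurements from any real system.

```python
def response_time(n_users, throughput, think_time):
    """Little's Law solved for r: r = n/x - z."""
    return n_users / throughput - think_time


def users_supported(throughput, resp_time, think_time):
    """Little's Law as stated: n = x * (z + r)."""
    return throughput * (think_time + resp_time)


# Hypothetical ramp test: n = 50 users, think time z = 0.5 s,
# measured throughput x = 40 requests per second.
r = response_time(n_users=50, throughput=40.0, think_time=0.5)

# Assume the earlier results show x1 = 40 requests per second at the
# desired response time r1 = 0.75 s. With a more realistic think time
# z1 = 10 s, estimate how many users n1 that throughput can carry.
n1 = users_supported(throughput=40.0, resp_time=0.75, think_time=10.0)

print(f"r = {r:.2f} s, n1 = {n1:.0f} users")
```

With these assumed inputs, r works out to 50/40 - 0.5 = 0.75 seconds and n1 to 40 * (10 + 0.75) = 430 users. Treat that as a ballpark figure to check against a load test with realistic think times, not as a capacity guarantee.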
Scaling Up and Out
Scaling up refers to running your application on faster and bigger servers. Scaling out refers to running your application on multiple servers. The AKF Scale Cube further distinguishes between two ways of scaling out: running load-balanced clones of the same service, and splitting the application into specialized services.
In general, scaling up may be easier from the application developer's point of view, but scaling out is almost always better.
Scaling the Web Layer
Web servers are easily grouped into Web farms behind a load-balancing switch. Depending on the way the application state is stored between requests (on a single Web server, in the request and response objects, or in a common database), the switch might or might not have to honor session affinity. Session affinity means sending subsequent requests from the same session back to the same server.
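To make session affinity concrete, here is a minimal Python sketch of one way a balancer could pin a session to a server: hash the session ID and use the result to pick from the server list. The server names and the pick_server function are hypothetical. Real load-balancing switches commonly implement affinity with cookies or source-address hashing, so read this as an illustration of the idea rather than a description of any particular product.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["web1", "web2", "web3"]


def pick_server(session_id, servers=SERVERS):
    """Map a session ID to one server, the same one on every call."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into an integer and wrap it
    # onto the list of servers.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Repeated requests from one session land on the same server.
first = pick_server("session-12345")
second = pick_server("session-12345")
print(first, second, first == second)
```

One limitation of this simple scheme is that adding or removing a server changes the modulus and therefore remaps most sessions. If the application keeps its state in a common database instead, the switch does not need affinity at all and can send any request to any server.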
In the absence of a load-balancing switch, when there is a wide geographical distribution of user locations, and when there is a requirement for different servers to serve different populations, different user groups might be assigned to different servers.
Scaling the Data Layer
SQL database servers are typically designed to scale up well and often support clustered and replicated topologies that allow them to scale out transparently, as well as partitioned tables and indexes. Additional techniques can split the database by function, service or resource -- or, if necessary, by values that are looked up at the time of the transaction.
File-based databases, such as DBF and Microsoft Access, do not scale well in any direction. They can be used for Web applications with relatively small numbers of concurrent users (typically in the tens), and with relatively small amounts of data (typically in the tens of thousands of rows). For larger sites and data sets, SQL (and in some cases NoSQL) database servers are a much better choice from a scalability perspective.
Troubleshooting Scaling Bottlenecks
As mentioned previously, isolation testing can help you home in on bottlenecks, especially if you can instrument and log that part of the application. With sufficient patience and data, you can sometimes zoom in on problems, do a root cause analysis, and ultimately fix the problems.
A saying you might want to post on your wall and refer to when troubleshooting: If the problem is not where you are looking, you're looking in the wrong place. Often the knowledge of what a problem isn't gives you a big clue about what a problem is and where it might lie. At the same time, it's easy to be misled by your instincts -- what feels like a database problem might prove to be a network problem, and what feels like a threading problem might prove to be a lack of RAM.
Common SQL scaling bottlenecks: missing database indexes; extra database indexes; fragmented database indexes; out-of-date database index statistics; under-normalized databases; over-normalized databases; complex queries on the client that should be moved to stored procedures to avoid constant SQL optimization overhead; inefficient queries; excessive use of database views; inefficient stored procedures; and database design that does not match the usage patterns.
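The first item on that list, a missing index, is easy to demonstrate. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement as a stand-in for whatever plan-inspection tool your database server provides. The orders table, its columns, and the index name are invented for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Hypothetical table, built in memory just for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# After the index is created, the same query can search the index.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The plan before the index should report a scan of the orders table, and the plan after it should mention idx_orders_customer. Keep in mind that the same list warns about extra indexes: each one has to be maintained on every write, so it pays to add them for the queries you actually run.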
Common Web application bottlenecks: bringing too much data to the client for processing instead of reducing and ordering the data set at the database; failing to cache expensive queries and complex pages; excessive reliance on exception handling; too many callbacks to the server; refreshing whole pages instead of using Ajax to update data; inefficient data structures; and excessive use of loops.
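As one illustration of caching an expensive query, here is a minimal in-process cache with a time-to-live, written in Python. The TTLCache class, the expensive_report function, and the 60-second lifetime are all assumptions made up for this sketch. A production site would more likely use the caching facilities of its Web framework or a shared cache server.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


calls = {"count": 0}


def expensive_report():
    """Hypothetical stand-in for a slow database query."""
    calls["count"] += 1
    return [("widgets", 120), ("gadgets", 75)]


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("sales_report", expensive_report)
second = cache.get_or_compute("sales_report", expensive_report)
print(calls["count"])  # the query function ran once for two requests
```

The trade-off is staleness: for up to 60 seconds, users may see a report that no longer matches the database, so the lifetime should reflect how fresh each page really needs to be.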
This just scratches the surface -- the list goes on. Don't forget to check your key performance indicators before you do a deep dive -- you can spend a long time looking for a database design problem when your real problem is that your logs are filling up your disk, or that your WAN connection is being overloaded by a denial-of-service attack.