Calculating HTTP Server Load


Background

When discussing and specifying performance for HTTP-based applications, it is very important to use the correct terminology and metrics. Many people simply talk about “users” or “concurrent users” in a way which means very little; for example, a typical requirement is “we need to support 100 users”, with no further information given.

This post aims to provide clarity on the subject of HTTP server load, and demonstrates how to calculate the key metrics which allow system architects to plan capacity and estimate the scaling needed for a system.

Requests per second

The key metric is Requests per second, which is typically derived from a specification of either:

  • Number of users per hour
  • Number of concurrent users in a given time window

Key Definitions

The number of Requests per second reaching an HTTP application is dependent on several parameters:

  • The duration of the whole Test (aka the “Test Window”) (W)
  • The duration of a single Journey (aka the “Journey Window”) (J)
  • The number of users active within the Test Window (Y)
  • The number of users active within a single Journey Window (U)
    • This is also the number of “concurrent users”
  • The number of steps in the Journey (S)
  • The number of HTTP requests per step of the Journey (R)

Journey

In this case, a “Journey” is a typical sequence of “actions” taken by the HTTP client connecting to the HTTP server (whether this client is a human user with a web browser, or a SOAP web service client calling a web service). The Journey may involve several “steps”, which often (but not always) map to distinct web pages. Each step will trigger one or more HTTP requests. For example, a typical web page will contain references to images, CSS and JavaScript files, etc., which the browser requests from the server; each of these files (if not already cached) results in an additional HTTP request. A web service, on the other hand, would typically have only a single request per step.

Diagram

This diagram shows the relationships:

|                                                  |  Test Window (W)
|__________________________________________________|

|                                                  |  Journey Window (J)
|_------_------_------_------_------_------_------_|  here, 7 Journeys fit in 1 Test Window

| O O O O O O O O O O O O O O O O O O              |  Number of users per Test Window (Y)
|_T_T_T_T_T_T_T_T_T_T_T_T_T_T_T_T_T_T______________|  here there are 18 users

| O O O                                            |  Number of users per Journey Window (U)
|_T_T______________________________________________|  here there are 2.57 users: U = Y * (J/W)

S              S              S                       Number of steps in the Journey (S)
-----------------------------------------------       here, there are 3 steps

S   R   R      S   R   R      S   R   R               Number of requests in each step of the Journey (R)
-----------------------------------------------       here, there are 2 requests per step

Basic Calculations

Given the definitions above, we can derive the Requests per second in two different ways, depending on what performance criterion has been mandated:

  • Users per hour
  • Number of concurrent users

Note: in the calculations below, the following are defined:

  • Requests per second (H)
  • Requests per Journey (P)
  • Requests per Journey Window (Q)
  • Concurrent users, U = Y * (J/W), as derived in the diagram above
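The relationship U = Y * (J/W) can be sketched as a small helper (a minimal sketch; the class and method names are illustrative, not from the original post):

```java
public class ConcurrentUsers {
    /** U = Y * (J / W): concurrent users from users per Test Window. */
    public static double fromUsersPerWindow(int usersPerTestWindow,
                                            double journeySeconds,
                                            double testWindowSeconds) {
        return usersPerTestWindow * (journeySeconds / testWindowSeconds);
    }

    public static void main(String[] args) {
        // As in the diagram: 18 users, 7 Journeys per Test Window;
        // J = 30s and W = 210s are assumed values matching that 1:7 ratio
        System.out.println(fromUsersPerWindow(18, 30, 210)); // ~2.57 users
    }
}
```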

Requests per second, given users per hour

H     =       Q             /  J
      =     P * U           /  J
      = S * R * U           /  J
      = S * R * (Y * (J/W)) /  J
      = S * R * Y/W

Example:

  • Requirement: 20,000 users per hour (Y)
  • The Journey is 30 seconds long (J)
  • There are 3 steps in the Journey (S)
  • There are 2 HTTP requests at each step of the Journey (R)
  • Requests per second, H = 3 * 2 * 20000/(60*60) = 33.333 Requests per second.
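The worked example above can be expressed directly in code (a sketch; names are illustrative):

```java
public class UsersPerHourLoad {
    /** H = S * R * Y / W: Requests per second, given users per Test Window. */
    public static double requestsPerSecond(int steps, int requestsPerStep,
                                           int usersPerTestWindow, int testWindowSeconds) {
        return (double) steps * requestsPerStep * usersPerTestWindow / testWindowSeconds;
    }

    public static void main(String[] args) {
        // 3 steps, 2 requests per step, 20,000 users per hour (W = 3600s)
        System.out.println(requestsPerSecond(3, 2, 20000, 3600)); // ~33.33 requests/second
    }
}
```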

Requests per second, given concurrent users

H     =       Q             /  J
      =     P * U           /  J
      = S * R * U           /  J

Rearranging U = Y * (J/W) as Y = U * W / J also allows us to derive the number of users which can be supported across the whole Test Window.

Example:

  • Requirement: support 160 concurrent users (U)
  • The Journey is 30 seconds long (J)
  • There are 3 steps in the Journey (S)
  • There are 2 HTTP requests at each step of the Journey (R)
  • Requests per second, H = 3 * 2 * 160 / 30 = 32 Requests per second.
  • Users per hour, Y = U * W / J = 19,200 users per hour
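Both calculations in this example can be sketched as follows (illustrative names, not from the original post):

```java
public class ConcurrentUsersLoad {
    /** H = S * R * U / J: Requests per second, given concurrent users. */
    public static double requestsPerSecond(int steps, int requestsPerStep,
                                           int concurrentUsers, int journeySeconds) {
        return (double) steps * requestsPerStep * concurrentUsers / journeySeconds;
    }

    /** Y = U * W / J: users supported across the whole Test Window. */
    public static double usersPerTestWindow(int concurrentUsers,
                                            int testWindowSeconds, int journeySeconds) {
        return (double) concurrentUsers * testWindowSeconds / journeySeconds;
    }

    public static void main(String[] args) {
        // 3 steps, 2 requests per step, 160 concurrent users, J = 30s, W = 3600s
        System.out.println(requestsPerSecond(3, 2, 160, 30));  // 32.0 requests/second
        System.out.println(usersPerTestWindow(160, 3600, 30)); // 19200.0 users/hour
    }
}
```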

Seconds per Request

To calculate the supported page load/serve time (L), we need to know the number of HTTP servers responding to requests (i.e. in a load-balanced system) (V), and the number of concurrent requests each server can sustain (C).

We spread the load across the servers, and give each server its maximum number of concurrent connections:

L = (1 / H) * V * C

Example:

  • Requests per second: 30 (H)
  • Load-balanced servers: 3 (V)
  • Concurrent requests per server: 20 (C)
  • Seconds per Request (per server per concurrent connection), L = (1 / H) * V * C = (1 / 30) * 3 * 20 = 2 seconds
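This calculation can be sketched as (the concurrent-requests-per-server symbol is written out as a parameter name for clarity; names are illustrative):

```java
public class SecondsPerRequest {
    /** L = (1 / H) * V * concurrent requests per server. */
    public static double supportedServeTime(double requestsPerSecond, int servers,
                                            int concurrentRequestsPerServer) {
        return (1.0 / requestsPerSecond) * servers * concurrentRequestsPerServer;
    }

    public static void main(String[] args) {
        // 30 requests/second, 3 load-balanced servers, 20 concurrent requests per server
        System.out.println(supportedServeTime(30, 3, 20)); // 2.0 seconds
    }
}
```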

Estimating Capacity

Having derived the “raw” values for Requests per second and therefore seconds per Request, we can apply a scaling factor which gives an indication of the amount of scale-out or scale-up necessary to achieve the required performance.

Define:

  • h: Requests per second supported by a “single unit” (e.g. one web server, one database server)
  • l: seconds per request supported by a “single unit” (e.g. one web server, one database server)
  • M: number of “single units” required

Therefore:

  • H/h is a “saturation” factor for Requests per second
  • l/L is a “saturation” factor for seconds per Request

This allows us to calculate what multiple of the “single unit” would be needed in order to support the required performance.

M = H / h = l / L

Example:

  • Requirement: 33.333 Requests per second (H)
  • “Single unit” supports 7 Requests per second (h)
  • Number of “single units”, M = H / h = 33.333/7 = 4.76 units

Example:

  • Requirement: 0.03 seconds per Request (L)
  • “Single unit” supports 0.16 seconds per Request (l)
  • Number of “single units”, M = l / L = 0.16/0.03 = 5.33 units
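Both saturation factors can be sketched as a small helper (illustrative names):

```java
public class CapacityUnits {
    /** M = H / h: units needed, from Requests per second. */
    public static double unitsFromThroughput(double requiredRps, double unitRps) {
        return requiredRps / unitRps;
    }

    /** M = l / L: units needed, from seconds per Request. */
    public static double unitsFromServeTime(double unitSecondsPerRequest,
                                            double requiredSecondsPerRequest) {
        return unitSecondsPerRequest / requiredSecondsPerRequest;
    }

    public static void main(String[] args) {
        System.out.println(unitsFromThroughput(33.333, 7)); // ~4.76 units
        System.out.println(unitsFromServeTime(0.16, 0.03)); // ~5.33 units
    }
}
```

In practice M would be rounded up to the next whole unit, with headroom for peaks.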

Increasing Performance

Having derived M, the scaling factor, you need to decide what increased system specification will meet this. You should consider:

  • Scale-out:
    • Web server load balancing
    • Database mirroring, sharding, or other database performance enhancements
  • Scale-up:
    • Faster servers
    • Faster bandwidth
    • Faster disks or I/O
  • Architecture changes:
    • Caching:
      • Files, e.g. Squid configured as an Accelerator
      • Page fragments, e.g. ASP.NET Output Caching
    • Content Delivery Network (CDN) for media hosting

Note: some of these changes will affect the initial calculations. For example, if you offload media to a CDN, the number of Requests per second for a given journey will drop.

Your choice of scaling point depends on the bottlenecks in the system. If your application produces many small requests to the database, consider scaling out the database servers; if the queries are CPU-intensive on the database, consider scaling up the database. If your application receives many small-to-medium size requests at the web tier, scale out the web servers. If the web tier spends a significant time processing data (e.g. zipping files), then you would scale up the web tier, and so on.

Benchmarking

To decide how to scale, we need a benchmark for HTTP server performance. Perhaps the simplest HTTP server (apart from special cases such as Squid) will take a request for one of a set of static images cached in memory, and serve back that image.

In this case, we will assume that the application is CPU-bound, and therefore that scaling the number and/or speed of CPUs will improve performance.

SimplestHttpServlet

The following Java pseudo-code demonstrates a simple HTTP server which retrieves data from an in-memory cache:

public class SimplestHttpServlet extends HttpServlet
{
	/**
	 * Service HTTP GET request.
	 */
	public void doGet(HttpServletRequest request, HttpServletResponse response)
			throws ServletException, IOException
	{
		// get content path from URL query string
		String path = request.getParameter("path");

		// Cache and getContent() are assumed helpers: look up the cached
		// content for the path, or send an error response and return null
		Content content = getContent(Cache.instance(), path, response);
		if (content == null) return;

		// serve the cached bytes back with the correct MIME type
		response.reset();
		response.setContentType(content.getMimeType());
		response.setContentLength(content.getData().length);

		response.getOutputStream().write(content.getData());
		response.getOutputStream().flush();
	}
}

Baseline

Using the SimplestHttpServlet, we can run some tests on baseline hardware. Using the CPU comparisons at http://www.cpubenchmark.net/cpu_list.php, we can establish a CPU PassMark for the test system. Let’s say we run the tests on a single-CPU Intel Pentium M @ 1400MHz; this has a PassMark score of 320. We run the performance tests (requesting one of say 200 ~32KB images at random), and determine that this system can support the following:

  • 200 concurrent users
  • 32 Requests per second

Assuming that this is a CPU-bound application, we would expect performance to scale with the PassMark value, so a system with a PassMark value of 610 (e.g. Intel Xeon 3.00GHz) should be able to support around 61 Requests per second.
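Under the CPU-bound assumption, the expected scaling is a simple linear extrapolation (a sketch; names are illustrative):

```java
public class PassMarkScaling {
    /** Linear extrapolation of Requests per second with CPU PassMark score. */
    public static double estimatedRps(double baselineRps,
                                      double baselinePassMark, double targetPassMark) {
        return baselineRps * (targetPassMark / baselinePassMark);
    }

    public static void main(String[] args) {
        // Baseline: Pentium M @ 1400MHz (PassMark 320) supports 32 requests/second;
        // estimate for an Intel Xeon 3.00GHz (PassMark 610)
        System.out.println(estimatedRps(32, 320, 610)); // 61.0 requests/second
    }
}
```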

However, if we change the application profile to have different performance characteristics (e.g. I/O or database bound), a different scaling metric would need to be applied. In particular, disk and I/O subsystems, network bandwidth, and other considerations would need to be taken into account.

Notes

Thanks to Adam Ross for his contribution.
