Imagine you are building a new web application. The front end is a single page app and the back end is a server with a REST API that uses a relational database as the primary data store. Pretty standard stuff. Your team is excited about this new application, and all of the focus is on building the new cool features. Then halfway through your project’s timeline, your boss comes in and says, “We need to do performance testing. I’ve talked to the first customer who will be using this application, and they need it to be able to handle 100,000 simultaneous users. Can we handle that?”
I’ve been in that situation several times before, and this blog post contains an approach that has worked for me. “Performance testing” is really load testing, performance tuning, and operational monitoring. All three are big efforts, and like most things they are done iteratively: build a little bit, try it out, refine it, repeat.
Step 1: Model the load
The goal is to model how people use your application, and translate that into a certain load. For a server, load is usually measured in requests/sec or transactions/sec. The statement “the system needs to be able to handle 100,000 users” doesn’t give you enough information to make a load calculation, but here are some steps that will:
1.1. Define a script or multiple scripts that list out a typical user session. Answer these questions:
- How does a person typically use your application in a single session? What are all the interactions?
- How long is a typical session? 2 minutes? 20 minutes? 2 hours?
- How many times per day / week / month?
- What times of day?
- Is usage higher on certain days e.g. Black Friday or the end of the month / quarter / year?
For example, for a banking system a typical session might be, “Login, click the accounts tab, select an account, click on the transaction history for that account,” and people do that in the morning or at lunch 3 times per week, consistently all year (no seasonality). If this is a new application, you won’t know for sure how users will really use the system, so you have to make assumptions. Ultimately your test results will only be as good as your assumptions, so validate the expected user behavior by talking to whoever at your company is the best advocate for the customer, or by testing with real people. Don’t just wing it.
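It helps to write the session model down as data so it is concrete and easy to review with others. Here is a minimal sketch in Python; the step names, endpoints, and timings are the hypothetical banking example, not measured values:

```python
# A hypothetical banking session model: each step pairs a user interaction
# with the server calls it triggers. Endpoint names are illustrative.
banking_session = {
    "duration_secs": 120,         # assumed typical session length
    "sessions_per_day": 100_000,  # the number the customer quoted
    "steps": [
        ("login",                    ["POST /login"]),
        ("view accounts tab",        ["GET /accounts"]),
        ("select an account",        ["GET /accounts/{id}"]),
        ("view transaction history", ["GET /accounts/{id}/transactions"]),
    ],
}

# Total server calls in one session (one per step in this simple model).
total_calls = sum(len(calls) for _, calls in banking_session["steps"])
print(f"{total_calls} server calls per {banking_session['duration_secs']}s session")
```

Keeping the model in a reviewable file like this makes it easy to hand to a product owner and ask, “Is this really how people will use it?”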
1.2 Figure out how many server calls each step of the typical user session will take. One user interaction will probably require at least one server call, but it might be more.
1.3 Calculate how many server calls per second a single user session will take. Using the banking system example above, the values are:
Total session time: 2 minutes
Server calls: 1 login, 1 get all accounts, 1 get single account, 1 get transaction history
Load per session in requests/sec (in this case 1 call / 120 secs ≈ 0.008): 0.008 login, 0.008 get all accounts, 0.008 get single account, 0.008 get transaction history
1.4 Calculate load based on number of sessions
Now that you know the load generated by a single session, you can multiply by the number of sessions you are expecting. Continuing with the example, here is what the math looks like. It’s easier if you keep all durations in seconds:
Sessions per day: 100,000
Duration per session: 120 seconds
Seconds per day: 86,400
Sequential (back-to-back) sessions per day: 86,400 / 120 = 720
Required session parallelism: 100,000 / 720 ≈ 139 sessions running in parallel
Load in requests/sec: 139 sessions * 0.008 logins/sec/session = 1.112 logins/sec
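The arithmetic above is easy to sanity-check in a few lines of Python, using the same assumed numbers (100,000 sessions/day, 120-second sessions, one login per session). Note that the code keeps full precision, so it reports about 1.16 logins/sec rather than the 1.112 you get from the rounded 0.008 figure:

```python
sessions_per_day = 100_000
session_duration_secs = 120
seconds_per_day = 86_400

# How many sessions fit back to back in one day, and how many must run in parallel.
sequential_sessions = seconds_per_day / session_duration_secs  # 720.0
parallelism = sessions_per_day / sequential_sessions           # ~138.9

# Each session makes 1 login call spread over 120 seconds.
logins_per_sec_per_session = 1 / session_duration_secs         # ~0.0083
login_load = parallelism * logins_per_sec_per_session          # ~1.16 logins/sec

print(f"{parallelism:.0f} parallel sessions, {login_load:.2f} logins/sec")
```

A useful cross-check: with one login per session, total logins/sec must also equal sessions_per_day / seconds_per_day, and it does.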
However, taking the expected daily load and distributing it equally throughout the day is probably not realistic for most applications. Decide how to distribute load based on your application’s expected usage pattern. For banking apps, usage spikes in the morning, around lunch, and in the late afternoon; presumably people are working during the rest of the day and sleeping at night. A simple way to model this is to fit all of your expected daily usage into 12 or 16 hours instead of 24.
Based on your modeling, you want to end up with at least 2 values: normal load and peak load. Also, I’d suggest you get someone good at spreadsheets to check your math.
Step 2: Set API call service level objectives (SLO)
For each server API call, determine how quickly it must respond to each request. “As fast as possible” is not a measurable value. Does the server need to respond in 1/2 second, or is longer OK for certain requests? At what load? Often this is expressed in percentiles such as, “99% of the login requests must respond in 500ms when the system is under load of 100 logins/sec”. This will be used to determine whether your tests pass or fail.
It is also good to specify what should happen when load exceeds the peak. For example, if the system should be able to handle 100 logins/sec without slowing down, what happens when load is higher than that? Should response times simply get longer while the system chugs through the excess requests, or should it reject new requests once it starts to slow down?
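An SLO like “99% of login requests must respond in 500ms” translates directly into a pass/fail check on measured latencies. Here is a minimal sketch; the 500ms threshold and the sample data are illustrative, and the percentile uses the simple nearest-rank method:

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the value at or below which pct% of samples fall."""
    ordered = sorted(samples)
    k = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[k]

def meets_slo(latencies_ms, pct=99, threshold_ms=500):
    """True if the pct-th percentile latency is within the SLO threshold."""
    return percentile(latencies_ms, pct) <= threshold_ms

# 100 samples: 99 fast requests and 1 slow outlier still passes a p99 SLO,
# because p99 tolerates the worst 1% of requests.
latencies = [120] * 99 + [2_000]
print(meets_slo(latencies))  # True
```

This is exactly the check your automated tests will run against load-test results later on.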
Step 3: Design the tests
Here are some types of tests to consider:
- peak load: peak load for 30 minutes; goal is to see if the system can handle peak load and still meet SLOs
- burst test: 2x peak load for 15 minutes, then normal load for 15 minutes; goal is to see if the system can handle more than peak load and recover after a little while
- short endurance test: normal load for 24 hours
- long endurance test: normal load for several days
- breakpoint test: start at normal load for 30 minutes, then 2x normal for 30 minutes, then 3x, etc., until the system can’t recover; goal is to find out how much load your system can handle
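Most load generators let you define a stepped load schedule, and it can help to express the shape as data before wiring it into the tool. A sketch of the breakpoint test’s ramp (step length and multipliers as described above; the 1.1 req/sec figure is the earlier example’s normal load):

```python
def breakpoint_schedule(normal_load_rps, step_mins=30, max_multiplier=5):
    """Build (start_minute, target requests/sec) steps: 1x, 2x, 3x ... normal load."""
    return [(m * step_mins, normal_load_rps * (m + 1)) for m in range(max_multiplier)]

# With ~1.1 logins/sec as normal load from the modeling step:
for start_min, rps in breakpoint_schedule(1.1):
    print(f"minute {start_min}: {rps:.1f} req/sec")
```

In practice you would translate each step into your load generator’s thread-group or arrival-rate settings, but the schedule itself is worth reviewing (and version-controlling) as plain data.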
One important consideration for any system with a relational database is that databases perform very differently when they are full vs. empty. Be sure to load your test database with 6-12 months’ worth of expected data prior to testing, so you find out what the performance of your system will be like in 6-12 months. Avoid testing with a relatively empty database, which will give you overly optimistic results.
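Seeding the test database is usually just a bulk-insert script. A minimal sketch using SQLite for illustration only; a real seed script would target your actual database and schema, and the table shape and volumes here are hypothetical:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for your real test database
conn.execute(
    "CREATE TABLE transactions (account_id INTEGER, amount_cents INTEGER, posted_day INTEGER)"
)

# Roughly 6 months of history: 1,000 accounts x 180 days x 2 transactions/day.
rows = [
    (acct, random.randint(-50_000, 50_000), day)
    for acct in range(1_000)
    for day in range(180)
    for _ in range(2)
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(f"seeded {count} rows")
```

Scale the per-account volumes up or down to match your own usage model, and remember to build the same indexes your production schema will have, since those change query behavior as tables grow.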
Step 4: Select a load generator, implement and run the tests
There are many load generators to choose from. The one I like is Apache JMeter. Yes, the user interface hasn’t changed since the late 90s, but it is open source, mature, has a lot of functionality via plugins, and works with several cloud testing vendors such as RedLine13 and BlazeMeter.
Step 5: Analyze the results, tune your system, rinse & repeat
The load generator tool can tell you how long the server took to respond to each request, but you don’t want to rely on that because you won’t have it in production. You want to make sure your system is instrumented to send performance-related metrics to a time series database. Here you have 2 options: use a commercial application performance monitoring (APM) tool like New Relic or AppDynamics, or set up your own monitoring using open source tools. I’ve done both. The APMs (I’ve specifically used New Relic) give you a ton of useful diagnostic information but can be expensive. However, time = money, and since they can save you a lot of time by not building it yourself, they can actually be cheaper.
Implementing your own basic monitoring is not hard, and obviously allows you to customize your solution. You want to instrument (e.g. time) each call to your API endpoints, and store the call’s duration in a time series database such as Graphite or InfluxDB. Then use a graphing dashboard like Grafana to see the results, and set up alerts when performance falls outside your SLOs.
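The instrumentation itself can be as simple as a timing wrapper around each endpoint handler. Here is a sketch of that idea; in real use the recorded durations would be shipped to Graphite or InfluxDB, but the send is stubbed out with a list so the example is self-contained, and the endpoint is hypothetical:

```python
import time
from functools import wraps

recorded = []  # stand-in for a client that writes to Graphite/InfluxDB

def timed(metric_name):
    """Decorator: record how long each call to the wrapped endpoint takes."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                duration_ms = (time.perf_counter() - start) * 1000
                recorded.append((metric_name, duration_ms))
        return wrapper
    return decorator

@timed("api.login")
def login(user):
    time.sleep(0.01)  # pretend to do some work
    return f"hello {user}"

login("alice")
print(recorded[0][0], f"{recorded[0][1]:.1f}ms")
```

The try/finally matters: you want the duration recorded even when the handler raises, because error latencies are often the most interesting ones.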
One tip while analyzing performance data: look at percentiles, not averages. Averages hide what is really going on, while percentiles show performance problems much more clearly. See “Why Averages Suck and Percentiles are Great” for more on that topic.
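A quick demonstration of why averages mislead: mix mostly fast responses with a few very slow ones, and the average still looks acceptable while the p99 exposes the problem. The numbers are made up for illustration:

```python
def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    return ordered[max(0, int(round(pct / 100 * len(ordered))) - 1)]

# 97 requests at 100ms, 3 requests at 5 seconds.
latencies_ms = [100] * 97 + [5_000] * 3

avg = sum(latencies_ms) / len(latencies_ms)
p99 = nearest_rank_percentile(latencies_ms, 99)

print(f"average: {avg:.0f}ms")  # 247ms -- looks fine
print(f"p99: {p99}ms")          # 5000ms -- reveals that 3% of users wait 5 seconds
```

A 247ms average would sail past most dashboards, yet 3 in 100 users are waiting 5 seconds. That is the gap percentiles exist to close.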
Step 6: Automate, automate, automate
Load testing is a giant time suck. However long you think it will take, it will take 100x longer. Automate as much as possible. I suggest hooking performance testing into your continuous integration pipeline. You probably don’t want to run a 30-minute peak load test after every code commit because that will slow down developers. However, running a load test every day with an automated job, and having it compare today’s results with the moving average of the last month, will give you immediate feedback on whether the performance of your system is getting better or worse. Configure the job to fail if performance is significantly worse than the moving average, and you have achieved fully automated performance testing, a lofty goal that is often desired but rarely achieved. Congrats!
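The daily comparison job can be a short script: compute today’s p99 from the load-test results and fail the build if it is significantly worse than the trailing average. A sketch, where the 20% tolerance and the history values are arbitrary assumptions you would tune for your own system:

```python
def regression_check(todays_p99_ms, history_p99_ms, tolerance=0.20):
    """Pass (True) if today's p99 is within tolerance of the trailing average."""
    moving_avg = sum(history_p99_ms) / len(history_p99_ms)
    return todays_p99_ms <= moving_avg * (1 + tolerance)

history = [410, 395, 430, 405, 420]  # p99 latencies (ms) from recent daily runs

print(regression_check(415, history))  # True: in line with the trend
print(regression_check(600, history))  # False: >20% worse, fail the build
```

Wire the boolean into your CI job’s exit code and the pipeline goes red the day a change makes the system meaningfully slower, instead of weeks later in production.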