Stress Testing FastAPI

Stress testing FastAPI with Locust

written by Oscar Arbeláez

Introduction

One of these random days I asked myself: how fast can a FastAPI application go? How many requests per second can be achieved with a simple setup? I know requests per second isn't a measure of much, particularly when I test on my machine as opposed to a production environment. Still, the question stood, and I set out to unscientifically figure it out and learn something in the process.

I went looking for a tool to check the RPS of an application, and my first findings pointed me to Apache JMeter, but after some finagling with it I asked a friend, and they pointed me to Locust. Locust is an open-source tool written in Python that gets you results far more easily than JMeter ever would.

Tools out of the way, I'd like to take you through some of the code we're going to be testing, then the results and the learnings we can derive from them.

Code and partial results

Ping test

For the code, let's start with something super simple to establish a baseline. Since the API and the load generator will be running on the same computer, I'd like to know where the RPS saturates.

Here's the project structure for our little app, which you can find in the repo.

fast-api-load on  main 🐍
❯ tree
.
├── Dockerfile
├── README.md
├── compose.yaml
├── migrations
│   ├── 000001_initial.down.sql
│   └── 000001_initial.up.sql
├── pyproject.toml
├── src
│   └── example
│       ├── __init__.py
│       ├── api.py
│       ├── config.py
│       ├── exceptions.py
│       ├── models.py
│       ├── repositories.py
│       └── utils.py
└── uv.lock

4 directories, 14 files

From there, let's define a simple ping endpoint to establish a baseline:

# api.py

from fastapi import FastAPI
from example.models import PongModel

app = FastAPI()

@app.get("/ping/")
async def ping() -> PongModel:
    """Ping endpoint to check if the server is up and running."""
    return PongModel(ping="pong")

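PongModel lives in models.py, which isn't shown above; as a sketch (not necessarily the repo's exact code), it's just a one-field Pydantic model:

```python
# models.py (sketch -- the repo's actual file may differ)
from pydantic import BaseModel


class PongModel(BaseModel):
    """Response body for the ping endpoint."""

    ping: str
```

FastAPI uses the return annotation to serialize the response, so hitting /ping/ yields the JSON body {"ping": "pong"}.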
And then let’s run it using docker compose:

# compose.yaml

services:
  api:
    build: .
    command: ["uv", "run", "uvicorn", "--host", "0.0.0.0", "example.api:app"]
    ports:
      - "8000:8000"
    develop:
      # ...

Notice we're running the app with uvicorn instead of FastAPI's own development server. I tried the dev server, and the results were just not fair to show.

Finally, to load test the application using locust, we’re going to need a locustfile.py,

# locustfile.py

from locust import HttpUser, task

class HelloWorldUser(HttpUser):
    """User class for the test."""

    @task
    def hello_world(self) -> None:
        """Task to test the hello world page."""
        self.client.get("/ping/")

And then we can run the test using the uvx locust command. I set the maximum concurrency to 200 users and the load to ramp up at 20 new users per second. The first set of results is a bit interesting:

locust results
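For reference, the run above can also be reproduced without the web UI. These are Locust's standard CLI flags; the user count and spawn rate match the run just described, and the host is an assumption (the compose port from earlier):

```shell
# headless run: ramp to 200 users at 20 users/second
uvx locust --headless -u 200 -r 20 -H http://localhost:8000
```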

Here you can see that the RPS temporarily saturates at 2000 but then falls and oscillates between 800 and 1200. I noticed my terminal was struggling to keep up with the logs, so I tried disabling uvicorn's access logs by changing the command in the compose.yaml file:

# compose.yaml

services:
  api:
    build: .
    command:
      [
        "uv",
        "run",
        "uvicorn",
        "--host",
        "0.0.0.0",
        "--log-level",
        "error",
        "example.api:app",
      ]
    ports:
      - "8000:8000"
    # ...

Furthermore, I gave Locust more worker processes to avoid saturating its capacity to send requests. From now on its command looks like uvx locust --processes 8. In my case I'm running a 12-thread CPU: uvicorn uses one, I give 8 to Locust, and the remaining 3 power my desktop environment and other activities.

Interestingly enough, without the access logging we get rather different results:

locust results no logging

Here we see that the RPS saturates at 3100 and no longer drops. However, if we check the computer's network performance during the test:

system load using docker

We notice that we are sending/receiving at 900 KiB/s, which I suspect is the limit of my unimpressive network card.

For completeness, let's take Docker out of the picture and see whether we get better performance without involving the network card. For that I can run the following command directly:

uv run uvicorn --log-level error example.api:app

With that we get significantly more performance out of the app:

locust results no docker

It saturates at about 4700 RPS. And as we can see, the system load doesn't show any network activity:

system load without docker

At this point I was curious to see how a compiled language (Go) would behave under the same test, so I wrote a simple web server using the Gin framework. The code looks like:

// main.go

package main

import "github.com/gin-gonic/gin"

func main() {
	router := gin.New()
	router.GET("/ping/", func(c *gin.Context) {
		c.JSON(200, gin.H{
			"ping": "pong!",
		})
	})
	router.Run(":8000")
}

And I ran it with GIN_MODE=release go run ., here are the results:

locust results go

The results stabilize at 13000 RPS; however, it seems Locust is just not capable of going faster than that with 8 worker processes, producing the error:

CPU usage above 90%! This may constrain your throughput and may even give inconsistent response time measurements! See https://docs.locust.io/en/stable/running-distributed.html for how to distribute the load over multiple CPU cores or machines

Notice that for the ping test we can get away with dropping Docker, but for the database access test we're going to need Docker to host the database.

DB access test

Now that we have a baseline, let's try a more complicated test: I added endpoints to create and list “tasks”.

# api.py

app = FastAPI()


async def get_config() -> Config:
    """Get the configuration."""
    return Config()  # type: ignore


async def get_db_connection(
    config: Annotated[Config, Depends(get_config)],
) -> AsyncGenerator[AsyncConnection, None]:
    """Get a database connection."""
    async with await AsyncConnection.connect(str(config.db_dsn)) as conn:
        yield conn


async def get_task_repository(
    conn: Annotated[AsyncConnection, Depends(get_db_connection)],
) -> TaskRepository:
    """Get a task repository."""
    return TaskRepository(conn)


@app.get("/tasks/")
async def list_tasks(
    repo: Annotated[TaskRepository, Depends(get_task_repository)],
    after: str | None = None,
    limit: Annotated[int, Query(ge=1, le=100)] = 25,
) -> PaginatedTasks:
    """List tasks."""
    _after = 0
    if after is not None:
        _after = decode_next_id(after)
    return await repo.list(_after, limit)


@app.post("/tasks/", status_code=201)
async def create_task(
    repo: Annotated[TaskRepository, Depends(get_task_repository)],
    payload: CreateTaskPayload,
) -> Task:
    """Create a task."""
    return await repo.create(payload)

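The list endpoint leans on decode_next_id from utils.py to turn the opaque after cursor back into an id. That file isn't shown above; a minimal sketch of such a codec (names taken from the code, the base64 implementation is an assumption) could be:

```python
# utils.py (sketch -- the repo's real implementation isn't shown)
import base64


def encode_next_id(next_id: int) -> str:
    """Encode an integer id as an opaque, URL-safe cursor string."""
    return base64.urlsafe_b64encode(str(next_id).encode()).decode()


def decode_next_id(after: str) -> int:
    """Decode a cursor string back into the integer id it wraps."""
    return int(base64.urlsafe_b64decode(after.encode()).decode())
```

The two functions round-trip, so decode_next_id(encode_next_id(42)) gives back 42, and clients never see raw database ids in the pagination links.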
As you can see, this example “works”, but it's a little silly: on every request we create a Config object, establish a database AsyncConnection, and create a TaskRepository object. The silliest part is the connection, which is expensive to establish, followed by the configuration, which never changes and could be built just once.

Using Postman I created a few tasks, enough to fill at least one page of the task list endpoint, and then changed locustfile.py to:

# locustfile.py

from locust import HttpUser, task

class HelloWorldUser(HttpUser):
    """User class for the test."""

    @task
    def hello_world(self) -> None:
        """Task to test the hello world page."""
        self.client.get("/tasks/")

And then we run the test with uvx locust --processes 8 so that we don't hit the 90% CPU usage issue. This again yields interesting results:

locust results

Here we can see that we start to saturate a bit above 200 RPS (who knows whether that's respectable). But after a few hundred connections the server crashes with:

  File "/app/src/example/api.py", line 24, in get_db_connection
    async with await AsyncConnection.connect(str(config.db_dsn)) as conn:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/psycopg/connection_async.py", line 135, in connect
    raise last_ex.with_traceback(None)
psycopg.OperationalError: connection failed: connection to server at "172.22.0.2", port 5432 failed: FATAL:  sorry, too many clients already

psycopg is running into an issue where too many clients are already connected to the database. So a better strategy would be to keep a pool of available connections and share them across the application, instead of creating one per request. Lucky for us, psycopg_pool does just that.

Let’s adjust the code to use a pool instead:

# api.py

config = Config()  # type: ignore

pool = AsyncConnectionPool(str(config.db_dsn), open=False)


@asynccontextmanager
async def lifespan(_: FastAPI) -> AsyncGenerator[None, None]:
    """Open and close the connection pool."""
    await pool.open()
    yield
    await pool.close()


app = FastAPI(lifespan=lifespan)


async def get_db_connection() -> AsyncGenerator[AsyncConnection, None]:
    """Get a database connection."""
    async with pool.connection() as conn:
        yield conn

# ...

First we create the Config object once (sad that it becomes a global, but that should be fair play inside api.py). Then we create a connection pool and bind its opening and closing to the lifetime of the app.

Then the get_db_connection dependency can just grab a connection from the pool and run with it.
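If the global Config still bothers you, the pattern FastAPI's settings docs suggest is to keep get_config as a dependency and memoize it with functools.lru_cache. A sketch, with a stand-in dataclass since the real pydantic-settings class isn't shown here:

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Config:
    """Stand-in for the real settings class (not the repo's Config)."""

    db_dsn: str = "postgresql://localhost:5432/example"


@lru_cache
def get_config() -> Config:
    """Build the settings once; every later call returns the cached instance."""
    return Config()
```

Since get_config() is get_config() holds, the object is constructed exactly once per process, and tests can still override the dependency instead of monkeypatching a module-level global.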

We run the same locust file with the same command, and then get these results:

locust results

Not only do we saturate at a much more reasonable 900 RPS, but we also do away with the errors.

For peace of mind: neither Locust nor the network card was a limiting factor in this result.

system load for tasks

Learnings

There are a few key learnings from this process:

  • RPS numbers don't mean much on their own, but there's something to be learnt from watching your application crash, or dramatically change the RPS it can serve.
  • Logging might impact the performance of your server, especially when it's doing “nothing”: no database connections or external requests to slow the endpoint down.
  • Pooling your resources is important in concurrent applications. I knew that already, but there's something guttural about watching things just crash.