Performance and memory profiling for Celery tasks with Sciagraph
Celery tasks running too slowly, or using too much memory? You can get results faster—but only if you can find the bottlenecks and fix them.
Sciagraph can help: it can give you performance and memory profiling reports to help identify the bottlenecks, and it comes with Celery integration built in.
In order to use Sciagraph with Celery, you need to:
- Ensure Sciagraph is installed and activated in the project environment.
- Add profiling to your Celery tasks.
- Enable profiling on your Celery workers.
- Read the resulting reports, and use them to find the bottleneck.
Note: If you sign up for the Team plan, you can use Sciagraph in production environments. Other plans only allow profiling in development environments like your laptop.
1. Installing and setting up Sciagraph
Supported operating systems:
- Linux on x86_64, Python 3.7-3.11 (ARM can be made available if there’s interest; email me).
- macOS with Python 3.9-3.11, Catalina (10.15) or later, on both x86_64 and ARM.
- Windows is not currently supported directly, so you need to use Linux emulation of some sort:
- WSLv2 is likely to work, but still untested.
- Or, you can use Docker to start a Linux-based shell prompt:
docker run -it -v "$PWD":"$PWD" python:3.11 bash
To install:
- Install Sciagraph in the environment where Celery is running by doing
pip install sciagraph
(or adding it to your requirements.txt/pyproject.toml/etc.).
- Sign up for a Sciagraph account.
- Store the access token, using the command you’ll find on the account page:
python -m sciagraph.store_token ...
See the documentation on the basics of using Sciagraph for a more detailed guide.
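To double-check that the install landed in the right environment, you can ask pip about the package (just a generic sanity check, not a Sciagraph-specific command):
$ pip show sciagraph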
2. Adding profiling to your Celery tasks
If you have a tasks.py that looks like this:
from celery import Celery
app = Celery("tasks", broker="pyamqp://guest@localhost//")
@app.task
def generate_report(x, y):
# ... do some work ...
return x + y
You can add Sciagraph performance report generation to that task by using the sciagraph.integrations.celery.profile decorator:
from celery import Celery
from sciagraph.integrations.celery import profile
app = Celery("tasks", broker="pyamqp://guest@localhost//")
@app.task
@profile # <-- add decorator
def generate_report(x, y):
# ... do some work ...
return x + y
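The report is generated by the worker that executes the task, so you still need to queue the task as usual. For example, once a worker is running (see the next step), you might trigger it from another Python session like this (a minimal sketch, assuming the broker configured above is reachable):
# call_task.py -- illustrative only; assumes the RabbitMQ broker from tasks.py is running
from tasks import generate_report

# Queue the task; the worker that executes it will write out a Sciagraph report.
generate_report.delay(2, 3)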
3. Enabling profiling
Once you’ve added the profile decorator to your tasks, you also need to enable Sciagraph on your workers. Sciagraph supports prefork / process pools, and solo mode.
Prefork / process pools
When using a process pool (“prefork”), you enable Sciagraph by setting the usual SCIAGRAPH_ACCESS_KEY and SCIAGRAPH_ACCESS_SECRET environment variables, as well as two additional environment variables:
$ export SCIAGRAPH_ACCESS_KEY="...get real value from your account..."
$ export SCIAGRAPH_ACCESS_SECRET="...get real value from your account..."
$ export SCIAGRAPH_MODE=celery
$ export SCIAGRAPH_CELERY_REPORTS_PATH=/home/app/sciagraph-reports
$ celery -A tasks worker --pool prefork
The path passed to SCIAGRAPH_CELERY_REPORTS_PATH is where reports will be stored, in subdirectories based on the task name and each task run’s unique ID. In the example above, the generate_report task in tasks.py will end up with profiling reports in /home/app/sciagraph-reports/tasks.generate_report/<task ID>.
Solo
You can also use Sciagraph with a worker that runs just one task at a time, “solo” mode.
This is similar to the configuration above, except you use a different SCIAGRAPH_MODE, namely api:
$ export SCIAGRAPH_ACCESS_KEY="...get real value from your account..."
$ export SCIAGRAPH_ACCESS_SECRET="...get real value from your account..."
$ export SCIAGRAPH_MODE=api
$ export SCIAGRAPH_CELERY_REPORTS_PATH=/home/app/sciagraph-reports
$ celery -A tasks worker --pool solo
4. Reading the reports
There are two ways to read the reports:
- Download the generated reports from Sciagraph’s cloud storage service.
- Read locally stored copies of the reports.
Downloading reports
By default, Sciagraph will upload end-to-end encrypted copies of the reports to its cloud storage server. Instructions on how to download these reports will be output in the worker’s logs. Anyone with access to the logs will be able to download and view the reports from any computer with Python installed.
For example, here’s what the logs might look like:
$ export SCIAGRAPH_MODE=api
$ celery -A tasks worker --pool solo
...
[2022-07-19 13:45:04,305: WARNING/MainProcess] Successfully uploaded the Sciagraph profiling report.
Job start time: 2022-07-19T17:45:03+00:00
Job ID: celery_tasks.add/e09f0ca3-a930-4462-9879-bf38e19ccea4
The report was stored locally at path /tmp/reports/celery_tasks.add/e09f0ca3-a930-4462-9879-bf38e19ccea4
An encrypted copy of the report was uploaded to the Sciagraph storage server.
To download the report, run the following on Linux/Windows/macOS, Python 3.7+.
If you're inside a virtualenv:
pip install --upgrade sciagraph-report
Otherwise:
pip install --user --upgrade sciagraph-report
Then:
python -m sciagraph_report download 907e57c4-23d4-4237-88db-4a5da04a9d65 1/Te9N2ZNqlBREWWtngiu7DN25hyNN/RIvh7QkgmtOEbpWyTVwdn
Follow those instructions, and you can view the report.
Reading locally-stored reports
Sciagraph will also store the reports locally, on the machine running the worker.
Specifically, it will store them in the directory specified by SCIAGRAPH_CELERY_REPORTS_PATH.
For example, if SCIAGRAPH_CELERY_REPORTS_PATH=/tmp/reports, after running the add() task we’ll see:
$ ls /tmp/reports/
celery_tasks.add
$ ls /tmp/reports/celery_tasks.add/
e09f0ca3-a930-4462-9879-bf38e19ccea4
$ ls /tmp/reports/celery_tasks.add/e09f0ca3-a930-4462-9879-bf38e19ccea4/
index.html peak-memory.prof peak-memory-reversed.svg peak-memory.svg performance performance.prof performance-reversed.svg performance.svg
Open index.html in your browser to see the report.
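If you’re logged in to the machine running the worker, a small helper like the following can find the most recent report and open it. This is an illustrative sketch, not part of Sciagraph; it only relies on the directory layout shown above:
# open_latest_report.py -- illustrative helper, not part of Sciagraph
import webbrowser
from pathlib import Path

REPORTS_PATH = Path("/tmp/reports")  # same value as SCIAGRAPH_CELERY_REPORTS_PATH

# Each report lives in <reports path>/<task name>/<task ID>/:
report_dirs = [d for d in REPORTS_PATH.glob("*/*") if d.is_dir()]
latest = max(report_dirs, key=lambda d: d.stat().st_mtime)

# Open the report's index.html in the default browser:
webbrowser.open((latest / "index.html").as_uri())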
5. Bonus: Making sure old reports are cleaned up
When Sciagraph is enabled, every run of a profiled task will write out a report. By default, only the last 1000 reports are kept.
To keep more, set the SCIAGRAPH_CELERY_MAX_REPORTS environment variable before starting the worker, for example:
$ export SCIAGRAPH_CELERY_MAX_REPORTS=5000
$ export SCIAGRAPH_MODE=celery
$ celery -A tasks worker --pool=prefork
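If you launch workers from a script or a process manager, it can be convenient to collect the configuration in one place. Here’s a sketch of such a wrapper, assuming the access key and secret are provided elsewhere (e.g. by your secrets manager) and reusing the paths from the examples above:
#!/bin/sh
# start_worker.sh -- illustrative wrapper; adjust paths and pool to your setup
export SCIAGRAPH_MODE=celery
export SCIAGRAPH_CELERY_REPORTS_PATH=/home/app/sciagraph-reports
export SCIAGRAPH_CELERY_MAX_REPORTS=5000
exec celery -A tasks worker --pool=prefork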