Performance and memory profiling for MLflow runs with Sciagraph
Is MLflow running too slowly, or using too much memory? You can get results faster, but only if you can find the bottlenecks and fix them.
Sciagraph can help: it can give you performance and memory profiling reports to help you identify the bottlenecks, and it comes with MLflow integration built in.
In order to use Sciagraph with MLflow, you need to:
- Ensure Sciagraph is installed in the project environment.
- Make sure you have a Sciagraph account.
- Enable profiling.
- Use MLflow’s Tracking API to keep track of profiling reports.
Note: If you sign up for the Team plan, you can use Sciagraph in production environments or automated jobs in general. Other plans only allow profiling in development environments like your laptop, or manually run projects.
Profiling your MLflow project
We’ll start with the case where you’re using a single run that spans the whole process lifetime.
Conda projects
Note that Sciagraph only runs on Linux; if you need to run on another operating system, you can use a Docker project (see below).
1. Ensuring Sciagraph is installed
Add the pip-based sciagraph package to your environment.yml or conda.yaml, however it’s named:
name: example
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pip:
    - sciagraph
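If you want to confirm that the package actually ended up in the environment, a quick check along these lines can help. This is a minimal sketch using only the Python standard library; the is_installed() helper is a hypothetical name for illustration, not part of Sciagraph or MLflow:

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return find_spec(package) is not None


if __name__ == "__main__":
    # Run this inside the environment MLflow uses for the project.
    status = "installed" if is_installed("sciagraph") else "missing"
    print(f"sciagraph is {status}")
```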
2. Enabling profiling
In your MLproject, you can set up two variants of your entrypoint, one of which enables profiling:
name: example
conda_env: environment.yaml
entry_points:
  main:
    command: "python -m sciagraph run main.py"
  # Useful when you're running in an environment where Sciagraph won't run,
  # e.g. a developer laptop running Windows:
  main-no-profiling:
    command: "python main.py"
If you haven’t already, sign up for a free account and copy and run the command to store your access token:
python -m store_token ...
3. Utilize MLflow’s Tracking API
Since MLflow lets you store artifacts like generated reports, you can utilize that functionality to store the profiling output.
In your main script, you can run sciagraph.integrations.mlflow.install_handler() to include the Sciagraph report in the results tracked by MLflow.
from sciagraph.integrations.mlflow import install_handler

if __name__ == "__main__":
    install_handler()
    # ... run your code ...
Bringing it all together
Your project will now have profiling with Sciagraph on by default:
$ mlflow run -e main yourproject
See below for details on viewing the resulting report.
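Since Sciagraph only runs on Linux, you may want a small launcher that picks the entrypoint for you. The following is a rough sketch, assuming the two entrypoint names from the MLproject above; choose_entry_point() and mlflow_run_command() are hypothetical helpers, not part of MLflow or Sciagraph:

```python
import sys
from typing import List


def choose_entry_point(platform: str = sys.platform) -> str:
    """Pick the profiled entrypoint on Linux, the plain one everywhere else."""
    return "main" if platform.startswith("linux") else "main-no-profiling"


def mlflow_run_command(project_dir: str, platform: str = sys.platform) -> List[str]:
    """Build the `mlflow run` command line for the chosen entrypoint."""
    return ["mlflow", "run", "-e", choose_entry_point(platform), project_dir]


if __name__ == "__main__":
    # Print the command rather than running it, so you can inspect it first.
    print(" ".join(mlflow_run_command("yourproject")))
```

You could pass the resulting list to subprocess.run() if you prefer to launch the project directly from the script.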
Docker projects
1. Ensuring Sciagraph is installed
Make sure the sciagraph package is installed inside your Docker image, for example:
FROM python:3.9-slim-bullseye
RUN pip install pandas matplotlib mlflow sciagraph
COPY . .
# ...
2. Enabling profiling
You will need to look up the SCIAGRAPH_ACCESS_KEY and SCIAGRAPH_ACCESS_SECRET environment variables on the account page.
Then, in the MLproject file that sets up the project, make sure those environment variables get passed to the Docker image, and make sure the default entrypoint uses Sciagraph.
Again, providing an alternative without Sciagraph profiling may be handy at times.
name: example
docker_env:
  image: yourimage-with-sciagraph-installed
  # Replace 'your-key-here' and 'your-secret-here' with the values
  # for your account you can find on the https://account.sciagraph.com/ui/ page:
  environment:
    - ["SCIAGRAPH_ACCESS_KEY", "your-key-here"]
    - ["SCIAGRAPH_ACCESS_SECRET", "your-secret-here"]
entry_points:
  main:
    command: "python -m sciagraph run main.py"
  # Useful when you're running in an environment where Sciagraph won't run,
  # e.g. a developer laptop running Windows:
  main-no-profiling:
    command: "python main.py"
3. Utilize MLflow’s Tracking API
See above for details; this is the same as the way you would do it for a Conda project.
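A common mistake with Docker projects is forgetting to replace the placeholder credentials, or not passing them through at all. As a rough sketch, you could run a check like the one below at the start of your main script; missing_credentials() is a hypothetical helper, and the placeholder strings are simply the ones used in the MLproject example above:

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("SCIAGRAPH_ACCESS_KEY", "SCIAGRAPH_ACCESS_SECRET")
# Placeholder values from the example MLproject file; they are not real credentials.
PLACEHOLDERS = {"your-key-here", "your-secret-here"}


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset, empty, or placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value or value in PLACEHOLDERS:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_credentials()
    if problems:
        print("Not set inside the container: " + ", ".join(problems))
```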
System projects
Profiling system projects with mlflow run is currently problematic, since the parent process will be the one that gets profiled.
Instead, avoid mlflow run and just run the program directly using Sciagraph’s normal operation mode, e.g. SCIAGRAPH_MODE=process python yourprogram.py.
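If you would rather launch the program from a script than type the environment variable by hand, something like the following should be equivalent to the shell command above. It is a minimal sketch; profiling_env() and run_profiled() are hypothetical helpers, and it assumes that setting SCIAGRAPH_MODE is all your setup needs, as in the example:

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def profiling_env(base_env: Optional[Mapping[str, str]] = None,
                  mode: str = "process") -> Dict[str, str]:
    """Copy the environment and set SCIAGRAPH_MODE to the requested mode."""
    env = dict(os.environ if base_env is None else base_env)
    env["SCIAGRAPH_MODE"] = mode
    return env


def run_profiled(script: str) -> int:
    """Run the script directly (no `mlflow run`) and return its exit code."""
    result = subprocess.run([sys.executable, script], env=profiling_env())
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python launch.py yourprogram.py
    sys.exit(run_profiled(sys.argv[1]))
```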
Profiling multiple runs separately
In some cases you’ll be using the MLflow APIs to launch multiple runs within the same process. You can choose to just profile them in one go, in which case you can just follow the instructions above.
However, you may also wish to get a separate profiling report for every separate run. In this situation, you can use Sciagraph’s API mode.
First, make sure you’re using API mode instead of process mode. Depending on how you enable Sciagraph:
- Add the --mode=api option to your command line, e.g. python -m sciagraph --mode=api run yourscript.py.
- Switch from the SCIAGRAPH_MODE=process environment variable to SCIAGRAPH_MODE=api.
You will now need to explicitly tell Sciagraph to profile your code. For example, if you have the following code now:
from mlflow import start_run
from yourcode import yourjob

def main():
    with start_run():
        yourjob()

if __name__ == '__main__':
    main()
To use Sciagraph, you need to:
- Install Sciagraph support.
- Wrap each run’s code with sciagraph.integrations.mlflow.profile_job().
In this example we add the MLflow run ID to the Sciagraph job ID that will be included in the Sciagraph profiling report:
from mlflow import start_run
from yourcode import yourjob
# new imports:
from sciagraph.integrations.mlflow import install_handler, profile_job

def main():
    with start_run() as run:
        # profile with sciagraph:
        with profile_job("Yourjob: " + str(run.info.run_id)):
            yourjob()

if __name__ == '__main__':
    install_handler()  # <-- add Sciagraph support
    main()
You can use the profile_job() context manager as many times as you want; pass in the job ID you want to add to the report.
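To illustrate wrapping several runs, here is one way the pattern might look when you have more than one job. This is a sketch, not Sciagraph’s own code: profile_each_run() is a hypothetical helper, and it takes start_run and profile_job as arguments so that it does not depend on either library being importable. In a real project you would pass in mlflow.start_run and sciagraph.integrations.mlflow.profile_job.

```python
from typing import Callable, ContextManager, Dict, List


def profile_each_run(
    jobs: Dict[str, Callable[[], object]],
    start_run: Callable[[], ContextManager],
    profile_job: Callable[[str], ContextManager],
) -> List[str]:
    """Run each job in its own MLflow run, wrapped in its own profiling job.

    Returns the job IDs passed to profile_job, in the order they were run.
    """
    job_ids = []
    for name, job in jobs.items():
        with start_run() as run:
            # Same job ID pattern as the example above: a label plus the run ID.
            job_id = f"{name}: {run.info.run_id}"
            with profile_job(job_id):
                job()
            job_ids.append(job_id)
    return job_ids
```

As in the example above, you would still call install_handler() once before launching the runs, and start your script in API mode.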
Viewing profiling results
Your project’s profiling output will be stored as an artifact called Sciagraph profiling report with the results of your run.
- Open the report UI, for example by running mlflow ui, or visiting your hosted Tracking Server.
- Download the sciagraph-report.zip file.
- Unzip all the contents.
- Open index.html in your browser.
If you open just the index.html and it doesn’t show the graphs, it’s probably because you didn’t extract all the files; you also need to extract the .svg files or the report won’t display.
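If you prefer to script the unzip step, the following sketch extracts the whole archive and reports where index.html and the .svg files ended up. It uses only the standard library; extract_report() is a hypothetical helper, and because the exact layout inside the zip is an assumption here, it searches the extracted tree rather than expecting a fixed path:

```python
import sys
import zipfile
from pathlib import Path
from typing import List, Tuple


def extract_report(zip_path, dest_dir) -> Tuple[Path, List[Path]]:
    """Extract every file from the report zip; return index.html and the .svg files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)  # everything, not just index.html
    matches = sorted(dest.rglob("index.html"))
    if not matches:
        raise FileNotFoundError(f"no index.html found in {zip_path}")
    return matches[0], sorted(dest.rglob("*.svg"))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python extract_report.py sciagraph-report.zip
    index, svgs = extract_report(sys.argv[1], "sciagraph-report")
    print(f"Extracted {len(svgs)} .svg file(s); open {index.resolve().as_uri()}")
```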