Performance and memory profiling for MLflow runs with Sciagraph

MLflow running too slowly, or using too much memory? You can get results faster, but only if you can find the bottlenecks and fix them.

Sciagraph can help: it can give you performance and memory profiling reports to help you identify the bottlenecks, and it comes with MLflow integration built in.

To use Sciagraph with MLflow, you need to:

Ensure Sciagraph is installed in the project environment.
Make sure you have a Sciagraph account, so you can profile your MLflow runs.
Enable profiling.
Utilize MLflow’s Tracking API to keep track of profiling reports.

Note: If you sign up for the Team plan, you can use Sciagraph in production environments or automated jobs in general. Other plans only allow profiling in development environments like your laptop, or manually run projects.

Profiling your MLflow project

We’ll start with the case where you’re using a single run that spans the whole process lifetime.

Conda projects

Note that Sciagraph only runs on Linux; if you need to run on another operating system, you can use a Docker project (see below).

1. Ensuring Sciagraph is installed

Add the pip-based sciagraph package to your environment.yml or conda.yaml, however it’s named:

name: example
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - pip:
    - sciagraph

2. Enabling profiling

In your MLproject, you can set up two variants of your entrypoint, one of which enables profiling:

name: example
conda_env: environment.yaml
entry_points:
  main:
    command: "python -m sciagraph run main.py"
  # Useful when you're running in an environment where Sciagraph won't run,
  # e.g. a developer laptop running Windows:
  main-no-profiling:
    command: "python main.py"

If you haven’t already, sign up for a free account and copy and run the command to store your access token:

python -m store_token ...

3. Utilize MLflow’s Tracking API

Since MLflow lets you store artifacts like generated reports, you can utilize that functionality to store the profiling output.

In your main script, you can run sciagraph.integrations.mlflow.install_handler() to include the Sciagraph report in the results tracked by MLflow.

from sciagraph.integrations.mlflow import install_handler

if __name__ == "__main__":
    install_handler()
    
    # ... run your code ...

Bringing it all together

Your project will now have profiling with Sciagraph on by default:

$ mlflow run -e main yourproject

See below for details on viewing the resulting report.
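
If the same project is launched from both Linux and non-Linux machines, the choice between the two entry points can be automated. The following is a minimal sketch, not part of Sciagraph or MLflow: the helper names choose_entry_point and mlflow_run_command are hypothetical, and it assumes the entry points are named main and main-no-profiling as in the MLproject above.

```python
import platform


def choose_entry_point(system=None):
    """Pick the profiled entry point on Linux, the plain one elsewhere."""
    # Hypothetical helper: Sciagraph only runs on Linux, so fall back otherwise.
    system = system or platform.system()
    return "main" if system == "Linux" else "main-no-profiling"


def mlflow_run_command(project, system=None):
    """Build the `mlflow run` command line for the given project directory."""
    return ["mlflow", "run", "-e", choose_entry_point(system), project]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(mlflow_run_command("yourproject")))
```

The resulting list can be passed to subprocess.run() if you want the script to launch the project itself.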

Docker projects

1. Ensuring Sciagraph is installed

Make sure the sciagraph package was installed inside your Docker image, for example:

FROM python:3.9-slim-bullseye

RUN pip install pandas matplotlib mlflow sciagraph
COPY . .
# ...

2. Enabling profiling

You will need to look up the SCIAGRAPH_ACCESS_KEY and SCIAGRAPH_ACCESS_SECRET environment variables on the account page.

In the MLproject file, make sure those environment variables get passed to the Docker image, and that the default entrypoint uses Sciagraph. Again, providing an alternative entrypoint without Sciagraph profiling may be handy at times.

name: example
docker_env:
  image: yourimage-with-sciagraph-installed
  # Replace 'your-key-here' and 'your-secret-here' with the values
  # for your account you can find on the https://account.sciagraph.com/ui/ page:
  environment:
    - ["SCIAGRAPH_ACCESS_KEY", "your-key-here"]
    - ["SCIAGRAPH_ACCESS_SECRET", "your-secret-here"]
entry_points:
  main:
    command: "python -m sciagraph run main.py"
  # Useful when you're running in an environment where Sciagraph won't run,
  # e.g. a developer laptop running Windows:
  main-no-profiling:
    command: "python main.py"
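
A missing credential would otherwise only surface once the container is running, so it can help to check for both variables at the top of main.py. This is a minimal sketch using only the standard library; missing_credentials is a hypothetical helper, not part of Sciagraph, and it only checks that the variables are set, not that their values are valid.

```python
import os

# The two variable names come from the MLproject file above.
REQUIRED_VARS = ("SCIAGRAPH_ACCESS_KEY", "SCIAGRAPH_ACCESS_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing Sciagraph credentials:", ", ".join(missing))
```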

3. Utilize MLflow’s Tracking API

See above for details; this is the same as the way you would do it for a Conda project.

System projects

Profiling system projects with mlflow run is currently problematic, since the parent process will be the one that gets profiled. Instead, avoid mlflow run and just run the program directly using Sciagraph’s normal operation mode, e.g. SCIAGRAPH_MODE=process python yourprogram.py.
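
The direct invocation can also be scripted. The following is a minimal sketch using only the standard library; it assumes, as the command above does, that setting SCIAGRAPH_MODE in the environment is what turns profiling on. The helper names sciagraph_env and run_profiled are hypothetical.

```python
import os
import subprocess
import sys


def sciagraph_env(mode="process", base=None):
    """Copy an environment mapping and set SCIAGRAPH_MODE on the copy."""
    # "process" and "api" are the two modes mentioned on this page.
    if mode not in ("process", "api"):
        raise ValueError(f"unsupported mode: {mode!r}")
    env = dict(os.environ if base is None else base)
    env["SCIAGRAPH_MODE"] = mode
    return env


def run_profiled(script, mode="process"):
    """Rough equivalent of `SCIAGRAPH_MODE=<mode> python <script>`."""
    return subprocess.run(
        [sys.executable, script], env=sciagraph_env(mode), check=True
    )
```

Copying the environment, rather than modifying os.environ in place, keeps the setting confined to the child process.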

Profiling multiple runs separately

In some cases you’ll be using the MLflow APIs to launch multiple runs within the same process. You can choose to just profile them in one go, in which case you can just follow the instructions above.

However, you may also wish to get a separate profiling report for every separate run. In this situation, you can use Sciagraph’s API mode.

First, make sure you’re using API mode instead of process mode. Depending on how you enable Sciagraph:

Add the --mode=api option to your command-line, e.g. python -m sciagraph --mode=api run yourscript.py.
Switch from SCIAGRAPH_MODE=process environment variable to SCIAGRAPH_MODE=api.

You will now need to explicitly tell Sciagraph to profile your code. For example, if you have the following code now:

from mlflow import start_run
from yourcode import yourjob

def main():
    with start_run():
        yourjob()

if __name__ == '__main__':
    main()

To use Sciagraph, you need to:

Install Sciagraph support.
Wrap each run’s code with sciagraph.integrations.mlflow.profile_job().

In this example we add the MLflow run ID to the Sciagraph job ID that will be included in the Sciagraph profiling report:

from mlflow import start_run
from yourcode import yourjob
# new imports:
from sciagraph.integrations.mlflow import install_handler, profile_job

def main():
    with start_run() as run:
        # profile with sciagraph:
        with profile_job("Yourjob: " + str(run.info.run_id)):
            yourjob()

if __name__ == '__main__':
    install_handler()  # <-- add Sciagraph support
    main()

You can use profile_job() as many times as you want, as a context manager like in the example above; pass in the job ID you want to add to the report.
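
When one process launches several runs, a small loop keeps the job IDs consistent. The sketch below is an illustration rather than Sciagraph’s documented usage: job_id_for and profile_all are hypothetical helpers, and start_run and profile_job are passed in as parameters (in real code, mlflow.start_run and sciagraph.integrations.mlflow.profile_job) so the loop can be exercised without either library installed.

```python
def job_id_for(name, run_id):
    """Build the job ID that will appear in the Sciagraph report."""
    return f"{name}: {run_id}"


def profile_all(jobs, start_run, profile_job):
    """Run each (name, callable) pair in its own MLflow run and profiling job.

    Returns the job IDs in the order they were used.
    """
    job_ids = []
    for name, job in jobs:
        with start_run() as run:
            job_id = job_id_for(name, run.info.run_id)
            # One profile_job() block per run gives one report per run.
            with profile_job(job_id):
                job()
            job_ids.append(job_id)
    return job_ids
```

In a real script you would still call install_handler() once at startup, then call something like profile_all(jobs, start_run, profile_job) with the imported functions.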

Viewing profiling results

Your project’s profiling output will be stored as an artifact called Sciagraph profiling report with the results of your run.

Open the report UI, for example by running mlflow ui, or visiting your hosted Tracking Server.
Download the sciagraph-report.zip file.
Unzip all the contents.
Open index.html in your browser.

If you open just the index.html and it doesn’t show the graphs, it’s probably because you didn’t extract all the files; you also need to extract the .svg files or the report won’t display.
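
Extracting the archive programmatically avoids the partial-extraction problem. This is a minimal sketch using only the standard library; extract_report is a hypothetical helper, and it assumes index.html sits at the top level of the downloaded archive.

```python
import zipfile
from pathlib import Path


def extract_report(zip_path, dest_dir):
    """Extract every file from the report archive and return the index.html path."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall() unpacks the .svg files along with index.html.
        archive.extractall(dest)
    index = dest / "index.html"
    if not index.is_file():
        raise FileNotFoundError(f"no index.html found in {dest}")
    return index
```

The returned path can then be opened in a browser, for example with webbrowser.open(index.resolve().as_uri()).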