For each measure, two graphs are shown. In both cases, red means things got worse compared to the base profiling report, and blue means things got better.
The first graph uses the flamegraph structure from the changed report, so it can show callstacks that don't exist in the base report. The second graph uses the flamegraph structure from the base report, so it can show callstacks that don't exist in the changed report. The colors mean the same thing in both graphs: blue means an improvement from the base report to the changed report, and red means things got worse. The only difference is which callstacks can be shown while preserving the flamegraph structure.
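As a rough illustration of why two graphs are needed, here is a minimal sketch. It assumes a simplified, hypothetical data model in which each report is a dict mapping a callstack (a tuple of function names) to a measured value. It also ignores how parent frames are aggregated in a real flamegraph. The function `diff_on_structure` is hypothetical and is not part of the tool. It computes "changed minus base" only for the callstacks present in whichever report supplies the structure.

```python
from typing import Dict, Tuple

# Hypothetical data model: a callstack is a tuple of function names,
# and a report maps each callstack to a measured value (e.g. seconds).
Callstack = Tuple[str, ...]
Report = Dict[Callstack, float]


def diff_on_structure(structure: Report, base: Report, changed: Report) -> Dict[Callstack, float]:
    """Return changed - base for every callstack in `structure`.

    A callstack missing from one of the reports counts as 0 there.
    Positive values mean things got worse (red); negative values mean
    things got better (blue).
    """
    return {
        stack: changed.get(stack, 0.0) - base.get(stack, 0.0)
        for stack in structure
    }


base = {("main", "load"): 2.0, ("main", "old_parse"): 3.0}
changed = {("main", "load"): 1.5, ("main", "new_parse"): 1.0}

# First graph: changed report's structure, so new_parse appears, old_parse doesn't.
print(diff_on_structure(changed, base, changed))
# → {('main', 'load'): -0.5, ('main', 'new_parse'): 1.0}

# Second graph: base report's structure, so old_parse appears, new_parse doesn't.
print(diff_on_structure(base, base, changed))
# → {('main', 'load'): -0.5, ('main', 'old_parse'): -3.0}
```

Neither view alone shows both the removed `old_parse` improvement and the new `new_parse` cost, which is why both graphs are shown.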
Note that when comparing performance, you may need to take into account parallelism from multi-threading or multiprocessing, which is something flamegraphs aren't good at visualizing.