Top Related Projects
- Stack trace visualizer
- pprof is a tool for visualization and analysis of profiling data
- Coz: Causal Profiling
- Main gperftools repository
- 🔥 Pyflame: A Ptracing Profiler For Python. This project is deprecated and not maintained.
Quick Overview
gprof2dot is a Python script that converts profiling output from various tools into a dot graph. It visualizes the performance data of programs, making it easier to identify bottlenecks and optimize code. The tool supports multiple profiling formats and can generate graphical representations of execution time and call relationships.
Pros
- Supports multiple profiling formats (gprof, VTune, OProfile, and more)
- Generates easy-to-understand graphical representations of profiling data
- Customizable output with various options for filtering and styling
- Can be used as a standalone script or integrated into other tools
Cons
- Requires GraphViz to be installed for generating the final graph image
- May have a learning curve for users unfamiliar with profiling concepts
- Limited to static analysis of profiling data, not real-time visualization
- Might produce complex graphs for large programs, requiring manual interpretation
Code Examples
- Basic usage with gprof output (note that gprof2dot reads gprof's text report, format "prof", not the binary gmon.out directly):

```python
import subprocess

# generate gprof's text report from gmon.out, then convert and render it
with open("gprof.txt", "w") as f:
    subprocess.run(["gprof", "./your-executable"], stdout=f, check=True)
subprocess.run(["gprof2dot", "-f", "prof", "gprof.txt", "-o", "output.dot"], check=True)
subprocess.run(["dot", "-Tpng", "output.dot", "-o", "profile.png"], check=True)
```
- Using gprof2dot as a module to customize output (the module-level API is internal and undocumented, so the exact names below may vary between versions):

```python
import gprof2dot

profile = gprof2dot.PstatsParser("profile.pstats").parse()
# prune(node_thres, edge_thres, filter_paths, color_nodes_by_selftime);
# thresholds are fractions, so 0.005 corresponds to the 0.5% default
profile.prune(0.005, 0.001, [], False)
with open("custom_output.dot", "w") as fp:
    gprof2dot.DotWriter(fp).graph(profile, gprof2dot.TEMPERATURE_COLORMAP)
```
- Pruning the graph to a specific function (a `node_filter` keyword is not part of any public gprof2dot API; the documented way is the -z/--root or -l/--leaf flags):

```python
import subprocess

# keep only the descendants of important_function in the output graph
subprocess.run(
    ["gprof2dot", "-f", "pstats", "profile.pstats",
     "-z", "important_function", "-o", "filtered_output.dot"],
    check=True,
)
```
Getting Started
1. Install gprof2dot and GraphViz:
pip install gprof2dot
(install GraphViz using your system's package manager)
2. Profile your program using a supported profiler (e.g., gprof, cProfile).
3. Convert the profiling output to a dot file:
gprof2dot -f <format> <profile_data> -o output.dot
4. Generate the final graph image:
dot -Tpng output.dot -o profile.png
5. View the resulting profile.png to analyze your program's performance.
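Step 2 can also be driven from Python with the stdlib cProfile module; a minimal sketch (the workload and file name are illustrative):

```python
import cProfile
import os
import tempfile

def work():
    """A toy workload to profile."""
    return sum(i * i for i in range(100_000))

# Profile work() and save stats in the binary pstats format,
# which gprof2dot reads with -f pstats.
out = os.path.join(tempfile.gettempdir(), "output.pstats")
prof = cProfile.Profile()
prof.enable()
work()
prof.disable()
prof.dump_stats(out)
# Convert it on the command line with:
#   gprof2dot -f pstats output.pstats | dot -Tpng -o profile.png
```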
Competitor Comparisons
Stack trace visualizer
Pros of FlameGraph
- Provides a more intuitive visualization of performance data with flame graphs
- Supports a wider range of profiling data formats and languages
- Offers interactive SVG output for easier analysis and exploration
Cons of FlameGraph
- Requires more setup and data preprocessing compared to gprof2dot
- May have a steeper learning curve for users unfamiliar with flame graphs
- Less suitable for traditional call graph representations
Code Comparison
FlameGraph:
```perl
#!/usr/bin/perl -w
use strict;
open(FILE, "$ARGV[0]") or die "Can't read $ARGV[0]: $!\n";
while (<FILE>) {
    chomp;
    # ... process one collapsed stack sample per line ...
}
```
gprof2dot:
```python
#!/usr/bin/env python
import sys
import math
import os.path
import re
import textwrap
import optparse
```
FlameGraph uses Perl for its main script, while gprof2dot is written in Python. FlameGraph's code focuses on processing input data for flame graph generation, whereas gprof2dot's code handles various command-line options and graph generation tasks.
Both tools are valuable for performance analysis, with FlameGraph excelling in intuitive visualization and gprof2dot offering more traditional call graph representations. The choice between them depends on the specific needs of the project and the preferred visualization style.
pprof is a tool for visualization and analysis of profiling data
Pros of pprof
- More comprehensive profiling tool with support for multiple languages (Go, C++, Java)
- Offers interactive web-based visualization and analysis
- Integrates well with Google Cloud Profiler for continuous profiling
Cons of pprof
- Steeper learning curve due to more complex features
- May be overkill for simple profiling tasks
- Requires additional setup for non-Go languages
Code Comparison
gprof2dot:
```python
import sys

import gprof2dot

# simplified; these classes are internal and may vary between versions
profile = gprof2dot.GprofParser(open(sys.argv[1])).parse()
gprof2dot.DotWriter(sys.stdout).graph(profile, gprof2dot.TEMPERATURE_COLORMAP)
```
pprof:
```go
package main

import (
	"os"

	"github.com/google/pprof/profile"
)

func main() {
	f, _ := os.Open("profile.pb.gz")
	p, _ := profile.Parse(f)
	_ = p
}
```
Summary
gprof2dot is a simpler, more focused tool for converting profiling data to DOT graph format, primarily for C/C++ programs. It's easier to use for basic profiling tasks but has limited features.
pprof is a more powerful and versatile profiling tool, offering advanced features and support for multiple languages. It provides interactive visualizations and integrates well with cloud services, but may be more complex for simple profiling needs.
Choose gprof2dot for quick, straightforward profiling graph generation, and pprof for more comprehensive profiling and analysis across different languages and environments.
Coz: Causal Profiling
Pros of coz
- Provides causal profiling, offering insights into potential performance improvements
- Supports multi-threaded applications and can identify complex performance bottlenecks
- Offers a more comprehensive analysis of program behavior and optimization opportunities
Cons of coz
- More complex setup and usage compared to gprof2dot
- Limited language support (primarily C/C++)
- May introduce overhead during profiling, potentially affecting performance measurements
Code comparison
gprof2dot:
```python
def main():
    parser = optparse.OptionParser(
        usage="\n\t%prog [options] [file] ...")
    parser.add_option('-o', '--output', metavar='FILE',
                      help='output filename [stdout]')
    parser.add_option('-n', '--node-thres', metavar='PERCENTAGE',
                      type="float", dest="node_thres", default=0.5,
                      help='eliminate nodes below this threshold [default: %default]')
```
coz:
```cpp
#include <coz.h>

int main(int argc, char** argv) {
    for (int i = 0; i < 1000000; i++) {
        // ... one unit of work ...
        COZ_PROGRESS;  // progress point: coz measures how often this is reached
    }
    return 0;
}
```
Main gperftools repository
Pros of gperftools
- Comprehensive suite of performance tools including CPU profiler, heap profiler, and heap checker
- Low-overhead profiling suitable for production environments
- Supports multiple programming languages and platforms
Cons of gperftools
- More complex setup and integration compared to gprof2dot
- Steeper learning curve for beginners
- May require code modifications for optimal usage
Code comparison
gprof2dot (Python):
```python
def main():
    parser = optparse.OptionParser(
        usage="\n\t%prog [options] [file]")
    parser.add_option('-o', '--output', metavar='FILE',
                      help='output filename [stdout]')
    parser.add_option('-n', '--node-thres', metavar='PERCENTAGE',
                      type="float", dest="node_thres", default=0.5,
                      help='eliminate nodes below this threshold [default: %default]')
```
gperftools (C++):
```cpp
#include <gperftools/profiler.h>

int main() {
    ProfilerStart("my_profile.prof");
    // ... your code here ...
    ProfilerStop();
    return 0;
}
```
The code comparison shows that gprof2dot is a Python-based tool for generating call graphs, while gperftools is a C++ library that requires integration into the source code for profiling. gperftools offers more fine-grained control over profiling but requires more setup, whereas gprof2dot is easier to use as a standalone tool for visualizing profiling data.
🔥 Pyflame: A Ptracing Profiler For Python. This project is deprecated and not maintained.
Pros of pyflame
- Low-overhead profiling with minimal impact on application performance
- Supports profiling of running processes without code modification
- Generates flame graphs for easy visualization of performance bottlenecks
Cons of pyflame
- Limited to Linux systems only
- Requires root access or specific capabilities to run
- Less flexible in terms of output formats compared to gprof2dot
Code comparison
gprof2dot:
```python
def main():
    parser = optparse.OptionParser(
        usage="\n\t%prog [options] [file] ...",
        version="%%prog %s" % __version__)
    ...
```
pyflame:
```cpp
int main(int argc, char **argv) {
  FLAGS_logtostderr = 1;
  google::InitGoogleLogging(argv[0]);
  // ...
}
```
gprof2dot is written in Python and focuses on converting profiling data to DOT format, while pyflame is written in C++ and specializes in low-overhead profiling for Python applications on Linux. gprof2dot offers more flexibility in input and output formats, whereas pyflame provides deeper system-level profiling capabilities but with platform limitations.
README
About gprof2dot
This is a Python script to convert the output from many profilers into a dot graph.
It can:
- read output from:
- Linux perf
- Valgrind's callgrind tool
- OProfile
- Sysprof
- Xperf
- VTune
- Very Sleepy
- Python profilers
- Java's HPROF
- prof, gprof
- DTrace
- stackcollapse from FlameGraph
- prune nodes and edges below a certain threshold;
- use a heuristic to propagate time inside mutually recursive functions;
- use color efficiently to draw attention to hot-spots;
- work on any platform where Python and Graphviz are available, i.e., virtually anywhere;
- compare two graphs with almost identical structures for the analysis of performance metrics such as time or function calls.
If you want an interactive viewer for the graphs generated by gprof2dot, check xdot.py.
Status
gprof2dot currently fulfills my needs, and I have little or no time for its maintenance. So I'm afraid that any requested features are unlikely to be implemented, and I might be slow processing issue reports or pull requests.
Example
This is the result from the example data in the Linux Gazette article with the default settings.
Requirements
- Python: known to work with version >=3.8; it will most likely not work with earlier releases.
- Graphviz: tested with version 2.26.3, but should work fine with other versions.
Windows users
- Download and install Python for Windows
- Download and install Graphviz for Windows
Linux users
On Debian/Ubuntu run:
apt-get install python3 graphviz
On RedHat/Fedora run
yum install python3 graphviz
Download
pip install gprof2dot
Documentation
Usage
Usage:
gprof2dot.py [options] [file] ...
Options:
-h, --help show this help message and exit
-o FILE, --output=FILE
output filename [stdout]
-n PERCENTAGE, --node-thres=PERCENTAGE
eliminate nodes below this threshold [default: 0.5]
-e PERCENTAGE, --edge-thres=PERCENTAGE
eliminate edges below this threshold [default: 0.1]
-f FORMAT, --format=FORMAT
profile format: axe, callgrind, collapse, dtrace,
hprof, json, oprofile, perf, prof, pstats, sleepy,
sysprof or xperf [default: prof]
--total=TOTALMETHOD preferred method of calculating total time: callratios
or callstacks (currently affects only perf format)
[default: callratios]
-c THEME, --colormap=THEME
color map: bw, color, gray, pink or print [default:
color]
-s, --strip strip function parameters, template parameters, and
const modifiers from demangled C++ function names
--color-nodes-by-selftime
color nodes by self time, rather than by total time
(sum of self and descendants)
-w, --wrap wrap function names
--show-samples show function samples
--node-label=MEASURE measurements to show on the node (can be specified
multiple times): self-time, self-time-percentage,
total-time or total-time-percentage [default: total-
time-percentage, self-time-percentage]
--list-functions=LIST_FUNCTIONS
list functions available for selection in -z or -l,
requires a selector argument (use '+' to select all).
Recall that the selector argument is used with
Unix/Bash globbing/pattern matching, and that entries
are formatted '<pkg>:<linenum>:<function>'. When
argument starts with '%', a dump of all available
information is performed for selected entries, after
removal of leading '%'.
-z ROOT, --root=ROOT prune call graph to show only descendants of specified
root function
-l LEAF, --leaf=LEAF prune call graph to show only ancestors of specified
leaf function
--depth=DEPTH prune call graph to show only descendants or ancestors
until specified depth
--skew=THEME_SKEW skew the colorization curve. Values < 1.0 give more
variety to lower percentages. Values > 1.0 give less
variety to lower percentages
-p FILTER_PATHS, --path=FILTER_PATHS
Filter all modules not in a specified path
--compare Compare two graphs with almost identical structure. With this
option two files should be provided:
gprof2dot.py [options] --compare [file1] [file2] ...
--compare-tolerance=TOLERANCE
Tolerance threshold for node difference
(default=0.001%). If the difference is below this value
the nodes are considered identical.
--compare-only-slower
Display comparison only for functions which are slower
in second graph.
--compare-only-faster
Display comparison only for functions which are faster
in second graph.
--compare-color-by-difference
Color nodes based on the value of the difference.
Nodes with the largest differences represent the hot
spots.
Examples
Linux perf
perf record -g -- /path/to/your/executable
perf script | c++filt | gprof2dot.py -f perf | dot -Tpng -o output.png
oprofile
opcontrol --callgraph=16
opcontrol --start
/path/to/your/executable arg1 arg2
opcontrol --stop
opcontrol --dump
opreport -cgf | gprof2dot.py -f oprofile | dot -Tpng -o output.png
xperf
If you're not familiar with xperf then read this excellent article first. Then do:
1. Start xperf as:
xperf -on Latency -stackwalk profile
2. Run your application.
3. Save the data:
xperf -d output.etl
4. Start the visualizer:
xperf output.etl
5. In the Trace menu, select Load Symbols. Configure Symbol Paths if necessary.
6. Select an area of interest on the CPU sampling graph, right-click, and select Summary Table.
7. In the Columns menu, make sure the Stack column is enabled and visible.
8. Right-click on a row, choose Export Full Table, and save to output.csv.
9. Then invoke gprof2dot as:
gprof2dot.py -f xperf output.csv | dot -Tpng -o output.png
VTune Amplifier XE
1. Collect profile data as (also can be done from the GUI):
amplxe-cl -collect hotspots -result-dir output -- your-app
2. Visualize profile data as:
amplxe-cl -report gprof-cc -result-dir output -format text -report-output output.txt
gprof2dot.py -f axe output.txt | dot -Tpng -o output.png
See also Kirill Rogozhin's blog post.
gprof
/path/to/your/executable arg1 arg2
gprof path/to/your/executable | gprof2dot.py | dot -Tpng -o output.png
python profile
python -m profile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
python cProfile (formerly known as lsprof)
python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png
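Before converting, you can sanity-check a pstats profile with the stdlib pstats module; a minimal, self-contained sketch (the workload is illustrative):

```python
import cProfile
import io
import pstats

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# collect a profile in memory
prof = cProfile.Profile()
prof.enable()
fib(20)
prof.disable()

# the same pstats data that gprof2dot consumes can be inspected directly
buf = io.StringIO()
stats = pstats.Stats(prof, stream=buf)
stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
report = buf.getvalue()
```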
Java HPROF
java -agentlib:hprof=cpu=samples ...
gprof2dot.py -f hprof java.hprof.txt | dot -Tpng -o output.png
See Russell Power's blog post for details.
DTrace
dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
gprof2dot.py -f dtrace out.user_stacks | dot -Tpng -o output.png
# Notice: sometimes the dtrace output may be latin-1 encoded, and gprof2dot will fail to parse it.
# To solve this problem, use iconv to convert it to UTF-8 explicitly.
# TODO: add an encoding flag to tell gprof2dot how to decode the profile file.
iconv -f ISO-8859-1 -t UTF-8 out.user_stacks | gprof2dot.py -f dtrace
stackcollapse
Brendan Gregg's FlameGraph tool takes as its input a text file containing one
line per sample. This format can be generated from various other inputs using
the stackcollapse
scripts in the FlameGraph
repository. It can also be
generated by tools such as py-spy.
Example usage:
- Perf:
perf record -g -- /path/to/your/executable
perf script | FlameGraph/stackcollapse-perf.pl > out.collapse
gprof2dot.py -f collapse out.collapse | dot -Tpng -o output.png
- Py-spy:
py-spy record -p <pidfile> -f raw -o out.collapse
gprof2dot.py -f collapse out.collapse | dot -Tpng -o output.png
Compare Example
This image illustrates an example usage of the --compare
and --compare-color-by-difference
options.
Arrows pointing to the right indicate nodes where the function performed faster in the profile provided second (second profile), while arrows pointing to the left indicate nodes where the function was faster in the profile provided first (first profile).
Node
+-------------------------------+
| function name                 |
| total time % -/+ total_diff   |
| ( self time % ) -/+ self_diff |
| total calls1 / total calls2   |
+-------------------------------+
Where:
- total time % and self time % come from the first profile;
- diff is calculated as the absolute value of time in the first profile - time in the second profile.
Note: The compare option has been tested for pstats, axe and callgrind profiles.
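The comparison rule above can be sketched in a few lines; the tolerance units here are illustrative rather than gprof2dot's exact definition:

```python
# Sketch of the comparison: diff is the absolute difference between the two
# profiles' times, and nodes below a tolerance count as identical
# (mirroring --compare-tolerance).
def classify(t_first, t_second, tolerance=0.001):
    diff = abs(t_first - t_second)
    if diff < tolerance:
        return "identical"
    return "slower in second" if t_second > t_first else "faster in second"
```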
Output
A node in the output graph represents a function and has the following layout:
+------------------------------+
| function name |
| total time % ( self time % ) |
| total calls |
+------------------------------+
where:
- total time % is the percentage of the running time spent in this function and all its children;
- self time % is the percentage of the running time spent in this function alone;
- total calls is the total number of times this function was called (including recursive calls).
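The arithmetic behind these figures can be sketched on a toy call tree (the timings below are made up for illustration):

```python
# Toy illustration of the node figures: given per-function self times and a
# call tree, total time % includes children while self time % does not.
self_time = {"main": 1.0, "parse": 3.0, "render": 6.0}  # seconds, illustrative
children = {"main": ["parse", "render"], "parse": [], "render": []}

running_time = sum(self_time.values())  # 10.0 s in this toy example

def total_time(fn):
    # self time plus the total time of every callee's subtree
    return self_time[fn] + sum(total_time(c) for c in children[fn])

total_pct = {fn: 100.0 * total_time(fn) / running_time for fn in self_time}
self_pct = {fn: 100.0 * self_time[fn] / running_time for fn in self_time}
# main's subtree covers the whole run, so its total time % is 100
```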
An edge represents the calls between two functions and has the following layout:
total time %
calls
parent --------------------> children
Where:
- total time % is the percentage of the running time transferred from the children to this parent (if available);
- calls is the number of times the parent function called the children.
Note that in recursive cycles, the total time % in the node is the same for all the functions in the cycle, and there is no total time % figure on the edges inside the cycle, since such a figure would make no sense.
The color of the nodes and edges varies according to the total time % value. In the default temperature-like color-map, functions where most time is spent (hot-spots) are marked as saturated red, and functions where little time is spent are marked as dark blue. Note that functions where negligible or no time is spent do not appear in the graph by default.
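The idea of the temperature colormap (and of the --skew option) can be illustrated with a simple cold-to-hot ramp; this is not gprof2dot's actual color math, just a sketch:

```python
# Illustrative only: gprof2dot's real colormaps are more elaborate, but the
# idea is to map total time % onto a cold (dark blue) to hot (red) ramp.
def heat_color(pct, skew=1.0):
    """Map 0..100% to an (r, g, b) tuple from blue (cold) to red (hot).

    skew mimics --skew: values < 1.0 spread out the low percentages.
    """
    t = (pct / 100.0) ** skew  # normalized, optionally skewed
    r = int(255 * t)
    b = int(255 * (1.0 - t))
    return (r, 0, b)
```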
Listing functions
The flag --list-functions permits listing the function entries found in the gprof input. This is intended as a tool to prepare for use with the --leaf (-l) or --root (-z) flags.
gprof2dot.py -f pstats /tmp/myLog.profile --list-functions "test_segments:*:*"
test_segments:5:<module>,
test_segments:206:TestSegments,
test_segments:46:<lambda>
- The selector argument is used with Unix/Bash globbing/pattern matching, in the same fashion as performed by the -l and -z flags.
- Entries are formatted '<pkg>:<linenum>:<function>'.
- When the selector argument starts with '%', a dump of all available information is performed for selected entries, after removal of the selector's leading '%'. If the selector is "+" or "*", the full list of functions is printed.
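Since the selector uses shell-style globbing, Python's stdlib fnmatch can illustrate which entries a pattern selects (gprof2dot's internal matching may differ in detail; the entries are illustrative):

```python
import fnmatch

# Entries are formatted '<pkg>:<linenum>:<function>'; the selector for
# --list-functions / -l / -z uses shell-style globbing, sketched here.
entries = [
    "test_segments:5:<module>",
    "test_segments:206:TestSegments",
    "other_module:10:helper",
]

selected = [e for e in entries if fnmatch.fnmatch(e, "test_segments:*:*")]
# only the two test_segments entries match the pattern
```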
Frequently Asked Questions
How can I generate a complete call graph?
By default gprof2dot.py generates a partial call graph, excluding nodes and edges with little or no impact on the total computation time. If you want the full call graph then set a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as:
gprof2dot.py -n0 -e0
The node labels are too wide. How can I narrow them?
The node labels can get very wide when profiling C++ code, due to inclusion of scope, function arguments, and template arguments in demangled C++ function names.
If you do not need function and template arguments information, then pass the -s / --strip option to strip them.
If you want to keep all that information, or if the labels are still too wide, then you can pass the -w / --wrap option to wrap the labels. Note that because dot does not wrap labels automatically, the label margins will not be perfectly aligned.
Why is there no output, or why is it all in the same color?
Likely, the total execution time is too short, so there is not enough precision in the profile to determine where time is being spent.
You can still force displaying the whole graph by setting a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as:
gprof2dot.py -n0 -e0
But to get meaningful results you will need to find a way to run the program for a longer time period (aggregating results from multiple runs).
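One way to aggregate results from multiple runs is the stdlib pstats module, whose Stats.add() merges profiles before conversion; a sketch (workload and file names illustrative):

```python
import cProfile
import os
import pstats
import tempfile

def work():
    return sum(range(50_000))

# Profile two separate runs, then merge them with pstats.Stats.add();
# the combined file can be fed to: gprof2dot -f pstats combined.pstats
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(2):
    prof = cProfile.Profile()
    prof.enable()
    work()
    prof.disable()
    path = os.path.join(tmpdir, f"run{i}.pstats")
    prof.dump_stats(path)
    paths.append(path)

combined = pstats.Stats(paths[0])
combined.add(paths[1])
merged_path = os.path.join(tmpdir, "combined.pstats")
combined.dump_stats(merged_path)
```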
Why don't the percentages add up?
You likely have an execution time too short, causing the round-off errors to be large.
See question above for ways to increase execution time.
Which options should I pass to gcc when compiling for profiling?
Options which are essential to produce suitable results are:
- -g : produce debugging information
- -fno-omit-frame-pointer : use the frame pointer (frame pointer usage is disabled by default in some architectures like x86_64 and for some optimization levels; it is impossible to walk the call stack without it)
If you're using gprof you will also need the -pg option, but nowadays you can get much better results with other profiling tools, most of which require no special code instrumentation when compiling.
You want the code you are profiling to be as close as possible to the code that you will be releasing. So you should include all options that you use in your release code, typically:
- -O2 : optimizations that do not involve a space-speed tradeoff
- -DNDEBUG : disable debugging code in the standard library (such as the assert macro)
However many of the optimizations performed by gcc interfere with the accuracy/granularity of the profiling results. You should pass these options to disable those particular optimizations:
- -fno-inline-functions : do not inline functions into their parents (otherwise the time spent on these functions will be attributed to the caller)
- -fno-inline-functions-called-once : similar to above
- -fno-optimize-sibling-calls : do not optimize sibling and tail recursive calls (otherwise tail calls may be attributed to the parent function)
If the granularity is still too low, you may pass these options to achieve finer granularity:
- -fno-default-inline : do not make member functions inline by default merely because they are defined inside the class scope
- -fno-inline : do not pay attention to the inline keyword
Note however that with these last options the timings of functions called many times will be distorted due to the function call overhead. This is particularly true for typical C++ code, which relies on these optimizations being done for decent performance.
See the full list of gcc optimization options for more information.
Links
See the wiki for external resources, including complementary/alternative tools.