Current state

What was the goal

Currently, qapitrace communicates with glretrace through the glretrace command-line interface. The goal is to implement new performance counters while preserving this behavior.

1) Write metric abstraction system

2) Write backends (AMD_perfmon, …)

3) Make changes in glretrace to leverage this new abstraction. It should be possible to profile either frames or calls.

(git tree:

Abstraction system

The proposed abstract class responsible for this system is here:

Note that in this design the backend keeps and manages the profiling data.

A corresponding implementation has to be written for each backend.

This abstraction is presumably general enough for at least AMD_perfmon, INTEL_perfquery, NV PerfKit, perf.

Currently only the AMD_perfmon backend is written, so the abstraction design may still need some changes. However, I would not expect anything significant.
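To make the shape of such an abstraction concrete, here is a minimal sketch of a backend interface plus a toy backend implementing it. It is not the proposed class itself: MetricBackend, Metric, FakeBackend and all method names are hypothetical. It does reflect the design points above: the backend generates its own passes and keeps the collected data, which callers only read back.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one metric exposed by a backend.
struct Metric {
    unsigned group;    // backend-specific group (e.g. a counter group)
    unsigned id;       // id within the group
    std::string name;
};

// Hypothetical backend-agnostic interface; all names are illustrative.
class MetricBackend {
public:
    virtual ~MetricBackend() = default;
    // Metrics this backend can sample with the current driver/hardware.
    virtual std::vector<Metric> enumMetrics() = 0;
    // Split the selected metrics into passes; returns the number of passes.
    virtual unsigned generatePasses(const std::vector<Metric>& selected) = 0;
    virtual void beginPass(unsigned pass) = 0;
    virtual void endPass() = 0;
    // Bracket one profiled event (a call or a frame).
    virtual void beginQuery(unsigned eventId) = 0;
    virtual void endQuery() = 0;
    // The backend owns the collected data; callers only read it back.
    virtual double getResult(unsigned eventId, const Metric& m) const = 0;
};

// Toy backend that exercises the interface: each metric gets its own pass and
// the "sampled" value is simply derived from the event id.
class FakeBackend : public MetricBackend {
public:
    std::vector<Metric> enumMetrics() override {
        return {{0, 0, "fake-cycles"}, {0, 1, "fake-busy"}};
    }
    unsigned generatePasses(const std::vector<Metric>& selected) override {
        passes_ = selected;  // one metric per pass
        return static_cast<unsigned>(passes_.size());
    }
    void beginPass(unsigned pass) override { pass_ = pass; }
    void endPass() override {}
    void beginQuery(unsigned eventId) override { event_ = eventId; }
    void endQuery() override {
        const Metric& m = passes_.at(pass_);
        results_[{event_, m.id}] = event_ * 10.0 + m.id;
    }
    double getResult(unsigned eventId, const Metric& m) const override {
        return results_.at({eventId, m.id});
    }

private:
    std::vector<Metric> passes_;
    unsigned pass_ = 0;
    unsigned event_ = 0;
    std::map<std::pair<unsigned, unsigned>, double> results_;
};
```

The caller loops over passes and, inside each pass, brackets every event with beginQuery()/endQuery(); only after all passes does it read results back through getResult().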




The implementation is pretty straightforward, but there are several things to note.

Firstly, not all counters can be sampled at the same time, so profiling is done in several passes, each of which samples a set of compatible metrics. The passes are generated in Api_GL_AMD_performance_monitor::generatePasses, and metric compatibility is tested with Api_GL_AMD_performance_monitor::testCounters. The procedure runs in linear time and is quite fast, but for large sets of selected metrics it generates a large number of passes (for example, 2500 passes for 10000 counters), some of which are superfluous. This can be optimized later with other algorithms that avoid the extra passes.
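Pass generation of this kind can be sketched as a greedy grouping that makes one compatibility test per counter. This is not the actual generatePasses/testCounters code: Counter, Pass, the CompatTest callback and the four-counters-per-group rule are made up for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Hypothetical counter identifier: group and counter index.
struct Counter {
    unsigned group;
    unsigned id;
};
using Pass = std::vector<Counter>;

// Hypothetical compatibility test: true if all counters in `pass` can be
// sampled at the same time (the role testCounters plays in the backend).
using CompatTest = std::function<bool(const Pass&)>;

// Greedy grouping: add each counter to the current pass; if the pass becomes
// incompatible, close it and start a new one. One test call per counter.
std::vector<Pass> generatePassesGreedy(const std::vector<Counter>& counters,
                                       const CompatTest& compatible) {
    std::vector<Pass> passes;
    Pass current;
    for (const Counter& c : counters) {
        current.push_back(c);
        if (!compatible(current)) {
            current.pop_back();
            if (!current.empty()) {
                passes.push_back(current);
            }
            current.assign(1, c);  // c starts a new pass on its own
        }
    }
    if (!current.empty()) {
        passes.push_back(current);
    }
    return passes;
}

// Made-up rule for the example: at most four counters from one group per pass.
bool atMostFourPerGroup(const Pass& pass) {
    std::map<unsigned, unsigned> perGroup;
    for (const Counter& c : pass) {
        if (++perGroup[c.group] > 4) {
            return false;
        }
    }
    return true;
}
```

Because the loop only ever looks at the current pass, the result depends on the order of the counters and on the compatibility rule; with less regular rules it can produce more passes than necessary, which is where a smarter grouping algorithm would help.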

Secondly, the AMD_perfmon extension allows data to be obtained asynchronously. This means the CPU can keep working instead of waiting for profiling data to arrive. To take advantage of this, multiple AMD_perfmon monitors should be used and cycled one after another. In my backend the number of monitors used is defined in the header:

#define NUM_MONITORS 1

Increasing this number improves performance to some extent (nothing dramatic); this can be tested further later. If the results are good and this feature is to be used, one should check whether the requested number of monitors is actually supported by the hardware (I am not sure how to do this at the moment).
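Cycling through several monitors can be modelled as a ring buffer: a result is only read back when its monitor is about to be reused, so with NUM_MONITORS set to 1 every new event waits for the previous result, while larger values let the CPU run ahead. The sketch below is hypothetical (MonitorRing and its callback are not part of the backend) and leaves out the GL calls; in a real backend, collecting a result would go through glGetPerfMonitorCounterDataAMD, checking GL_PERFMON_RESULT_AVAILABLE_AMD first if one does not want to block.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Marks a monitor slot that holds no pending result.
constexpr unsigned kFreeSlot = ~0u;

// Hypothetical ring of monitor slots, one per AMD_perfmon monitor object.
// `collect` stands in for reading a monitor's result back from the driver.
class MonitorRing {
public:
    // numMonitors must be at least 1.
    MonitorRing(std::size_t numMonitors, std::function<void(unsigned)> collect)
        : slots_(numMonitors, kFreeSlot), collect_(std::move(collect)) {}

    // Start profiling `eventId` on the next slot. If that slot still holds an
    // unread result, collect it first: this is the only place we would wait.
    void begin(unsigned eventId) {
        unsigned& slot = slots_[next_];
        if (slot != kFreeSlot) {
            collect_(slot);
        }
        slot = eventId;
    }

    // Finish the current event and move on to the next slot.
    void end() { next_ = (next_ + 1) % slots_.size(); }

    // Collect all pending results, oldest first (e.g. at the end of a pass).
    void flush() {
        for (std::size_t k = 0; k < slots_.size(); ++k) {
            unsigned& slot = slots_[(next_ + k) % slots_.size()];
            if (slot != kFreeSlot) {
                collect_(slot);
                slot = kFreeSlot;
            }
        }
    }

private:
    std::vector<unsigned> slots_;
    std::size_t next_ = 0;
    std::function<void(unsigned)> collect_;
};
```

With two monitors and four events, the results of events 0 and 1 are collected while events 2 and 3 start; flush() then picks up the remaining two.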


Common backend:

This is the helper backend to obtain time boundaries for calls/frames.

Original routines from glretrace are not used for several reasons:

1) They work only for call profiling (not frame profiling)

2) They do not fit naturally into the abstraction system

3) They get the data synchronously (the current implementation of this backend does the same 🙂, but it can be tuned)
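The idea of one helper that records start and end boundaries for arbitrary events, whether an event is a single call or a whole frame, can be sketched like this. The post does not say which clock the common backend uses; here std::chrono::steady_clock stands in (a GPU-side variant might use timer queries such as glQueryCounter with GL_TIMESTAMP instead), and BoundaryRecorder is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper that records [start, start + duration] for a sequence
// of events; an event may be a single call or a whole frame.
class BoundaryRecorder {
public:
    using Clock = std::chrono::steady_clock;

    struct Span {
        std::int64_t startNs;     // relative to when the recorder was created
        std::int64_t durationNs;
    };

    void begin() { start_ = Clock::now(); }

    void end() {
        Clock::time_point stop = Clock::now();
        spans_.push_back({toNs(start_ - origin_), toNs(stop - start_)});
    }

    const std::vector<Span>& spans() const { return spans_; }

private:
    static std::int64_t toNs(Clock::duration d) {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(d).count();
    }

    Clock::time_point origin_ = Clock::now();
    Clock::time_point start_;
    std::vector<Span> spans_;
};
```

For call profiling, begin()/end() would bracket each call; for frame profiling, begin() would be called before the first call of a frame and end() when the frame completes. The same two methods serve both cases.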


Here I’ll comment on what changes I’ve made:

1) retrace_main.cpp


Added a CLI option for testing the abstraction system:

glretrace --pamd=false    #For profiling calls
glretrace --pamd=true      #For profiling frames


Added a loop over passes around the main loop:



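 for (int j = 0; j < retrace::getNumPasses(); j++) {
     for (i = optind; i < argc; ++i) {
         retrace::curPass = j;
         if (!retrace::parser.open(argv[i])) { // open the i-th trace file
             return 1;
         }
         retrace::mainLoop();
         retrace::parser.close();
     }
 }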
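Since every pass replays the whole trace with a different set of counters, the per-pass results have to be joined per call or frame afterwards. A minimal sketch, assuming replay is deterministic so that event ids match across passes; Sample, ReplayFn and profileAllPasses are hypothetical, and the actual replay is replaced by a callback.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// One sampled value: the event (call or frame) and metric it belongs to.
struct Sample {
    unsigned eventId;
    std::string metric;
    double value;
};

// Hypothetical stand-in for "replay the whole trace once with pass `pass` enabled".
using ReplayFn = std::function<std::vector<Sample>(unsigned pass)>;

// Event id -> (metric name -> value).
using ResultTable = std::map<unsigned, std::map<std::string, double>>;

// Replays the trace once per pass and merges all samples into one row per event.
ResultTable profileAllPasses(unsigned numPasses, const ReplayFn& replay) {
    ResultTable table;
    for (unsigned pass = 0; pass < numPasses; ++pass) {
        for (const Sample& s : replay(pass)) {
            table[s.eventId][s.metric] = s.value;
        }
    }
    return table;
}
```

If replay were not deterministic (for example, a different number of frames per run), the rows from different passes would no longer line up, so this merge step relies on stable call and frame numbering.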





2) glretrace_main.cpp


Originally profiling was done in the following methods:

void beginProfile(trace::Call &call, bool isDraw); // called before every GL call
void endProfile(trace::Call &call, bool isDraw); // called after every GL call
void initContext(); // called first time context is made current
void frame_complete(trace::Call &call); // after frame end
void retrace::finishRendering(void);

The new profiling is likewise done by inserting code into these methods.
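How these hooks might drive a backend in call versus frame mode can be sketched as follows. ProfilingHooks and Granularity are hypothetical, the hooks take plain call numbers instead of trace::Call, and the sketch only records which event is closed where a real version would call the backend's begin/end methods. It also assumes frameComplete() runs after the last call of a frame has been profiled.

```cpp
#include <cassert>
#include <vector>

enum class Granularity { Call, Frame };

// Hypothetical glue between the retrace hooks and a metric backend. It only
// records the id of each event when that event is closed; a real version
// would call the backend's begin/end query methods at these points instead.
class ProfilingHooks {
public:
    explicit ProfilingHooks(Granularity mode) : mode_(mode) {}

    // Before every GL call.
    void beginProfile(unsigned callNo) {
        if (mode_ == Granularity::Call) {
            open(callNo);
        } else if (!frameOpen_) {
            open(frameNo_);  // first call of a new frame
            frameOpen_ = true;
        }
    }

    // After every GL call.
    void endProfile(unsigned /*callNo*/) {
        if (mode_ == Granularity::Call) {
            close();
        }
    }

    // After the call that ends a frame.
    void frameComplete() {
        if (mode_ == Granularity::Frame && frameOpen_) {
            close();
            frameOpen_ = false;
        }
        ++frameNo_;
    }

    const std::vector<unsigned>& closed() const { return closed_; }

private:
    void open(unsigned eventId) { current_ = eventId; }
    void close() { closed_.push_back(current_); }

    Granularity mode_;
    bool frameOpen_ = false;
    unsigned frameNo_ = 0;
    unsigned current_ = 0;
    std::vector<unsigned> closed_;
};
```

In call mode every call becomes an event; in frame mode all calls between two frame ends collapse into a single event.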

Output is generated by the helper class:

The output is in tabular format (examples: call profiling, frame profiling; Google for some reason displays tabs as boxes). For traces that are not very long, it takes a reasonable amount of space.
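Tab-separated output like this is simple to produce; a minimal sketch of such a writer follows. writeTable is a hypothetical name, not the helper class from the post, and the column names used with it are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes a header row followed by data rows, with columns
// separated by tabs and rows terminated by newlines.
void writeTable(std::ostream& os,
                const std::vector<std::string>& header,
                const std::vector<std::vector<std::string>>& rows) {
    auto writeRow = [&os](const std::vector<std::string>& cells) {
        for (std::size_t i = 0; i < cells.size(); ++i) {
            if (i > 0) {
                os << '\t';
            }
            os << cells[i];
        }
        os << '\n';
    };
    writeRow(header);
    for (const auto& row : rows) {
        writeRow(row);
    }
}
```

Writing to std::cout gives console output, while writing to a std::ostringstream makes the format easy to check; tab-separated text can also be pasted directly into a spreadsheet.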

Current problems

It is not clear what to do with multi-context traces; frame profiling in particular is certainly not consistent for them. Some metric providers (like AMD_perfmon) do not explicitly state whether metrics are context-dependent or global, and even the set of available metrics can vary depending on the currently active context. My current implementation simply hangs when a new context is made current, so some workaround should probably be found 🙂


What still needs to be done

1) All backends other than AMD_perfmon

2) Metric selection interface

Currently ALL available metrics are sampled. There should be some way to tell glretrace which metrics you are interested in. It could be a file or a string passed as an argument to glretrace. What format?
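As one possible answer to the format question, and purely as an illustration, the selection could be a comma-separated list of group:counter pairs passed as a single argument. The format, MetricId and parseMetricSelection are all hypothetical; a name-based format would be friendlier, but numeric ids keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical metric identifier: counter group and counter index.
struct MetricId {
    unsigned group;
    unsigned counter;
};

// Parses a hypothetical selection string "g:c,g:c,..." into metric ids.
// Returns false on malformed input (`out` may then be partially filled).
bool parseMetricSelection(const std::string& spec, std::vector<MetricId>& out) {
    std::istringstream in(spec);
    std::string item;
    while (std::getline(in, item, ',')) {
        std::size_t colon = item.find(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 == item.size()) {
            return false;
        }
        try {
            std::size_t used = 0;
            unsigned long g = std::stoul(item.substr(0, colon), &used);
            if (used != colon) {
                return false;  // trailing garbage in the group part
            }
            std::string counterPart = item.substr(colon + 1);
            unsigned long c = std::stoul(counterPart, &used);
            if (used != counterPart.size()) {
                return false;  // trailing garbage in the counter part
            }
            out.push_back({static_cast<unsigned>(g), static_cast<unsigned>(c)});
        } catch (const std::exception&) {
            return false;  // not a number, or out of range
        }
    }
    return true;
}
```

For example, "0:1,0:2,3:7" would select counters 1 and 2 of group 0 and counter 7 of group 3.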

3) Handling multiple backends

When you have several different backends at your disposal, you need to manage them somehow. At the very least there should be a way to tell whether they can be used simultaneously to obtain metrics, something like a collector for backends. It may also simplify the code in glretrace_main.cpp by providing a more general interface (the metric selection interface could be part of it).
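A very simple collector could classify each backend by whether it can share a replay with others. The sketch below is hypothetical (BackendInfo, the single exclusive flag and scheduleBackends are not from the post); a real collector would probably need pairwise compatibility information rather than one flag per backend.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a backend as seen by a collector.
struct BackendInfo {
    std::string name;
    bool exclusive;  // true if it cannot run in the same replay as any other backend
};

// Groups backends into replay runs: all non-exclusive backends share the first
// run, and each exclusive backend gets a run of its own.
std::vector<std::vector<std::string>> scheduleBackends(const std::vector<BackendInfo>& backends) {
    std::vector<std::vector<std::string>> runs;
    std::vector<std::string> shared;
    for (const BackendInfo& b : backends) {
        if (b.exclusive) {
            runs.push_back({b.name});
        } else {
            shared.push_back(b.name);
        }
    }
    if (!shared.empty()) {
        runs.insert(runs.begin(), shared);
    }
    return runs;
}
```

Each run would then be replayed separately, possibly with several passes of its own, and the results merged per call or frame.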




