GSoC ’16: final

This year’s GSoC has ended.

My project was to develop a new OpenGL profiling view for QApitrace to accompany the profiling backends and custom metrics support that I added to Apitrace last year.

Unfortunately, I was not able to finish the project completely. Some features are still missing, and I haven’t received much feedback on the whole thing yet. Nevertheless, the profiling GUI is already quite usable and can be found at the links below:

Link to frozen repository (state at the end of GSoC): https://github.com/trtt/apitrace/commits/guigsoc

Link to working repository (to be updated): https://github.com/trtt/apitrace/commits/guiwip2

(Note: building works the same way as for Apitrace with the GUI enabled; the new profiling view is in the menu under Trace -> Profile with backends.)

I’ll continue working on the project and will try to get it merged as soon as it is feature-complete and there has been enough feedback.

What is done:

  1. Support for new metrics via a metric selection interface (new metrics are most likely to be found on Mesa for Intel Gen 5-8 and on Nouveau; the proprietary AMD drivers expose some as well)
  2. Draw call profiling view: an OpenGL-accelerated graph view (timelines, histograms for metrics) and a table view of raw metric data (with sorting/grouping)
  3. Basic UI interactions (range selection, zooming, scrolling etc.)
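Under the hood, a metric histogram boils down to bucketing per-draw-call values into equal-width bins before drawing the bars. Here is a minimal Python sketch of that binning step, assuming the values arrive as a plain list of numbers; the function name `histogram` is hypothetical and not part of QApitrace:

```python
def histogram(values, num_bins):
    """Count values into num_bins equal-width bins spanning [min, max]."""
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    if not values:
        return [0] * num_bins
    lo, hi = min(values), max(values)
    counts = [0] * num_bins
    if hi == lo:
        # All values are identical: put everything into the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / num_bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical per-draw-call GPU times in microseconds.
print(histogram([10, 12, 15, 40, 42, 90], 4))  # [3, 2, 0, 1]
```

Equal-width bins keep the bar positions trivial to compute on the GPU side; other binning strategies (logarithmic, quantile) would be a separate design choice.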

What is left to do:

  1. Frame metrics view
  2. Statistics pane
  3. Interactions between graph view and table view
  4. Quality of life UI additions
  5. Some performance tuning


GSoC ’16: mid-term

In 2016 I am again taking part in GSoC, continuing last year’s project. Last year I added profiling backends and custom metrics support to Apitrace (i.e. the CLI side), whereas this year I’m developing a new profiling GUI for QApitrace.

At the time of writing it is the week of the GSoC mid-term evaluations. In this post I describe the work done so far.

  • Some test coverage

The first item is not actually related to the GUI. Last year’s addition to Apitrace, the profiling backends, was quite a large amount of code. Moreover, it relied on the multi-pass feature (running the retrace several times in succession), which could be problematic because of invalid assumptions made about global state in the glretrace part of Apitrace. Hence, some test coverage for this feature was needed.

Profiling results (as well as the set of available metrics) are highly dependent on the hardware used, so it was only possible to come up with a bare minimum of test coverage: https://github.com/trtt/apitrace-tests

Strictly speaking, it uses only one metric (‘drawn pixels’, as it appears in the different backends) to test, among other things, that results are consistent between passes.
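The idea behind such a consistency test can be illustrated with a small Python sketch: retrace the same calls in several passes, then check that a deterministic metric like drawn pixels is identical for every call in every pass. The data layout (one dict per pass, mapping call number to value) and the function name are assumptions for illustration, not the actual apitrace-tests code:

```python
def inconsistent_calls(passes):
    """Return call numbers whose metric value differs between passes.

    `passes` is a list of dicts, one per retrace pass, mapping a call
    number to the metric value (e.g. drawn pixels) measured for it.
    """
    if not passes:
        return []
    reference = passes[0]
    bad = set()
    for other in passes[1:]:
        # A call missing from one of the passes also counts as inconsistent.
        for call in reference.keys() | other.keys():
            if reference.get(call) != other.get(call):
                bad.add(call)
    return sorted(bad)


pass1 = {1: 0, 5: 1024, 9: 2048}
pass2 = {1: 0, 5: 1024, 9: 2048}
pass3 = {1: 640, 5: 1024, 9: 2048}  # the first call differs in this pass
print(inconsistent_calls([pass1, pass2, pass3]))  # [1]
```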

In the process, it turned out that globals in glretrace do indeed cause some problems with multi-pass. Actually, the issue found is a very minor one: the first call behaves differently in different passes, because the same visual (simply put, a window) is reused across passes through a global variable that is never reset. I made some attempts (to be honest, not very successful ones) to fix this issue, but it is still not clear what the right fix is. I’m currently discussing the situation on the mailing list with José Fonseca, the author of Apitrace.
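A toy Python model of this kind of bug, purely as an illustration: all names below are hypothetical and do not correspond to actual glretrace code. A global window is created lazily on the first call, so only the first pass pays for creating it, and the first call therefore behaves differently in later passes unless the global is reset between passes:

```python
_window = None  # hypothetical global shared by all retrace passes


def retrace_call(call_no, log):
    """Replay one call, lazily creating the window on first use."""
    global _window
    if _window is None:
        _window = "window"
        log.append((call_no, "create_window"))
    log.append((call_no, "draw"))


def run_pass(calls, reset_globals):
    global _window
    if reset_globals:
        _window = None  # the fix: start every pass from a clean state
    log = []
    for call_no in calls:
        retrace_call(call_no, log)
    return log


calls = [1, 2, 3]
first = run_pass(calls, reset_globals=False)
second = run_pass(calls, reset_globals=False)
print(first == second)  # False: call 1 created the window only in pass 1
```

Resetting the state between passes makes the passes identical, but in the real retracer it is less obvious which globals can be safely reset, which is the open question mentioned above.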

  • GUI

Meanwhile, I started the actual work on the GUI. The first thing to address this year was the means of data visualization.

This was the most problematic part last year when I tried to start working on the GUI. There were no readily available solutions for drawing plots that I could make use of. Ideally I needed something with QML support, histograms (bar charts), timelines (Gantt charts), support for some specific types of user interaction and, finally, good enough performance. This year there were some pretty good candidates, namely QtCharts (becoming open source with the Qt 5.7 release) and the timeline library in Qt Creator. I spent some time examining them. Unfortunately, QtCharts, for example, supports GL acceleration only for simple line plots (and is very slow otherwise), and these solutions have many other restrictions. More importantly, they are all licensed under GPLv3, which alone makes it impossible to use them in the MIT-licensed Apitrace (I wish I had realized this earlier).

So I was inevitably back to designing a custom charting solution. I went on to implement the idea suggested by my GSoC mentor Martin Peres (mupuf): basically, an OpenGL routine that uses textures to hold the data and instanced drawing for performance. Development is currently happening here: https://github.com/trtt/bargraph
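A rough Python sketch of the addressing scheme such a renderer might use, assuming one texel per event stored in a 2D data texture and one instance per bar; the function names are hypothetical and this is not the bargraph code itself. Because a texture dimension is capped (the limit is queried with GL_MAX_TEXTURE_SIZE; 16384 below is only an example value), events are wrapped into rows, and each instance computes which texel holds its value:

```python
import math


def texture_layout(num_events, max_texture_size):
    """Pick a width x height data texture with one texel per event."""
    if num_events <= 0:
        return 0, 0
    width = min(num_events, max_texture_size)
    height = math.ceil(num_events / width)
    if height > max_texture_size:
        raise ValueError("too many events for a single texture")
    return width, height


def texel_for_instance(instance_id, width):
    """Texel a vertex shader would read for one instance (one bar).

    Roughly equivalent GLSL (an assumption about the shader):
        ivec2 uv = ivec2(gl_InstanceID % width, gl_InstanceID / width);
        float value = texelFetch(data, uv, 0).r;
    """
    return instance_id % width, instance_id // width


w, h = texture_layout(100_000, 16384)
print(w, h)                           # 16384 7
print(texel_for_instance(20_000, w))  # (3616, 1)
```

With this layout a single instanced draw call renders all bars, and the CPU only uploads the data texture once, which is what makes large data sets cheap to redraw while zooming or scrolling.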

It worked out pretty well: it maintains at least 100 fps with large data sets. The major drawbacks are that only 2^31 events are supported and that RAM usage (and GPU memory usage) is pretty high. Nevertheless, it looks like it will fit all the needs of the project. There are still many things to work on: designing a proper QML element, convenient controls, axes, handling multiple charts, extending it to timelines, preliminary filtering support, tooltips, etc. And that is only the charts, so there is really a lot of work ahead.


GSoC: Summary

The project was to improve the profiling capabilities of apitrace.

  1. Abstraction system for profiling in glretrace (+ AMD_perfmon, INTEL_perfquery support)
    This part of the project is ready.
    Description can be found here: Description
    Repository: https://github.com/trtt/apitrace
    The patchset has been sent upstream for review. There are some minor problems at the moment, but I hope everything will work out.
  2. GUI: Improving profiling view in QApitrace
    Unfortunately, there was not enough time to finish the GUI.
    Some individual parts are ready.
    What is still missing are the data visualizations (histograms, charts, etc.).
    It would be nice to have them in QML, but there seem to be no available solutions for that at the moment, so this would have to be written from scratch. Perhaps the old widgets can be used, at least for a while.
    Here, for example, is a screenshot of what metric selection looks like:
    [Screenshot: metric selection GUI]
    I am going to continue working on the GUI (probably once it is clearer whether the previous patchset is going to be accepted, and with what changes).

Current state

What was the goal

Currently QApitrace communicates with glretrace through the glretrace CLI. The goal is to implement support for new performance counters while preserving this behavior.

1) Write a metric abstraction system

2) Write backends (AMD_perfmon, …)

3) Change glretrace to leverage this new abstraction. It should be possible to profile either frames or calls.

(git tree: https://github.com/trtt/apitrace)
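To make these goals more concrete, here is a minimal Python sketch of what such an abstraction could look like, loosely following the begin/end query pattern of the GL extensions. Everything in it (`MetricBackend`, `FakeBackend`, `profile_calls`, the per-pass capacity limit) is hypothetical and is not the actual glretrace code, which is written in C++. It profiles calls rather than frames and retraces as many passes as needed when not all requested metrics can be collected at once:

```python
from abc import ABC, abstractmethod


class MetricBackend(ABC):
    """Hypothetical common interface over one source of profiling data."""

    @abstractmethod
    def init(self): ...

    @abstractmethod
    def deinit(self): ...

    @abstractmethod
    def setup(self, metrics):
        """Enable as many `metrics` as fit in one pass; return the rest."""

    @abstractmethod
    def begin_query(self, call_no): ...

    @abstractmethod
    def end_query(self, call_no): ...

    @abstractmethod
    def results(self):
        """Return {call_no: {metric: value}} collected since setup()."""


class FakeBackend(MetricBackend):
    """Stand-in backend that can sample only `per_pass` metrics at once."""

    def __init__(self, per_pass):
        self.per_pass = per_pass
        self.active = []
        self.data = {}

    def init(self):
        self.data = {}

    def deinit(self):
        self.active = []

    def setup(self, metrics):
        self.active = list(metrics[:self.per_pass])
        self.data = {}
        return list(metrics[self.per_pass:])

    def begin_query(self, call_no):
        self.data[call_no] = {}

    def end_query(self, call_no):
        # Fake "measurement": a deterministic value per (call, metric).
        for m in self.active:
            self.data[call_no][m] = call_no * len(m)

    def results(self):
        return self.data


def profile_calls(backend, calls, metrics):
    """Retrace `calls` as many times as needed to collect every metric."""
    merged = {c: {} for c in calls}
    backend.init()
    remaining, passes = list(metrics), 0
    while remaining:
        leftover = backend.setup(remaining)
        if len(leftover) == len(remaining):
            raise RuntimeError(f"cannot enable any of {remaining}")
        remaining, passes = leftover, passes + 1
        for call_no in calls:  # one retrace pass over the calls of interest
            backend.begin_query(call_no)
            # ...the GL call itself would be replayed here...
            backend.end_query(call_no)
        for call_no, values in backend.results().items():
            merged[call_no].update(values)
    backend.deinit()
    return merged, passes


result, passes = profile_calls(FakeBackend(per_pass=2), [3, 7],
                               ["pixels", "gpu_time", "samples"])
print(passes)     # 2
print(result[7])  # {'pixels': 42, 'gpu_time': 56, 'samples': 49}
```

Keeping the pass loop outside the backends means a real AMD_performance_monitor or INTEL_performance_query backend would only have to decide what fits into one pass.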

Reviewing current solutions

As you probably know, I will be adding support for some performance counters and enhancing the profiling view of Apitrace this GSoC summer. Recently I have been experimenting with similar existing software. Here are some of my notes. This is certainly not a detailed review; I was primarily focusing on profiling and on what could be applied to Apitrace.

I had a look at Intel GPA (as part of Intel INDE ’15), AMD PerfStudio (3.2.18.0), NVIDIA Nsight (4.6) and the CUDA Visual Profiler (7.0). Sadly, not all of these tools are available on Linux (on Linux you have the Visual Profiler for CUDA and AMD CodeXL for OpenCL; that is all I’ve found), and not all of them support OpenGL profiling. So do not be surprised by the Windows UI and the DirectX API names.


1. Intel GPA (DirectX): Platform Analyzer + Frame Analyzer

Currently supported APIs include DirectX and OpenGL ES (with a Linux toolkit available), but not OpenGL. I was told, though, that support for the OpenGL core profile is being developed.

The toolset consists of several applications: Graphics Monitor, System Analyzer, Platform Analyzer and Frame Analyzer. Graphics Monitor is simply used to display a HUD and to collect data from applications (a single-frame snapshot or a trace of some fixed duration), either by setting up triggers or manually via hotkeys (which almost never work, as the analyzed applications trap all keystrokes). Also note that there is no replay functionality (as in Apitrace): all performance data is collected per frame (there is actually a more fine-grained option), and the timeline is recorded at the same time, while the application runs.

[Screenshot: Intel GPA metric selection options]

In the options (screen above) you are limited to selecting only 4 counters (probably because more would not fit into the HUD), but you can extend this number by using System Analyzer. This tool connects to your current session and displays performance graphs in real time. You can have arbitrarily many of them, and all the displayed data is also included in the trace capture file when you save one.

[Screenshot: Intel GPA System Analyzer]

Trace files can be opened in Platform Analyzer. This tool is all about timelines. Among others, there are CPU/GPU frame bounds, GPU usage and DirectX tasks by thread. What is interesting here is that GPU metrics are displayed as separate graphs in the timeline alongside the other data (screens below). I think this is a nice visualization and it could be implemented in Apitrace, though I am not sure whether per-call metrics can be displayed this way.

[Screenshots: Intel GPA Platform Analyzer timeline with GPU metric graphs]

By the way, you can see which metrics are supported here: https://software.intel.com/sites/products/documentation/gpa/15.1/win/Metrics_List_for_Intel_Graphics_Performance_Analyzers.htm

There is also a filtering option. However, it didn’t work as I expected.

[Screenshot: Intel GPA Platform Analyzer filtering option]

I thought it would filter out everything except objects of the selected type. That way you could, for example, filter by a specific call (or call type, like draw calls) and easily find those calls in the timeline. This could also be done in Apitrace. In reality, this option grays out everything except the selected object (and the objects connected to it, i.e. its neighbors in the timeline), and some numbers in the statistics pane (the bottom one) are recalculated accordingly.
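The difference between the two behaviors is easy to state in code. In this Python sketch the call list, the predicate and both function names are hypothetical, and modeling "connected objects" as the immediate timeline neighbors is an assumption: filtering drops non-matching calls, while graying out keeps every call and only marks which ones stay active:

```python
def filter_calls(calls, predicate):
    """What I expected: hide everything except the matching calls."""
    return [c for c in calls if predicate(c)]


def gray_out(calls, predicate):
    """What GPA does (roughly): keep all calls, mark matches and their
    immediate timeline neighbors as active, gray out the rest."""
    matches = {i for i, c in enumerate(calls) if predicate(c)}
    active = set()
    for i in matches:
        active.update(j for j in (i - 1, i, i + 1) if 0 <= j < len(calls))
    return [(c, i in active) for i, c in enumerate(calls)]


def is_draw(name):
    return name.startswith("glDraw")


calls = ["glClear", "glViewport", "glUseProgram", "glDrawArrays",
         "glBindTexture", "glUniform1f", "glUseProgram", "glDrawElements",
         "glFlush", "glSwapBuffers"]
print(filter_calls(calls, is_draw))  # ['glDrawArrays', 'glDrawElements']
```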


Frame Analyzer offers many options, but I will focus on profiling. The tool can analyze both individual calls (“ergs” in GPA terminology) and groups of calls (grouped by render target). Metric collection works differently for these two tasks (details here: https://software.intel.com/sites/products/documentation/gpa/15.1/win/Analyzing_the_Time_for_Individual_Ergs_and_the_Time_for_a_Group_of_Ergs.htm). Also, the set of available metrics differs from the one used in traces (I had pretty much nothing on my Intel HD 3000). Initially I was not even thinking about Apitrace as a single-frame analyzer, and I do not know whether this is actually possible to implement in its current state. Anyway, I do not think I will deal with this within the scope of my project; it could be implemented later on the basis of what is already done. For those who are interested: a project to create a frame-analysis tool based on Apitrace was recently proposed (https://github.com/janesma/apitrace/wiki/frameretrace-branch), and it is being discussed quite actively on the mailing list.

There is, however, a feature that could be brought to Apitrace profiling: when you change something (like shader code, DirectX states, etc.), you can re-profile and see the difference in the numbers.

[Screenshot: Intel GPA Frame Analyzer]

A similar thing could be done in Apitrace (for cases when you change call parameters in the main interface), but I do not know whether it would be usable given the long profiling times.


2. AMD GPU PerfStudio (OpenGL)

PerfStudio is a frame debugger/profiler. You connect your application to the PerfStudio server and then work with it. In terms of profiling, PerfStudio lets you capture per-draw metrics after stopping the application at a specific frame.

[Screenshot: PerfStudio counter groups]

Counters are divided into groups. I do not know how the counters I am going to implement for Apitrace could be grouped (AMD_performance_monitor counters, for example, have internal groups), but this idea should definitely be considered. As soon as you select the counters, profiling takes place. It has to be done in several passes, which suggests that some metrics are incompatible and cannot be collected simultaneously (am I missing something here?). Finally, the table of metric data is displayed:

[Screenshot: PerfStudio metric data table]
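One likely reason for multiple passes is that the hardware can only sample a limited number of counters at the same time. GL_AMD_performance_monitor, for instance, reports a maximum number of simultaneously active counters per group (the `maxActiveCounters` output of `glGetPerfMonitorCountersAMD`). Below is a minimal greedy Python sketch of splitting a counter selection into passes under that kind of constraint; the data layout, the counter names and the greedy strategy are assumptions, not how PerfStudio actually does it:

```python
def schedule_passes(selected, max_active):
    """Split selected counters into retrace passes.

    `selected` is a list of (group, counter) pairs; `max_active` maps each
    group to how many of its counters may be active in one pass.
    """
    passes = []  # each pass is a list of (group, counter) pairs
    for group, counter in selected:
        limit = max_active[group]
        if limit <= 0:
            raise ValueError(f"group {group!r} cannot sample any counter")
        # Put the counter into the first pass that still has room in its group.
        for p in passes:
            if sum(1 for g, _ in p if g == group) < limit:
                p.append((group, counter))
                break
        else:
            passes.append([(group, counter)])
    return passes


selection = [("shader", "alu_busy"), ("shader", "tex_busy"),
             ("memory", "bytes_read"), ("shader", "vs_invocations")]
for i, p in enumerate(schedule_passes(selection, {"shader": 2, "memory": 4})):
    print(i, p)
# 0 [('shader', 'alu_busy'), ('shader', 'tex_busy'), ('memory', 'bytes_read')]
# 1 [('shader', 'vs_invocations')]
```

Greedy first-fit is not guaranteed to find the minimum number of passes in general, but with a per-group limit as the only constraint it is a reasonable starting point.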

At the top of the Data tab you can group draw calls by so-called state buckets. This means you can group draw calls that share the same shader, render into the same render target (or depth or stencil buffer), etc. For grouped calls the data is aggregated (percentages are averaged). It looks like this:

[Screenshot: PerfStudio draw calls grouped by state bucket]
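A small Python sketch of this kind of aggregation, assuming each draw call carries a snapshot of the relevant state (program, FBO and so on) next to its metric values, and that every draw call reports the same set of metrics; the record layout and function name are hypothetical. Absolute counters such as time are summed, while percentages are averaged:

```python
from collections import defaultdict


def group_by_bucket(draws, bucket, percent_metrics=()):
    """Aggregate per-draw-call metrics by one piece of state.

    Each draw is a dict {"state": {...}, "metrics": {...}}. Metrics listed
    in `percent_metrics` are averaged over the group; all others are summed.
    Assumes every draw in a group reports the same metrics.
    """
    groups = defaultdict(list)
    for draw in draws:
        groups[draw["state"][bucket]].append(draw["metrics"])

    aggregated = {}
    for value, members in groups.items():
        totals = defaultdict(float)
        for metrics in members:
            for name, v in metrics.items():
                totals[name] += v
        for name in percent_metrics:
            if name in totals:
                totals[name] /= len(members)
        aggregated[value] = dict(totals, draw_calls=len(members))
    return aggregated


draws = [
    {"state": {"program": 3, "fbo": 0}, "metrics": {"gpu_time_us": 10.0, "busy_pct": 50.0}},
    {"state": {"program": 3, "fbo": 1}, "metrics": {"gpu_time_us": 30.0, "busy_pct": 70.0}},
    {"state": {"program": 5, "fbo": 1}, "metrics": {"gpu_time_us": 5.0, "busy_pct": 20.0}},
]
by_program = group_by_bucket(draws, "program", percent_metrics=["busy_pct"])
print(by_program[3])  # {'gpu_time_us': 40.0, 'busy_pct': 60.0, 'draw_calls': 2}
```

Switching buckets is then just a matter of passing a different state key (for example `"fbo"`), provided that state is recorded for each call.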

If I understand it correctly, Apitrace currently groups calls by shader program. Implementing other state buckets (FBO, stencil, depth) might be a good idea if it is possible at all (i.e. is this information available in Apitrace, and how consistent is it across many frames?).

In the options you can hide data from chosen counter groups:

[Screenshot: PerfStudio option to hide counter groups]

This might be useful.

PerfStudio also has an option for comparing two profiles:

[Screenshot: PerfStudio profile comparison]
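Comparing two profiles (for example, before and after changing a call parameter in QApitrace) could be as simple as computing a per-call relative change. A minimal Python sketch, assuming each profile is a mapping from call number to a single metric value; the function name and the sample numbers are hypothetical:

```python
def compare_profiles(before, after):
    """Per-call relative change of one metric between two profiles.

    Both arguments map call numbers to a metric value. Calls present in
    only one profile are skipped; a zero baseline yields None.
    """
    diff = {}
    for call in sorted(before.keys() & after.keys()):
        old, new = before[call], after[call]
        diff[call] = None if old == 0 else (new - old) / old
    return diff


before = {10: 200.0, 11: 50.0, 12: 0.0}
after = {10: 150.0, 11: 55.0, 12: 4.0}
for call, change in compare_profiles(before, after).items():
    print(call, "n/a" if change is None else f"{change:+.0%}")
# 10 -25%
# 11 +10%
# 12 n/a
```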


3. NVIDIA Nsight (OpenGL)

Nsight for Windows is shipped as a Visual Studio plugin. There is also a Linux version for Eclipse, but it is part of the CUDA Toolkit and supports only the CUDA API. The Windows version refused to run an OpenGL 3.3 application; it looks like it only supports OpenGL 4.2 and later.

Nsight is a frame debugger/profiler (like PerfStudio). A well-written description of its profiling capabilities can be found here: http://docs.nvidia.com/gameworks/index.html#developertools/desktop/frame_profiler_ogl.htm

It is worth noting that Nsight also uses state buckets, and PerfKit counters make it possible to draw some pretty charts. I think such things would be too much for Apitrace (besides, they are PerfKit-specific), but there should be some option to export the data (so you can plot whatever you like).
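Export does not need much machinery: writing per-call metrics to CSV lets users plot them with any tool they like. A sketch using Python's standard `csv` module, assuming per-call results are already available as tuples; this layout and the function name are hypothetical, not glretrace’s actual profile output format:

```python
import csv
import io


def export_csv(rows, metrics, out):
    """Write per-call metric rows as CSV to the file-like object `out`.

    `rows` is a list of (call_no, call_name, {metric: value}) tuples;
    metrics missing for a call are written as empty cells.
    """
    writer = csv.writer(out)
    writer.writerow(["call", "name", *metrics])
    for call_no, name, values in rows:
        writer.writerow([call_no, name, *(values.get(m, "") for m in metrics)])


buf = io.StringIO()
export_csv([(42, "glDrawArrays", {"gpu_time_ns": 1200}),
            (57, "glDrawElements", {"gpu_time_ns": 800, "pixels": 4096})],
           ["gpu_time_ns", "pixels"], buf)
print(buf.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends, so that line endings are not translated twice.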


4. NVIDIA Visual Profiler (CUDA)

Visual Profiler is part of the CUDA Toolkit, and you can run it under Linux. It is similar to Apitrace in the sense that it also replays the whole timeline for profiling, which takes forever. The workflow is simple: you generate a timeline, collect metrics and analyze your app. I was not able to find anything here that would be interesting to apply to Apitrace.

[Screenshots: NVIDIA Visual Profiler]

Proposal

Summary:

This project aims to improve the profiling capabilities of Apitrace on 
the OpenGL side by introducing new metric-collection systems and 
improving the profiling GUI.


Goals and details:

1. Implement an abstraction system for profiling data collectors. The 
idea behind this is to write a general interface to different sources of 
profiling data. It should be capable of at least:

   a. Initialization/deinitialization

   b. Setup (basically choosing counters)

   c. Methods like beginQuery/endQuery in OpenGL

I think all of this should be added as an optional feature, enabled by 
a corresponding compile-time option.

2. Implement on top of the abstraction system several backends:

   a. AMD_performance_monitor OpenGL extension

   b. INTEL_performance_query OpenGL extension

   c. perfkit (probably via libperf)

3. Improve the existing GUI for profiling the performance of traces. It 
should basically be extended to support the new metric-collection 
systems. This particular task can be divided into two parts:

   a. Signals (counters) selection for retrace

   b. Visualization of metrics.

This is likely to be the most difficult part.



Additional goals (if there is some spare time):

1. Additional GUI features.

2. CLI support for new metric-collection systems.

3. Windows/Mac support.



Approximate schedule:

There will be 14 weeks, beginning on May 25.

Week 1-2: Study similar existing solutions (like the one from NVIDIA), 
work out a plan for the GUI design and development.

Week 3-4: Write an abstraction system. Implement backends.

Week 5: Rewrite the existing GUI in a very simple way to allow testing 
of the implemented backends.

Week 6: Mid-term evaluations.

Week 7: Implement signal selection part of the GUI.

Week 8-11: Main GUI development (this one highly depends on the plan 
worked out in the first weeks).

Week 12-13: Test and polish.

Week 14: Final evaluations.