API breakdown: AMD_performance_monitor, INTEL_performance_query, PerfKit

  1. Initialization

    1. AMD_performance_monitor
      void GenPerfMonitorsAMD(sizei n, uint *monitors)

      Create monitoring sessions. At creation time, the performance monitor object has all counters disabled.

    2. INTEL_performance_query
    3. PerfKit
      NVPMRESULT CreateContextFromOGLContext()

      Create API context.

    4. Difference

      No essential differences.

  2. Query counters

    1. AMD_performance_monitor

      Counters are distributed in groups. There are 2 functions for querying groups and counters within the group:

      void GetPerfMonitorGroupsAMD(int *numGroups, sizei groupsSize, uint *groups);
      
      
      void GetPerfMonitorCountersAMD(uint group, int *numCounters, int *maxActiveCounters, sizei countersSize, uint *counters);

      Note that each group has a maximum number of counters that can be active at any one time; it is returned in maxActiveCounters.

    2. INTEL_performance_query

      Each set of performance counters is represented by a unique query type. The list of queries is traversed with:

      void GetFirstPerfQueryIdINTEL(uint *queryId);
      
      void GetNextPerfQueryIdINTEL(uint queryId, uint *nextQueryId);

      The performance counters in a given query are described by:

      void GetPerfCounterInfoINTEL(uint queryId, uint counterId, 
       uint counterNameLength, char *counterName, 
       uint counterDescLength, char *counterDesc, 
       uint *counterOffset, uint *counterDataSize, uint *counterTypeEnum,
       uint *counterDataTypeEnum, uint64 *rawCounterMaxValue);
    3. PerfKit

      Enumerate counters:

      NVPMRESULT NVPMEnumCounters(NVPMEnumFunc pEnumFunction);
      int (*NVPMEnumFunc)(NVPMCounterID unCounterID, const char *pcCounterName);

      There are at least 3 types of counters: software (OpenGL), GPU and so-called simplified experiments.

      Counter description is obtained with several functions:

      NVPMRESULT NVPMGetCounterDescription(NVPMCounterID unCounterID, char *pcString, NVPMUINT *punLen);
      NVPMRESULT NVPMGetCounterAttribute(NVPMCounterID unCounterID, NVPMATTRIBUTE nvAttribute, NVPMUINT64 *punValue);


    4. Difference

      In all three APIs the available counters can be enumerated and grouped. AMD_perfmon has explicit groups, and INTEL_perfquery exposes several queries that act as groups. PerfKit counters can be grouped in two ways if needed:
      a) By type {GPU, OGL, D3D, SIMEXP, USER, or AGGREGATE}
      b) By domain (PM unit responsible for collecting the counter)
      Probably the best way to group PerfKit counters is to use the direct product of both, i.e. one group per (type, domain) pair.
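
      A minimal sketch of the direct-product grouping, assuming each counter has already been tagged with a type and a domain index during enumeration. CounterType, CounterDesc, group_index and count_groups are hypothetical names of mine, not part of PerfKit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the PerfKit counter types listed above. */
typedef enum {
    CT_GPU, CT_OGL, CT_D3D, CT_SIMEXP, CT_USER, CT_AGGREGATE, CT_COUNT
} CounterType;

/* Hypothetical per-counter record filled in during enumeration. */
typedef struct {
    unsigned    id;      /* counter ID as reported by the API */
    CounterType type;    /* counter type */
    unsigned    domain;  /* index of the PM unit, 0 .. numDomains-1 */
} CounterDesc;

/* Map a (type, domain) pair to a dense group index. */
static unsigned group_index(CounterType type, unsigned domain, unsigned numDomains)
{
    assert(type < CT_COUNT && domain < numDomains);
    return (unsigned)type * numDomains + domain;
}

/* Count how many counters fall into each (type, domain) group.
   groupSizes must have room for CT_COUNT * numDomains entries. */
static void count_groups(const CounterDesc *c, size_t n,
                         unsigned numDomains, unsigned *groupSizes)
{
    for (size_t g = 0; g < (size_t)CT_COUNT * numDomains; ++g)
        groupSizes[g] = 0;
    for (size_t i = 0; i < n; ++i)
        ++groupSizes[group_index(c[i].type, c[i].domain, numDomains)];
}
```

      Empty groups can simply be skipped when presenting the counters.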

  3. Counters selection

    1. AMD_performance_monitor
      void SelectPerfMonitorCountersAMD(uint monitor, boolean enable, 
       uint group, int numCounters, 
       uint *counterList);
    2. INTEL_performance_query

      You create one query instance per counter set you want to sample:

      void CreatePerfQueryINTEL(uint queryId, uint *queryHandle);

      All counters within that Intel query are then sampled.

    3. PerfKit

      You add counters to the created context:

      NVPMRESULT NVPMAddCounters(NVPMContext perfCtx, NVPMUINT unCount, NVPMCounterID *punCounterIDs);
    4. Difference

      Nothing essential to note.
      In any case, you need some internal structures and routines that allow counters to be selected in the first place. Afterwards you determine which of them are compatible and use the APIs listed above to create the actual performance contexts.

  4. Making a performance session

    1. AMD_performance_monitor

      Multiple passes must be set up manually. The APIs used:

      void BeginPerfMonitorAMD(uint monitor);
      void EndPerfMonitorAMD(uint monitor);
    2. INTEL_performance_query

      Multiple query objects can be nested via the APIs:

      void BeginPerfQueryINTEL(uint queryHandle);
      void EndPerfQueryINTEL(uint queryHandle);

      Nesting is not always possible, however; in those cases multiple passes must be made.

    3. PerfKit

      There are 2 modes here:
      a) Real-Time mode
      Collects per-frame metrics in real time, but can sample only a limited number of counters per frame, so it is not considered here.
      b) Experiment mode
      Starts with:

      NVPMRESULT NVPMBeginExperiment(NVPMContext perfCtx, NVPMUINT *pnNumPasses);

      The required number of passes is returned in pnNumPasses.
      Each pass is bracketed by the calls:

      NVPMRESULT NVPMBeginPass(NVPMContext perfCtx, NVPMUINT nPass);
      NVPMRESULT NVPMEndPass(NVPMContext perfCtx, NVPMUINT nPass);

      Each draw call is sampled with the following pair:

      NVPMRESULT NVPMBeginObject(NVPMContext perfCtx, NVPMUINT nObjectID);
      NVPMRESULT NVPMEndObject(NVPMContext perfCtx, NVPMUINT nObjectID);

      The experiment is ended with:

      NVPMRESULT NVPMEndExperiment(NVPMContext perfCtx);
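
      The nesting of these calls can be sketched as below. To keep the sketch compilable without PerfKit, the calls go through a hypothetical ExperimentOps table; each member is assumed to forward to the PerfKit call named in its comment and to return 0 on success. The trace_* stand-ins only record the call order. Note that the same sequence of draw calls has to be replayed in every pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thin wrapper around the PerfKit calls (not a PerfKit type). */
typedef struct {
    int  (*begin_experiment)(void *ctx, unsigned *numPasses); /* NVPMBeginExperiment */
    int  (*begin_pass)(void *ctx, unsigned pass);             /* NVPMBeginPass */
    int  (*begin_object)(void *ctx, unsigned objectId);       /* NVPMBeginObject */
    int  (*end_object)(void *ctx, unsigned objectId);         /* NVPMEndObject */
    int  (*end_pass)(void *ctx, unsigned pass);               /* NVPMEndPass */
    int  (*end_experiment)(void *ctx);                        /* NVPMEndExperiment */
    void (*draw)(void *ctx, unsigned objectId); /* issue draw call, then synchronize */
} ExperimentOps;

/* Run one experiment over numDraws draw calls, replaying them in every pass.
   Error handling is reduced to an early return. */
static int run_experiment(const ExperimentOps *ops, void *ctx, unsigned numDraws)
{
    unsigned numPasses = 0;
    assert(ops != NULL);
    if (ops->begin_experiment(ctx, &numPasses) != 0) return -1;
    for (unsigned pass = 0; pass < numPasses; ++pass) {
        if (ops->begin_pass(ctx, pass) != 0) return -1;
        for (unsigned obj = 0; obj < numDraws; ++obj) {
            if (ops->begin_object(ctx, obj) != 0) return -1;
            ops->draw(ctx, obj);
            if (ops->end_object(ctx, obj) != 0) return -1;
        }
        if (ops->end_pass(ctx, pass) != 0) return -1;
    }
    return ops->end_experiment(ctx) == 0 ? 0 : -1;
}

/* Tracing stand-ins so the call order can be checked without PerfKit. */
typedef struct { char log[64]; size_t len; unsigned passes; } Trace;

static int trace_put(void *ctx, char c)
{
    Trace *t = ctx;
    if (t->len + 1 >= sizeof t->log) return -1;
    t->log[t->len++] = c;
    t->log[t->len] = '\0';
    return 0;
}
static int  t_begin_experiment(void *c, unsigned *n) { *n = ((Trace *)c)->passes; return trace_put(c, 'X'); }
static int  t_begin_pass(void *c, unsigned p)   { (void)p; return trace_put(c, 'P'); }
static int  t_begin_object(void *c, unsigned o) { (void)o; return trace_put(c, 'O'); }
static int  t_end_object(void *c, unsigned o)   { (void)o; return trace_put(c, 'o'); }
static int  t_end_pass(void *c, unsigned p)     { (void)p; return trace_put(c, 'p'); }
static int  t_end_experiment(void *c)           { return trace_put(c, 'x'); }
static void t_draw(void *c, unsigned o)         { (void)o; (void)trace_put(c, 'd'); }

static const ExperimentOps trace_ops = {
    t_begin_experiment, t_begin_pass, t_begin_object,
    t_end_object, t_end_pass, t_end_experiment, t_draw
};
```

      With two passes and two draw calls the trace reads X, then twice (P, O d o, O d o, p), then x.
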
    4. Difference

      In all three APIs a performance session is delimited by Begin/End pairs. PerfKit automatically takes care of incompatible metrics and works out the required number of passes; with AMD/Intel this has to be done manually.
      With AMD/Intel the metrics are collected asynchronously, whereas PerfKit requires CPU/GPU synchronization (i.e. glFinish() after each draw call).
      Also, you can embed any number of draw calls in a Begin/End block for AMD/Intel (so that you can profile whole frames, for example), whereas with PerfKit you can profile only individual draw calls.

  5. Metrics compatibility (# of passes)

    1. AMD_performance_monitor

      If the currently selected counters cannot be sampled together, BeginPerfMonitorAMD generates an INVALID_OPERATION error. This can be probed up front to determine the required number of passes and to build a profiling context for each pass.
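
      One way to turn that probing into a pass plan is a greedy first-fit split, sketched below. The ProbeFn callback is hypothetical: in real code it would select the counters on a scratch monitor, call BeginPerfMonitorAMD and check glGetError() for INVALID_OPERATION. The probe_max_active stand-in only enforces the per-group maxActiveCounters limit, which is an assumption of mine and may be weaker than what the driver actually enforces. First-fit is not guaranteed to give the minimum number of passes.

```c
#include <stddef.h>

#define MAX_PASSES   16
#define MAX_PER_PASS 32

typedef struct { unsigned group, counter; } CounterRef;

typedef struct {
    CounterRef items[MAX_PER_PASS];
    size_t     count;
} Pass;

/* Hypothetical probe: nonzero if the counters can be sampled together. */
typedef int (*ProbeFn)(const CounterRef *items, size_t count, void *user);

/* Greedy first-fit: put each counter into the first pass that still accepts it.
   Returns the number of passes, or 0 if a counter cannot be placed at all. */
static size_t plan_passes(const CounterRef *sel, size_t n,
                          ProbeFn probe, void *user, Pass *passes)
{
    size_t numPasses = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t p;
        for (p = 0; p < numPasses; ++p) {
            Pass *pass = &passes[p];
            if (pass->count == MAX_PER_PASS) continue;
            pass->items[pass->count] = sel[i];        /* tentatively append */
            if (probe(pass->items, pass->count + 1, user)) {
                pass->count++;                        /* keep it */
                break;
            }
        }
        if (p == numPasses) {                         /* no existing pass fits */
            if (numPasses == MAX_PASSES) return 0;
            passes[numPasses].items[0] = sel[i];
            if (!probe(passes[numPasses].items, 1, user)) return 0;
            passes[numPasses].count = 1;
            ++numPasses;
        }
    }
    return numPasses;
}

/* Stand-in probe usable without a GL context: accepts a set if no group
   exceeds its limit (user points to an array of limits indexed by group). */
static int probe_max_active(const CounterRef *items, size_t count, void *user)
{
    const unsigned *maxActive = user;
    for (size_t i = 0; i < count; ++i) {
        unsigned used = 0;
        for (size_t j = 0; j < count; ++j)
            if (items[j].group == items[i].group) ++used;
        if (used > maxActive[items[i].group]) return 0;
    }
    return 1;
}
```

      Each resulting Pass can then be turned into its own monitor with SelectPerfMonitorCountersAMD.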

    2. INTEL_performance_query

      As already noted, BeginPerfQueryINTEL calls cannot always be nested; when nesting is not possible, the call also generates INVALID_OPERATION. In practice each Intel query can simply be given its own pass, so there’s no need to probe anything.

    3. PerfKit

      Fortunately, everything is done by the API: NVPMBeginExperiment reports the required number of passes.

  6. Data collection

    1. AMD_performance_monitor

      Once the session has been ended with EndPerfMonitorAMD, data is obtained with:

      void GetPerfMonitorCounterDataAMD(uint monitor, enum pname, sizei dataSize, uint *data, sizei *bytesWritten);
    2. INTEL_performance_query

      All queries must be ended:

      void EndPerfQueryINTEL(uint queryHandle);

      Data is obtained via:

      void GetPerfQueryDataINTEL(uint queryHandle, uint flags, sizei dataSize, void *data, uint *bytesWritten);
    3. PerfKit

      Counter values are determined (after all passes are done) through:

      NVPMRESULT NVPMGetCounterValue(NVPMContext perfCtx, NVPMCounterID unCounterID, NVPMUINT nObjectID, NVPMUINT64 *pulValue, NVPMUINT64 *pulCycles);
    4. Difference

      AMD/Intel metrics become available once the graphics hardware has finished processing the corresponding commands. So you can create multiple performance monitors/queries (up to their maximum number) and use them to profile units (draw calls, frames) without waiting for the metrics data. Once all monitors are in use, you can free some of them by reading back their data and then reuse them.
      PerfKit requires explicit CPU/GPU synchronization. Data should be available right after the draw call and glFinish() pair, but for some reason it can be accessed only after the whole experiment is over (even for metrics that require only a single pass).
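
      The monitor/query reuse scheme for AMD/Intel can be sketched as a small ring of handles. MonitorPool, ReadbackFn and the pool_* functions are hypothetical names; in real code the readback callback would fetch the results with GetPerfMonitorCounterDataAMD or GetPerfQueryDataINTEL (waiting until they are available) before the handle is reused. log_readback is only a stand-in that records the order of readbacks.

```c
#include <stddef.h>

#define POOL_SIZE 4

/* Hypothetical callback: read back and store the results of a finished handle. */
typedef void (*ReadbackFn)(unsigned handle, unsigned unitId, void *user);

typedef struct {
    unsigned handles[POOL_SIZE]; /* monitor/query handles created up front */
    unsigned unitIds[POOL_SIZE]; /* profiled unit each in-flight handle belongs to */
    size_t   head;               /* slot of the oldest in-flight handle */
    size_t   inFlight;           /* number of handles awaiting readback */
} MonitorPool;

static void pool_init(MonitorPool *p, const unsigned *handles)
{
    for (size_t i = 0; i < POOL_SIZE; ++i) p->handles[i] = handles[i];
    p->head = 0;
    p->inFlight = 0;
}

/* Get a handle for a new unit; if all are in flight, read back the oldest first. */
static unsigned pool_acquire(MonitorPool *p, unsigned unitId,
                             ReadbackFn readback, void *user)
{
    if (p->inFlight == POOL_SIZE) {
        readback(p->handles[p->head], p->unitIds[p->head], user);
        p->head = (p->head + 1) % POOL_SIZE;
        p->inFlight--;
    }
    size_t slot = (p->head + p->inFlight) % POOL_SIZE;
    p->unitIds[slot] = unitId;
    p->inFlight++;
    return p->handles[slot];
}

/* Read back everything still in flight, oldest first. */
static void pool_flush(MonitorPool *p, ReadbackFn readback, void *user)
{
    while (p->inFlight > 0) {
        readback(p->handles[p->head], p->unitIds[p->head], user);
        p->head = (p->head + 1) % POOL_SIZE;
        p->inFlight--;
    }
}

/* Stand-in readback that just logs which unit was read. */
typedef struct { unsigned units[16]; size_t count; } ReadLog;

static void log_readback(unsigned handle, unsigned unitId, void *user)
{
    ReadLog *log = user;
    (void)handle;
    if (log->count < 16) log->units[log->count++] = unitId;
}
```

      The handle returned by pool_acquire is the one to pass to the Begin/End pair around the unit being profiled.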

  7. Data

    1. AMD_performance_monitor

      Types:

      UNSIGNED_INT
      UNSIGNED_INT64_AMD
      PERCENTAGE_AMD // (float in range [0.0 .. 100.0])
      FLOAT
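
      A sketch of decoding one record of the result buffer returned by GetPerfMonitorCounterDataAMD. It assumes, as I understand the sample code in the extension spec, that the buffer is a sequence of (group ID, counter ID, value) records made of 32-bit words, with the value taking two words for UNSIGNED_INT64_AMD and one word otherwise. AmdType and read_record_value are hypothetical names; real code would get the type from GetPerfMonitorCounterInfoAMD.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the counter types listed above. */
typedef enum { T_UINT, T_UINT64, T_PERCENTAGE, T_FLOAT } AmdType;

/* Decode the value of the record starting at rec (rec[0] = group ID,
   rec[1] = counter ID, value from rec[2] on). The value is widened to
   double, which loses precision for 64-bit values above 2^53.
   Returns the number of 32-bit words consumed, or 0 if truncated. */
static size_t read_record_value(const uint32_t *rec, size_t wordsLeft,
                                AmdType type, double *value)
{
    if (type == T_UINT64) {
        uint64_t v;
        if (wordsLeft < 4) return 0;
        memcpy(&v, rec + 2, sizeof v);
        *value = (double)v;
        return 4;
    }
    if (wordsLeft < 3) return 0;
    if (type == T_UINT) {
        *value = (double)rec[2];
    } else {                      /* T_PERCENTAGE and T_FLOAT are both floats */
        float f;
        memcpy(&f, rec + 2, sizeof f);
        *value = (double)f;
    }
    return 3;
}
```

      The caller advances through the buffer by the returned word count until bytesWritten / 4 words have been consumed.
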
    2. INTEL_performance_query

      Types:

      PERFQUERY_COUNTER_DATA_UINT32_INTEL
      PERFQUERY_COUNTER_DATA_UINT64_INTEL
      PERFQUERY_COUNTER_DATA_FLOAT_INTEL
      PERFQUERY_COUNTER_DATA_DOUBLE_INTEL
      PERFQUERY_COUNTER_DATA_BOOL32_INTEL

      There are also types that specify what data represents:

      PERFQUERY_COUNTER_EVENT_INTEL
      PERFQUERY_COUNTER_DURATION_NORM_INTEL
      PERFQUERY_COUNTER_DURATION_RAW_INTEL
      PERFQUERY_COUNTER_THROUGHPUT_INTEL
      PERFQUERY_COUNTER_RAW_INTEL
      PERFQUERY_COUNTER_TIMESTAMP_INTEL
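
      A sketch of pulling a single counter out of the buffer filled by GetPerfQueryDataINTEL, assuming counterOffset and counterDataSize from GetPerfCounterInfoINTEL are byte offsets and sizes into that buffer. DataType and read_counter are hypothetical names; real code would compare counterDataTypeEnum against the PERFQUERY_COUNTER_DATA_*_INTEL tokens instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the PERFQUERY_COUNTER_DATA_*_INTEL data types. */
typedef enum { DATA_UINT32, DATA_UINT64, DATA_FLOAT, DATA_DOUBLE, DATA_BOOL32 } DataType;

/* Read one counter from a query result buffer and widen it to double
   (64-bit integers above 2^53 lose precision this way).
   Returns 0 on success, -1 if the counter lies outside the buffer or
   its size does not match its data type. */
static int read_counter(const void *data, size_t dataSize,
                        uint32_t offset, uint32_t size,
                        DataType type, double *out)
{
    const unsigned char *p = data;
    if (offset > dataSize || size > dataSize - offset) return -1;
    switch (type) {
    case DATA_UINT32:
    case DATA_BOOL32: {
        uint32_t v;
        if (size != sizeof v) return -1;
        memcpy(&v, p + offset, sizeof v);
        *out = (type == DATA_BOOL32) ? (double)(v != 0) : (double)v;
        return 0;
    }
    case DATA_UINT64: {
        uint64_t v;
        if (size != sizeof v) return -1;
        memcpy(&v, p + offset, sizeof v);
        *out = (double)v;
        return 0;
    }
    case DATA_FLOAT: {
        float v;
        if (size != sizeof v) return -1;
        memcpy(&v, p + offset, sizeof v);
        *out = (double)v;
        return 0;
    }
    case DATA_DOUBLE:
        if (size != sizeof *out) return -1;
        memcpy(out, p + offset, sizeof *out);
        return 0;
    }
    return -1;
}
```

      How the number is then interpreted (event count, duration, throughput, timestamp) depends on the counter type enum listed above.
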
    3. PerfKit

      For each counter two values are returned: value and cycles (both UINT64 or FLOAT64).
      For some counters only value is meaningful; for others the meaningful quantity is the ratio value/cycles.
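
      For the ratio case, a small helper along these lines could be used. counter_percent is a hypothetical name, and deciding which counters should be read as a ratio rather than as a raw value is left to the caller:

```c
#include <stdint.h>

/* value/cycles expressed as a percentage; 0.0 when no cycles were recorded. */
static double counter_percent(uint64_t value, uint64_t cycles)
{
    if (cycles == 0) return 0.0;
    return 100.0 * (double)value / (double)cycles;
}
```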

    4. Notes

      Every metric has a numerical representation. However, some of these numbers represent things such as timestamps, counter ids, etc. rather than measurements.
