Power-Performance Profiling Tool
NEMA®|Profiler is a cross-platform profiling tool that enables software developers to optimize their OpenGL® ES, EGL™, Vulkan® or NEMA®|GFX code without an in-depth understanding of the NEMA®|GPU architecture. NEMA®|Profiler enables developers to take advantage of the full spectrum of capabilities of the NEMA®|GPU-Family, to create appealing graphics applications, and to achieve an optimal power-performance balance for ultra-low-power display devices. With this tool, the overall development process is accelerated, and the quality and maximum performance of the software improve.
Easy to use – easy to see
The NEMA®|Profiler toolchain offers a graphical development environment that illustrates the power and performance bottlenecks of the executing application. The tool collects and visualizes GPU, API, and OS counter data by highlighting regions of interest (hotspots). With a simple double-click on a performance or power spike in the graph, the corresponding code snippet appears automatically, so the developer can analyze and review it on the spot.
Visualize the problematic zones
NEMA®|Profiler displays CPU and GPU activities on a timeline and includes an automated analysis engine that identifies areas where optimization might be necessary, such as time-consuming parts of a graphics application or the frames that use the most time-consuming shaders. NEMA®|Profiler comes with a rich set of visualization capabilities (e.g. stacked bars, pie charts, line charts) that present the gathered information at different levels of detail, and the “call-graph” analysis feature enables optimization in cross-API systems (e.g. when an OpenGL application is executed on top of a Vulkan driver).
Less is more: collect – store – analyze
The heart of the tool is the data-collection function. To minimize the overhead imposed on the CPU and the GPU, smart and adaptive sampling techniques gather as little information as possible from the hardware while still giving a comprehensive picture of the operations.
There is no need to collect GPU data on every hardware counter cycle. Doing so is inefficient to begin with, since every interrupt slows the system down, which in turn has a negative impact on the executed application. It is better and more efficient to adapt the sample ratio (the number of cycles between two consecutive reads of the counters) to the application. With the adaptive sampling technique, data is collected about every 10 ms, which corresponds to millions of GPU cycles, and a curve-fitting algorithm generates the data for the periods between the samples taken. In this way, the application can be accurately traced without introducing noise into the system.
Less data is collected and the sampling frequency decreases while the executing application shows stable behavior; when the application behaves more dynamically, the sampling frequency automatically goes up so that all “interesting/important” events are collected and stored for analysis. As a result, only a minimal overhead is added to the CPU (less than 1%) and zero overhead to the GPU.
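The adaptive sampling idea can be illustrated with a minimal sketch. It is not the profiler's actual implementation: the function names, the 5% change threshold, the interval bounds, and the halve/double policy are all assumptions chosen for illustration, and simple linear interpolation stands in for the curve-fitting algorithm, whose details are not described here.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (time in ms, counter value)


def adaptive_sample(read_counter: Callable[[float], float],
                    duration_ms: float,
                    base_interval_ms: float = 10.0,
                    min_interval_ms: float = 1.0,
                    max_interval_ms: float = 40.0,
                    threshold: float = 0.05) -> List[Sample]:
    """Read a counter over [0, duration_ms], adapting the sampling interval.

    All parameters are hypothetical. The interval is halved when two
    consecutive samples differ by more than `threshold` (relative change)
    and doubled otherwise, within the given bounds.
    """
    samples: List[Sample] = [(0.0, read_counter(0.0))]
    interval = base_interval_ms
    t = 0.0
    while t + interval <= duration_ms:
        t += interval
        value = read_counter(t)
        previous = samples[-1][1]
        change = abs(value - previous) / max(abs(previous), 1e-9)
        samples.append((t, value))
        if change > threshold:
            # Dynamic behavior: sample more often.
            interval = max(min_interval_ms, interval / 2)
        else:
            # Stable behavior: sample less often.
            interval = min(max_interval_ms, interval * 2)
    return samples


def interpolate(samples: List[Sample], t: float) -> float:
    """Estimate the counter value at time t between two samples.

    Linear interpolation is used here only as a stand-in for curve fitting.
    """
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the sampled range")
```

With a constant signal the interval quickly grows to its upper bound, so only a handful of reads are needed; a fast-changing signal drives the interval down and produces many more samples over the same period.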
NEMA®|Profiler gives you access to about 150 low-level hardware performance counters inside the GPU, which can be used to accurately determine how the application is using the computation and memory resources of the GPU, to identify performance issues and bottlenecks, and to confirm when a problem has been resolved.
The current release is “API aware” and can retrieve and present events generated by the client driver or by the driver of native programming interfaces such as OpenGL ES, EGL, Vulkan and NEMA®|GFX. Performance analysis can be carried out both in real time and offline.
The NEMA®|Profiler toolchain builds on technology co-developed within the LPGPU2 project (EU H2020 project funded under grant agreement No 688759) and the CodeXL infrastructure (released recently as an open-source project by AMD®). The project consortium consists of EU-based graphics experts from TU Berlin, Samsung Research UK®, Codeplay®, Think Silicon®, and Spin Digital®.
For further information please contact us at info(at)think-silicon.com.
NEMA®|Profiler runs on a host (PC) connected to the device where the application is executed, and collects the data. NEMA®|Profiler can also collect data from applications running on devices with limited memory, since the data collected on the device is split into small portions, transferred in real time to the host (PC), written to a database file, and deleted from the device. The database file is the single point of reference and can be accessed in real time or later (offline analysis).
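The collect, transfer, store, and delete cycle described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's real transfer protocol or database schema: the device buffer is modeled as a plain Python list, SQLite is assumed as the database, and the table layout, function name, and chunk size are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Sample = Tuple[float, str, float]  # (timestamp in ms, counter name, value)


def stream_to_database(device_buffer: List[Sample],
                       db: sqlite3.Connection,
                       chunk_size: int = 4) -> int:
    """Move samples from a (simulated) device buffer to the host database.

    Each small portion is written to the database first and only then
    removed from the device buffer, so the device never has to hold the
    whole trace in memory.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples (t_ms REAL, counter TEXT, value REAL)"
    )
    transferred = 0
    while device_buffer:
        portion = device_buffer[:chunk_size]
        with db:  # commits the portion as one transaction
            db.executemany("INSERT INTO samples VALUES (?, ?, ?)", portion)
        del device_buffer[:len(portion)]  # free the memory on the device side
        transferred += len(portion)
    return transferred
```

For example, calling `stream_to_database(buffer, sqlite3.connect(":memory:"))` drains `buffer` in portions of four samples and leaves all rows queryable on the host.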
A very useful feature for analyzing these results is the “Code comparison” feature, which compares multiple code-database files against each other and points to the source code that led to a performance improvement. The database file can be shared between developers and is extremely useful for supporting customer or third-party development teams in achieving the best possible results.
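As a rough illustration of what comparing two sessions might involve, the sketch below assumes each session has already been reduced to a mapping from a source location to the time spent there. The function name, the data layout, and the example locations are hypothetical and do not reflect the actual database format.

```python
from typing import Dict, List, Tuple


def compare_sessions(before: Dict[str, float],
                     after: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (location, delta_ms) pairs, largest improvement first.

    Only locations present in both sessions are compared. A negative
    delta means the location became faster in the `after` session.
    """
    deltas = [(loc, after[loc] - before[loc]) for loc in before if loc in after]
    return sorted(deltas, key=lambda item: item[1])


# Hypothetical per-location timings (ms) taken from two profiling sessions.
session_a = {"draw_ui": 4.0, "blit": 2.0}
session_b = {"draw_ui": 2.5, "blit": 2.0}
ranking = compare_sessions(session_a, session_b)
```

Here the first entry of `ranking` is the location whose time dropped the most between the two sessions, which is where a developer would look first.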
Currently, NEMA®|Profiler supports Android and Linux devices and can debug and profile OpenGL ES, EGL, Vulkan and NEMA®|GFX applications (support for additional operating systems, e.g. RTOS, and for general-purpose computing libraries is planned for the near future).
⇒ Focus on the information that matters
Quickly identify potential performance bottleneck issues in your applications using highly configurable tables and graphical views.
⇒ Automated performance analysis
Perform automated analysis of your application to identify performance bottlenecks and get optimization suggestions that can be used to improve performance.
⇒ Software fallbacks
Visualize all software fallbacks between the GPU and MCU/MPU. The OpenGL functionality transferred from the GPU to the MCU/MPU for execution can be easily identified by the developer, who can then further optimize the code to balance the load between the GPU and CPU.
⇒ Guided Application Analysis
The tool provides step-by-step analysis and optimization guidance. The analysis results now include graphical visualizations to more clearly indicate the optimization opportunities.
⇒ Hardware agnostic based on a clear interface
It does not require any modifications to run on a variety of devices, bringing about a high level of compatibility across most common systems.
⇒ Unified Timeline
View the activity of an application on both the CPU and the GPU in a unified timeline, including OpenGL API calls, Vulkan calls and NEMA®|GFX calls.
⇒ NEMA®|GFX-API Trace
View all memory transfers, kernel launches, and other API functions.
⇒ Drill Down to Raw Data
Gain low-level insights by looking at performance metrics collected directly from GPU hardware counters.
⇒ Code comparison - compare efficiency of program code across multiple sessions
Confirm performance improvements by comparing multiple code-database files against each other and locating the source code that led to the improvement.
⇒ Full documentation of the tool suite and all signals
The tool is accompanied by full documentation that enables the user to understand all signals provided, intervene appropriately, and achieve considerable code improvements.
⇒ Analyze data collected from remote systems
Data collected from remote systems can be processed offline, giving developers a complete view of both the system and the application.
⇒ Support for NEMA®|GPU-Series
It enables developers to take advantage of the full spectrum of capabilities of the NEMA®|GPU-Family.
⇒ Power profiling based on power estimation models
Observe how the GPU power values vary during application execution. This feature enables developers to identify and optimize the most energy-inefficient parts of the executed application by estimating power consumption from data captured by the NEMA® Performance Monitoring Unit (PMU). The tool estimates the energy consumption per component with high accuracy, helping developers make their applications “greener” without sacrificing graphics and image processing performance.
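A counter-based power estimation model can be sketched in a few lines. The actual NEMA® power models are not described here, so this example assumes a simple linear model in which each counted event costs a fixed amount of energy; the counter names, the per-event energy values, and the static power figure are placeholders, not measured data.

```python
from typing import Dict

# Hypothetical energy cost per counted event, in nanojoules.
ENERGY_PER_EVENT_NJ: Dict[str, float] = {
    "alu_ops": 0.5,
    "mem_reads": 2.0,
    "mem_writes": 2.5,
}
# Hypothetical static (leakage) power in milliwatts.
STATIC_POWER_MW = 1.0


def estimate_power_mw(counters: Dict[str, int],
                      interval_ms: float) -> Dict[str, float]:
    """Estimate the average power per component over one sampling interval.

    `counters` maps a counter name to the number of events counted during
    the interval. Energy in nJ divided by time in ms gives microwatts,
    which is then converted to milliwatts.
    """
    power = {"static": STATIC_POWER_MW}
    for name, events in counters.items():
        energy_nj = events * ENERGY_PER_EVENT_NJ[name]
        power[name] = energy_nj / interval_ms / 1000.0
    return power


def total_power_mw(power: Dict[str, float]) -> float:
    """Sum the per-component estimates into a single figure."""
    return sum(power.values())
```

Evaluating such a model once per sampling interval yields a power curve over time, and the per-component breakdown shows which part of the workload dominates the estimate.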