• Implements a multi-core, multi-threaded graphics processing unit (GPU) that combines very high performance with ultra-low power consumption, aimed at both graphics rendering/acceleration and general-purpose computing in embedded applications.


    The configurable and scalable NEMA GPU uses an innovative architecture consisting of one or more processing clusters interconnected by a proprietary Network on Chip (NoC). Each cluster can have one to four floating-point vector processing cores, and each core can run up to 128 threads. The resulting performance is highly competitive: for example, a single four-core cluster delivers 19.2 GFLOPS at just 533 MHz.
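
As a rough sanity check on the figures above, the following Python sketch breaks the quoted 19.2 GFLOPS at 533 MHz down into average per-core, per-cycle throughput and counts the hardware threads in a configuration. It uses only the numbers stated in the text; the function names are made up for illustration, and how a core actually reaches this rate (vector width, fused multiply-add, and so on) is not stated here.

```python
def flops_per_core_per_cycle(gflops: float, clock_mhz: float, cores: int) -> float:
    """Average floating-point operations per core per clock cycle."""
    flops_per_second = gflops * 1e9
    cycles_per_second = clock_mhz * 1e6
    return flops_per_second / cycles_per_second / cores


def max_threads(clusters: int, cores_per_cluster: int, threads_per_core: int = 128) -> int:
    """Upper bound on resident threads, given up to 128 threads per core."""
    return clusters * cores_per_cluster * threads_per_core


# One four-core cluster, as in the example quoted in the text.
per_core = flops_per_core_per_cycle(gflops=19.2, clock_mhz=533.0, cores=4)
threads = max_threads(clusters=1, cores_per_cluster=4)

print(f"{per_core:.2f} FLOPs per core per cycle")  # prints 9.01
print(f"{threads} threads per four-core cluster")  # prints 512
```

In other words, the quoted figure corresponds to roughly nine floating-point operations per core per cycle, with up to 512 threads resident on one four-core cluster.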

    NEMA combines this processing power with ultra-low power consumption. Proprietary compression techniques minimize the bandwidth to the frame buffer (access to which is the major power consumer of any GPU), and intelligent Dynamic Voltage and Frequency Scaling (DVFS) adjusts power consumption to suit the computation load. Optional custom hardware accelerators for typical graphics processing tasks such as Texture Mapping, Pixel Blending, and Polygon Rasterization further reduce power consumption.
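
To illustrate why DVFS saves power, the sketch below applies the standard approximation that dynamic power scales with voltage squared times frequency. It is a generic illustration, not NEMA's actual DVFS logic: the operating points, the `pick_point` policy, and all names are hypothetical and are not taken from any NEMA documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: float
    voltage: float


# Hypothetical operating points, NOT taken from any NEMA datasheet.
POINTS = [
    OperatingPoint(133.0, 0.7),
    OperatingPoint(266.0, 0.8),
    OperatingPoint(533.0, 1.0),
]


def relative_dynamic_power(p: OperatingPoint, ref: OperatingPoint) -> float:
    """Dynamic power of p relative to ref, using P ~ V^2 * f."""
    return (p.voltage / ref.voltage) ** 2 * (p.freq_mhz / ref.freq_mhz)


def pick_point(load: float, points=POINTS) -> OperatingPoint:
    """Pick the slowest point whose frequency covers the load (0..1 of max)."""
    ordered = sorted(points, key=lambda p: p.freq_mhz)
    needed_mhz = load * ordered[-1].freq_mhz
    for p in ordered:
        if p.freq_mhz >= needed_mhz:
            return p
    return ordered[-1]


full = POINTS[-1]
light = pick_point(0.2)  # a light workload needing 20% of peak frequency
print(light, f"{relative_dynamic_power(light, full):.0%} of full-speed dynamic power")
```

Under these assumed operating points, dropping both voltage and frequency for a light load cuts dynamic power far more than lowering the frequency alone would, which is the effect DVFS exploits.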

    The NEMA GPU is easy to program with the included compiler toolchain, and it supports popular graphics APIs and the Android and Linux operating systems.

    • Multicore Architecture
    • Unified Shader Architecture
    • C/C++/OpenCL LLVM Compiler
    • Ultra-threaded Processor
    • GPGPU Compute
  • Image processing demos running on a four-core FPGA implementation of the NEMA GPGPU. The four-core NEMA GPGPU runs at 76 MHz on a Xilinx Zynq 706 platform, and ThinkLCDML drives an 800x600 display.