A GPU for ultra-low-power, cost-efficient, small-display applications

  • NEMA|p is the smallest member of the NEMA-GPU Series, designed specifically for building cost-effective SoCs that drive small yet vibrant displays.

    Developing such area-constrained applications can be challenging. A cost-efficient graphics solution is key to success, but it must not sacrifice graphics performance or ultra-low power consumption. NEMA|p has a silicon footprint of just 0.07 mm² (150 MHz @ 28 nm), with GPU leakage power of just 0.06 mW. Think Silicon's proprietary framebuffer compression technology (TSFBc) limits memory power consumption to just 0.03 mW (in DDR-less systems).

    It features a programmable 2D graphics-rendering engine and innovative composition functions, and it supports Linux, RTOSes, and graphics APIs such as uGFX and DirectFB. A bare-metal library enables graphics application development on embedded systems with no operating system.

    Applications and Markets

    NEMA|p is well suited to entry-level IoT platforms, wearables, and embedded devices with low-cost and ultra-low-power requirements. It supports SoCs with a 32-bit MCU (e.g. ARM® Cortex®-M processors) and provides a fluid 2D graphics experience for a wide range of applications.

    Developers can create compelling 2D graphical user interfaces (GUIs) and software applications with ultra-long battery life, at significantly lower cost, for power-, memory-, and area-constrained IoT devices.

    For example, at core frequencies as low as 25 MHz, NEMA|p delivers a fast, brilliant 2D UI experience on a VGA (640x480) screen, and it is not limited to these parameters.

    Performance per mWatt per Dollar

    The NEMA-GPU Series' performance per silicon area per clock frequency is unrivalled in its class. NEMA|p has been designed to perform favorably against these critical benchmarks. As a result, NEMA|p uses 87% less active power and 98% less idle power, and has a silicon footprint 4 times smaller, than the competition, leading to significant cost reduction.

    Think Silicon's proprietary 4bpp (bits-per-pixel) real-time framebuffer compression (TSFBc) and 6bpp texture compression with real-time decompression (TSTXc) benefit architects and finance controllers alike. The compressed images and the software libraries are small enough to fit into the internal SoC memory. As a result, expensive external DDR memory can be minimized or eliminated entirely. This reduces SoC idle power consumption by about 98%, extends system battery life by a factor of about 10, and lowers the overall BOM cost. The combined performance and cost advantages make NEMA|p a performance-power-cost leader in the class of 2D GPUs.
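
    As a rough illustration of why a lower bit depth helps a framebuffer fit in on-chip memory, the sketch below computes the size of one VGA framebuffer at several bit depths. The 640x480 resolution and the 4bpp TSFBc rate come from the text above; the helper name `framebuffer_bytes` and the uncompressed formats chosen for comparison are assumptions made for illustration, and the figures ignore any metadata or alignment overhead a real compression scheme may add.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the size in bytes of one framebuffer, rounded up to whole bytes."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8


if __name__ == "__main__":
    width, height = 640, 480  # VGA, as in the example above
    # Comparison formats are illustrative assumptions, not taken from the source.
    for label, bpp in [("32bpp RGBA", 32), ("16bpp RGB", 16), ("4bpp TSFBc", 4)]:
        size = framebuffer_bytes(width, height, bpp)
        print(f"{label:>10}: {size / 1024:7.1f} KiB")
```

    Under these assumptions a 32bpp VGA framebuffer occupies 1200 KiB, while the 4bpp form occupies 150 KiB, an 8x reduction.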

    Architecture

    NEMA|p has been designed for graphics efficiency in an ultra-compact silicon area, and it is the smallest member of the NEMA family. Its fixed-point data path and instruction set architecture (ISA) are tailored to GUI acceleration and small display applications, leading to substantial improvements in power consumption and silicon area.

    The NEMA|p microarchitecture is based on a lean version of the NEMA ISA and combines hardware-level support for multi-threading, VLIW, and low-level vector processing in the most power-efficient way. It features a smart IOMMU for easy integration that also eliminates unnecessary data traffic between the host CPU and NEMA|p.

    [Figure: NEMA architecture]

    Integration/Verification

    The NEMA|p GPU IP Platform is available in Verilog and is easy to integrate and verify. NEMA|p ASIC reference designs have been evaluated in various process technologies. NEMA|p is designed with AMBA interfaces (AHB, or AXI at 32 or 64 bits) and embeds DMA controllers and command lists for minimal CPU overhead. The core has been verified through extensive simulation and rigorous code coverage measurements. It comes with a complete verification suite that compares rendered images against reference images.

    Deliverables and Documentation

    Deliverables include: a complete set of synthesis and STA (Static Timing Analysis) scripts, OS drivers (for Linux, FreeRTOS), graphics API software libraries (for uGFX, DirectFB, Qt), and standalone bare-metal drivers.

    Documentation includes: IP manual, integration manual, software library manual including code samples, and demonstration platform application notes.

    Reference design systems and demo sets are available for the following platforms: Xilinx Zynq, Altera SoCkit.

    • Fully programmable engine with a VLIW instruction set
    • Fixed point functional units
    • Command list based DMAs to minimize CPU overhead
    • Compression schemes
      • Framebuffer: 4bpp
      • Texture: 6bpp with/without alpha
    • 2D drawing
      • Pixel drawing
      • Line drawing (in any direction)
      • Filled rectangles
      • Triangles (Gouraud shaded)
      • Quadrilaterals
    • Smart Composition
      • One pass composition
      • Hidden areas are not read from memory
      • YUV layers are automatically processed without intermediate conversion
      • Video scaler
    • Blit support
      • Rotation (any angle)
      • Mirroring (vertical, horizontal)
      • Stretch (independently on x and y axis)
      • Source and/or destination color keying
      • Format conversion
        • RGB and YUV
    • Text rendering supports
      • A1 bitmap
      • A8 bitmap antialiased
      • Subsampled antialiased
    • Color formats
      • 32/16/8-bit RGB with/without alpha
      • Greyscale
      • YUV
    • Full Alpha blending
      • Programmable blending modes including Porter-Duff / DirectFB blending modes
      • Source/Destination color keying
    • Image transformation
      • Texture mapping
        • Point sampling
        • Bilinear filtering
      • Antialiasing
        • Per-edge antialiasing for quadrilaterals
        • Antialiased lines
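
    To make the combination of fixed-point functional units and Porter-Duff blending modes listed above more concrete, here is a minimal sketch of the Porter-Duff "source over" operator in 8-bit integer arithmetic. It is a generic textbook formulation, not a description of the NEMA|p data path; the function names `div255` and `src_over`, and the choice of premultiplied-alpha RGBA tuples, are assumptions made for illustration.

```python
def div255(x: int) -> int:
    """Rounded x / 255 for 0 <= x <= 255 * 255, using only adds and shifts."""
    x += 128
    return (x + (x >> 8)) >> 8


def src_over(src, dst):
    """Porter-Duff 'source over' on premultiplied 8-bit (R, G, B, A) tuples.

    Each output channel is src + dst * (1 - src_alpha), computed in integers.
    """
    inv_alpha = 255 - src[3]
    return tuple(s + div255(d * inv_alpha) for s, d in zip(src, dst))


if __name__ == "__main__":
    half_red = (128, 0, 0, 128)    # premultiplied red at about 50% alpha
    opaque_blue = (0, 0, 255, 255)
    print(src_over(half_red, opaque_blue))
```

    The shift-based `div255` avoids an integer divide, which is the kind of simplification a fixed-point pipeline typically favors.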

    Configuration Options

    • IOMMU
    • Framebuffer/Texture Compression
      • 4bpp or 6bpp
    • Master Interface
      • AMBA AHB 32-bit
      • AMBA AXI4 32/64-bit
    • Slave Interface
      • AMBA AHB/AXI4-Lite
  • NEMA|p supports the FreeRTOS and Linux operating systems and ships with software libraries for 2D graphics APIs. DirectFB and uGFX (FreeRTOS) support makes it well suited to software development with application and Graphical User Interface (GUI) creation frameworks such as Qt and GTK+. A bare-metal library of primitive graphics functions enables graphics development for embedded applications.

    The software package comes with a "Texture/Frame-buffer compression scheme emulator" (emulators are available upon request).

    • OS support
      • FreeRTOS V8.0.1
      • Linux kernel 3.x
    • Graphics API support
      • Software library in portable ANSI C
      • uGFX
      • DirectFB-1.7.4 GFX Drivers
      • Qt-4.8.6 application framework
    • Software Emulators and suites
      • TSFBc bit accurate emulator
  • NEMA|p delivers stunning performance per silicon area per clock frequency.
    pico 2D GPU                   NEMA|p100
    GPU cores                     1
    Silicon area (mm² @ 28nm)     0.07
    Core clock (MHz @ 28nm)       150
    Shader (GOPS)                 1.8
    Pixel rate (Mpixel/sec)       150
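
    One way to read the figures above: 150 Mpixel/sec at a 150 MHz core clock corresponds to one pixel per clock. The sketch below turns that into an upper bound on full-screen fills per second. It assumes, as a simplification not stated in the source, that pixel rate scales linearly with core clock and that there are no memory stalls, blending costs, or overdraw; the function name `fills_per_second` is hypothetical.

```python
def fills_per_second(clock_hz: float, width: int, height: int,
                     pixels_per_clock: float = 1.0) -> float:
    """Upper bound on full-screen fills per second at a given core clock."""
    return clock_hz * pixels_per_clock / (width * height)


if __name__ == "__main__":
    # 150 MHz is the listed core clock; 25 MHz is the low-frequency example
    # mentioned earlier in the text. Both are evaluated for a VGA screen.
    for clock_mhz in (150, 25):
        rate = fills_per_second(clock_mhz * 1e6, 640, 480)
        print(f"{clock_mhz:>3} MHz: {rate:6.1f} VGA fills/sec (idealized peak)")
```

    Under these assumptions the peak is roughly 488 full VGA fills per second at 150 MHz and about 81 at 25 MHz; real UI workloads with blending and multiple layers will achieve less.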

    NEMA|p running DirectFB demos

For additional information, download the NEMA|p Product Brief.