A 3D GPU suitable for Wearables/IoT applications

  • NEMA|t is the industry's smallest Internet-of-Things (IoT) Graphics Processing Unit (GPU) with 3D functionality. Its architecture has been designed from the ground up for the new generation of wearable and IoT display products, which require high graphics quality and performance at ultra-low power consumption.

    With a silicon footprint of just 0.1 mm² (400 MHz @ 28 nm), the GPU has a leakage power consumption of only 0.07 mW. Think Silicon's proprietary compression technologies limit memory power consumption to just 0.03 mW (in DDR-less systems).

    NEMA|t features OpenGL® ES capabilities, implements a fully configurable and programmable 3D graphics rendering engine, accelerates a comprehensive super-set of 2D graphics drawings, and has smart composition functions.

    Applications and Markets

    NEMA|t is designed to support mid-range to high-end wearable and IoT devices, such as smart watches, health and fitness applications, smart accessories, alarm systems, home automation and embedded platforms, built around an SoC with a 32-bit MCU or MPU (e.g. ARM® Cortex®-M and Cortex®-A processors).

    With NEMA|t, you can create compelling 3D Graphical User Interfaces (GUIs) and software applications with ultra-long battery life and lower power consumption, at a significantly lower cost, for power-, memory- and area-constrained IoT devices.

    For example, at core frequencies as low as 80 MHz, NEMA|t delivers a fast and brilliant 3D UI experience at 420x420 resolution, and it is not limited to these parameters.

    Performance per mWatt per Dollar

    The NEMA-GPU Series delivers stunning performance per silicon area per clock frequency, and NEMA|t has been designed to perform favorably on exactly these critical metrics. As a result, NEMA|t uses 87% less active power and 98% less idle power and has a four times smaller silicon footprint, leading to a significant cost reduction of about 88% per chip compared to the best solution available in the market.

    To reduce power consumption at the system level, the design is complemented by high-quality 6bpp (bits-per-pixel) texture compression, real-time 4bpp frame-buffer compression and Z-buffer compression. Compression greatly reduces power-hungry memory accesses and offloads the system bus. It also enables systems that use only internal on-chip memory, eliminating the need for external DDR. Together, these features extend battery life by up to 10 times and significantly reduce the bill of materials (BOM). The combined performance and cost advantages make NEMA|t a performance-power-cost leader in the class of 3D GPUs.
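
    To put the compression figures in perspective, the following minimal C sketch computes the size of one frame buffer at the 420x420 resolution mentioned above for a given bits-per-pixel value. The helper name `framebuffer_bytes` is illustrative and not part of any NEMA|t library, and the calculation assumes a fixed-rate format with no row padding or compression metadata.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one frame buffer at a fixed bits-per-pixel rate.
 * Assumes no row padding and no compression metadata (a simplification). */
size_t framebuffer_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    /* Round the total bit count up to whole bytes. */
    return (width * height * bits_per_pixel + 7) / 8;
}
```

    Under these assumptions, `framebuffer_bytes(420, 420, 32)` gives 705,600 bytes, while `framebuffer_bytes(420, 420, 4)` gives 88,200 bytes. That eightfold reduction is what makes it more practical to keep the frame buffer in on-chip memory.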

    Architecture

    NEMA|t has a modular architecture and is available in one-, two- or four-core configurations. Its fixed-point data path and instruction set architecture (ISA) are tailored to 3D GUI acceleration and small-display applications, leading to substantial savings in power consumption and silicon area.
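
    As an illustration of how GUI rendering maps onto fixed-point arithmetic, here is a minimal sketch of bilinear filtering on one 8-bit colour channel using integer math only. It is not NEMA|t's actual implementation; the function name `bilinear_u8` and the choice of 8 fractional bits for the weights are assumptions made for this example.

```c
#include <stdint.h>

/* Bilinear interpolation of one 8-bit channel using integer math only.
 * t00, t10, t01, t11 are the four neighbouring texels (top-left, top-right,
 * bottom-left, bottom-right). fx and fy are the fractional sample position
 * in the range 0..256, where 256 means 1.0; the caller must keep them
 * within that range. */
uint8_t bilinear_u8(uint8_t t00, uint8_t t10, uint8_t t01, uint8_t t11,
                    uint32_t fx, uint32_t fy)
{
    uint32_t top    = t00 * (256u - fx) + t10 * fx;      /* 8 fractional bits  */
    uint32_t bottom = t01 * (256u - fx) + t11 * fx;
    uint32_t mixed  = top * (256u - fy) + bottom * fy;   /* 16 fractional bits */
    return (uint8_t)((mixed + 32768u) >> 16);            /* round to nearest   */
}
```

    The whole filter needs only multiplies, adds and a shift on 32-bit integers, which is the kind of operation a fixed-point data path can execute without floating-point hardware.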

    The NEMA|t core combines VLIW and vector data processing with innovative low-level, lightweight multi-threading that hides memory latency and keeps the hardware fully utilized, which is key to both ultra-low power consumption and high performance.

    NEMA|t features a smart IOMMU that eases integration and eliminates unnecessary data traffic between the host CPU and NEMA|t.

    Integration/verification

    The NEMA|t GPU IP Platform is available in Verilog and easy to integrate and verify. NEMA|t ASIC reference designs have been evaluated in various process technologies. NEMA|t is designed with AMBA interfaces (AHB, AXI 32 or 64 bits) and embeds command lists for minimal CPU overhead. The core has been verified through extensive simulation and rigorous code coverage measurements. It comes together with a complete verification suite that compares reference images with rendered images.
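
    The idea behind command lists can be sketched as follows. This is a generic illustration, not the NEMA|t command format or driver API: the types `gpu_cmd_t` and `cmd_list_t`, the function `cmd_list_add` and the register-write encoding are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command encoding, for illustration only. */
typedef struct {
    uint32_t reg;    /* offset of a (made-up) GPU register */
    uint32_t value;  /* value to write to that register    */
} gpu_cmd_t;

typedef struct {
    gpu_cmd_t *cmds;     /* buffer in memory that the GPU would fetch by DMA */
    size_t     capacity; /* number of entries the buffer can hold            */
    size_t     count;    /* number of entries recorded so far                */
} cmd_list_t;

/* Record one register write; returns 0 on success, -1 if the list is full. */
int cmd_list_add(cmd_list_t *cl, uint32_t reg, uint32_t value)
{
    assert(cl != NULL && cl->cmds != NULL);
    if (cl->count >= cl->capacity)
        return -1;
    cl->cmds[cl->count].reg = reg;
    cl->cmds[cl->count].value = value;
    cl->count++;
    return 0;
}
```

    In a scheme like this, the CPU records many commands with ordinary memory writes and then hands the GPU only the address and length of the list, so a batch of drawing work is started with a single submission rather than one bus access per command.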

    Deliverables and Documentation

    The deliverables include: reference design systems and demos for different platforms (Xilinx Zynq, Altera SoCkit), a complete set of synthesis and STA (Static Timing Analysis) scripts, OS drivers (for Linux and Android), graphics API software libraries (for DirectFB and Qt) and standalone bare-metal drivers.

    Documentation includes: IP manual, integration manual, software library manual including code samples, and demonstration platform application notes.

    OpenGL is a registered trademark and the OpenGL ES logo is a trademark of Silicon Graphics International used under license by Khronos.

    • Fully programmable engine with a VLIW instruction set
    • Scalable to multiple cores
    • Command list based DMAs to minimize CPU overhead
    • 3D Rendering
      • Support for OpenGL®
      • Perspective correct image projection
      • Bilinear filtering
      • Z-buffer (early test, late test)
    • Compression schemes
      • Framebuffer: 4bpp
      • Texture: 6bpp with/without alpha
      • Z-buffer
    • 2D drawing
      • Pixel drawing
      • Line drawing (in any direction)
      • Filled rectangles
      • Triangles (Gouraud Shaded)
      • Quadrilaterals
      • Configurable clipping rectangle
    • Smart Composition
      • One pass composition
      • Hidden areas are not read from memory
      • YUV layers are automatically processed without intermediate conversion
      • Video scaler
    • Blit Support
      • Rotation (any angle)
      • Mirroring (vertical, horizontal)
      • Stretch (independently on x and y axis)
      • Source and/or destination color keying
      • Format conversion
      • RGB and YUV
    • Text rendering support
      • A1 bitmap
      • A8 bitmap antialiased
      • Subsampled antialiased
    • Color formats
      • 32/16/8-bit RGB with/without alpha
      • Greyscale
      • YUV
    • Full Alpha blending
      • Porter-Duff / DirectFB blending modes
      • Source/Destination color keying
    • Image transformation
      • Texture mapping
      • Point sampling
      • Bilinear filtering
      • Texture caching
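
    Of the Porter-Duff blending modes listed above, "source over" is the one GUI composition relies on most. The sketch below shows it for a single pixel in software. It assumes premultiplied-alpha ARGB8888 pixels; the function names `mul_div255` and `blend_src_over` are illustrative and do not come from the NEMA|t or DirectFB libraries.

```c
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint32_t mul_div255(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128u;
    return (t + (t >> 8)) >> 8;
}

/* Porter-Duff "source over" for premultiplied ARGB8888 pixels:
 * result = src + dst * (1 - src_alpha), applied to every channel. */
uint32_t blend_src_over(uint32_t src, uint32_t dst)
{
    uint32_t inv_alpha = 255u - (src >> 24);
    uint32_t out = 0;
    int shift;

    for (shift = 0; shift <= 24; shift += 8) {
        uint32_t s = (src >> shift) & 0xFFu;
        uint32_t d = (dst >> shift) & 0xFFu;
        uint32_t c = s + mul_div255(d, inv_alpha);
        if (c > 255u)
            c = 255u;   /* guard against input that is not premultiplied */
        out |= c << shift;
    }
    return out;
}
```

    A fully opaque source replaces the destination and a fully transparent source leaves it unchanged; a hardware blender applies the same per-channel arithmetic to whole surfaces.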

     

    Configuration Options

    • Number of cores
    • Number of threads
    • IOMMU
    • Framebuffer/texture compression
      • 4bpp or 6bpp
    • Z-buffer compression
    • Cache sizes
      • Texture/Z-buffer
    • Master interface
      • AMBA AHB 32-bit
      • AMBA AXI4 32/64-bit
    • Slave interface
      • AMBA AHB/AXI4-Lite
  • NEMA|t supports all major IoT operating systems and middleware, such as FreeRTOS, Linux and Android, and comes with software libraries for 3D graphics APIs. DirectFB support makes it ideal for software development with application and Graphical User Interface (GUI) creation frameworks, such as Qt and GTK+. A bare-metal library of primitive graphics functions enables graphics development for embedded applications.

    The software package also includes a texture/frame-buffer compression scheme emulator.

    • OS support

      • Linux driver 3.x
      • Android 4.x
    • Graphics API support
      • Software library in portable ANSI C
      • DirectFB-1.7.4 GFX Drivers
      • Qt-4.8.6
    • Software Emulators and suites
      • Software Emulator
  • NEMA|t delivers stunning performance per silicon area per clock frequency.
    tiny 3D GPU                   NEMA|t100   NEMA|t200   NEMA|t400
    GPU cores                     1           2           4
    Silicon area (mm² @ 28 nm)    0.1         0.15        0.25
    Core clock (MHz @ 28 nm)      400         400         400
    Shader (GOPS)                 4.8         9.6         19.2
    Pixel rate (Mpixel/s)         400         800         1600
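
    The pixel-rate figures in the table scale as cores times clock frequency, which implies one pixel per clock per core. The following sketch uses that ratio to estimate a peak fill budget for a given display. The function names are illustrative, the linear scaling to other clock frequencies is an assumption, and the 60 frames per second used in the note below is an example value that does not come from the product data.

```c
#include <assert.h>

/* Peak pixel rate in Mpixel/s, assuming one pixel per clock per core
 * (the ratio implied by the table: 400 MHz x 1 core = 400 Mpixel/s). */
unsigned pixel_rate_mpix(unsigned cores, unsigned clock_mhz)
{
    assert(cores > 0);
    return cores * clock_mhz;
}

/* How many times per frame the whole screen could be filled at peak rate. */
double fills_per_frame(unsigned cores, unsigned clock_mhz,
                       unsigned width, unsigned height, unsigned fps)
{
    double pixels_per_second = pixel_rate_mpix(cores, clock_mhz) * 1e6;
    return pixels_per_second / ((double)width * height * fps);
}
```

    Under these assumptions, a single core at 80 MHz could touch every pixel of a 420x420 display roughly 7.5 times per frame at 60 frames per second. Real throughput depends on memory bandwidth, blending and scene content.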

    Below you can see the performance of NEMA|t running 3D demos @92MHz:

For additional information, download the NEMA|t Product Brief.