A 3D GPU suitable for Wearables/IoT applications

  • NEMA®|t is the industry's smallest Internet-of-Things (IoT) Graphics Processing Unit (GPU) with 3D functionality. Its architecture has been designed from the ground up for the new generation of wearable and IoT display products, which require high graphics quality and performance at ultra-low power consumption.

    With a silicon footprint of just 0.258 mm² (400 MHz @ TSMC 28nm HPLP), the GPU has a leakage power consumption of only 0.23 mW. Think Silicon's proprietary compression technologies limit memory power consumption to just 0.03 mW (in DDR-less systems).

    NEMA®|t features industry-standard Open Graphics APIs, implements a fully configurable and programmable 3D graphics rendering engine, accelerates a comprehensive superset of 2D drawing operations, and provides smart composition functions.

    Applications and Markets

    NEMA®|t is designed to support mid-range to high-end wearable and IoT devices, such as smart watches, health and fitness applications, smart accessories, alarm systems, home automation and embedded platforms, built around an SoC with a 32-bit MCU or MPU (e.g. ARM® Cortex®-M and Cortex®-A processors).

    With NEMA®|t, you can create compelling 3D Graphical User Interfaces (GUIs) and software applications with ultra-long battery life and low power consumption, at a significantly lower cost, for IoT devices constrained in power, memory and area.

    For example, with a core frequency as low as 80 MHz, NEMA®|t delivers a fast and brilliant 3D UI experience at 420x420 resolution, and it is not limited to these parameters.

    Performance per mWatt per Dollar

    The NEMA®|GPU-Series delivers stunning performance per silicon area per clock frequency, and NEMA®|t has been designed to perform favorably against these critical benchmarks. As a result, NEMA® uses 87% less active power and 98% less idle power and has a four times smaller silicon footprint, leading to a cost reduction of about 88% per chip compared to the best solution available on the market.

    To reduce power consumption at the system level, the design is complemented by high-quality 6bpp (bits-per-pixel) texture compression (TSC™T), real-time 4bpp frame-buffer compression (TSC™FB) and Z-buffer compression techniques. Compression greatly reduces power-hungry memory accesses and offloads the system bus. Furthermore, it enables systems that use only internal on-chip memory, eliminating the need for external DDR.

    Together, these features lead to a massive battery life extension (up to 10 times) and a significant reduction in bill-of-materials (BOM) cost. The combined performance and cost advantages make NEMA®|t a Performance-Power-Cost leader in the class of 3D GPUs.
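
To give a feel for what the quoted bit depths mean for on-chip memory, the following is a rough back-of-the-envelope sketch, not a description of the actual compression formats. It assumes a 420x420 surface (the example resolution mentioned earlier), a 32-bit uncompressed baseline, and no per-block padding or header overhead; the function name `buffer_bytes` is purely illustrative.

```python
def buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the size in bytes of a width x height surface at a given bit depth."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


WIDTH, HEIGHT = 420, 420  # example resolution quoted in the text

# Assumption: the uncompressed baseline is a 32-bit color format.
uncompressed = buffer_bytes(WIDTH, HEIGHT, 32)
# Bit depths quoted in the text; block padding and headers are ignored here.
framebuffer_4bpp = buffer_bytes(WIDTH, HEIGHT, 4)
texture_6bpp = buffer_bytes(WIDTH, HEIGHT, 6)

if __name__ == "__main__":
    print(f"32 bpp uncompressed: {uncompressed} bytes")
    print(f" 4 bpp frame buffer: {framebuffer_4bpp} bytes "
          f"({uncompressed / framebuffer_4bpp:.2f}x smaller)")
    print(f" 6 bpp texture     : {texture_6bpp} bytes "
          f"({uncompressed / texture_6bpp:.2f}x smaller)")
```

Under these assumptions a full-screen 4bpp frame buffer needs about 86 KiB instead of roughly 689 KiB, which illustrates why a design that relies only on internal memory becomes plausible.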


    NEMA®|t is a modular architecture and is available in one-, two- or four-core configurations. Its fixed-point data path and instruction set architecture (ISA) are tailored to 3D GUI acceleration and small-display applications, leading to substantial improvements in power consumption and silicon area.

    The NEMA®|t core combines VLIW and vector data processing with innovative low-level lightweight multi-threading. Multi-threading hides memory latency and keeps the hardware fully utilized, which is the key to ultra-low power consumption and high performance.

    NEMA®|t features a smart IOMMU for easy integration, eliminating unnecessary data traffic between the host CPU and NEMA®|t.


    The NEMA®|t GPU IP Platform is available in Verilog and is easy to integrate and verify. NEMA®|t ASIC reference designs have been evaluated in various process technologies. NEMA®|t is designed with AMBA interfaces (AHB, AXI; 32 or 64 bits) and embeds command lists for minimal CPU overhead. The core has been verified through extensive simulation and rigorous code coverage measurements. It comes with a complete verification suite that compares rendered images against reference images.

    Deliverables and Documentation*

    The deliverables include: reference design systems and demos for different platforms (Xilinx Zynq, Altera SoCkit), a complete set of synthesis and STA (Static Timing Analysis) scripts, OS drivers (for Linux, Android), graphics API software libraries (for DirectFB, Qt) and standalone bare-metal drivers.

    Documentation includes: IP manual, integration manual, software library manual including code samples, and demonstration platform application notes.

    * Listed items represent a superset and are subject to change without further notice.
    Listed items could be part of a unified product part number and may or may not be listed under a separate part number.
    Listed items are not part of an official quote unless they are listed in it.

    • Fully programmable engine with a VLIW instruction set
    • Scalable to multiple cores
    • DMA based Command lists to minimize CPU overhead
    • 3D Rendering
      • Support for industry-standard Open Graphics APIs
      • Perspective correct image projection
      • Z-buffer (early test, late test)
    • Compression schemes
      • TSC™4 (4 bits per pixel)
      • TSC™6/TSC™6a (6 bits per pixel without/with alpha)
      • Z-buffer compression
    • 2D drawing
      • Pixel / Line drawing
      • Filled rectangles
      • Triangles (Gouraud Shaded)
      • Quadrilaterals
    • Smart Composition
      • One pass composition
      • Hidden areas are not read from memory
      • YUV layers are automatically processed without intermediate conversion
      • Video scaler
    • Blit Support
      • Rotation / Mirroring
      • Stretch (independently on x and y axis)
      • Source and/or destination color keying
      • Format conversion
      • RGB and YUV
      • Texture Wrapping (mirror, repeat, clamp, border color)
    • Text rendering support
      • A1 bitmap, A8 bitmap antialiased
      • Subsampled antialiased
    • Color formats
      • 32/16/8-bit with/without alpha, greyscale, YUV, RGB, TSC™4, TSC™6/TSC™6a
    • Full Alpha blending
      • Programmable blending modes
      • Source/Destination color keying
    • Texture mapping
      • Point sampling
      • Bilinear filtering
      • Texture caching
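
As an illustration of two of the listed features, alpha blending and source color keying, here is a generic per-pixel sketch. It shows the textbook "source over" blend in integer arithmetic and is not NEMA®|t's hardware behaviour or API; the function name `blend_over`, the 8-bit channel layout and the opaque destination are assumptions made for the example.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def blend_over(src: RGBA, dst: RGB, src_key: Optional[RGB] = None) -> RGB:
    """Blend an 8-bit RGBA source pixel over an opaque RGB destination pixel."""
    # Source color keying: a source pixel matching the key color is skipped,
    # so the destination pixel is left unchanged.
    if src_key is not None and src[:3] == src_key:
        return dst

    alpha = src[3]
    inverse = 255 - alpha

    def mix(s: int, d: int) -> int:
        # Integer-only mix with rounding: (s*alpha + d*(255-alpha)) / 255
        return (s * alpha + d * inverse + 127) // 255

    return (mix(src[0], dst[0]), mix(src[1], dst[1]), mix(src[2], dst[2]))


if __name__ == "__main__":
    background = (0, 0, 255)
    print(blend_over((255, 0, 0, 128), background))
    print(blend_over((255, 0, 255, 255), background, src_key=(255, 0, 255)))
```

The blend uses only integer operations, in the spirit of a fixed-point data path; a programmable blender would let the two weighting factors be chosen per blending mode rather than fixed as here.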

    Configuration Options

    • # of Cores
    • # of Threads
    • IOMMU
    • Caches
      • Texture/Z-buffer
    • Framebuffer/Texture Compression
      • TSC™4, TSC™6, TSC™6a
    • Z-buffer compression
    • Master Interface
      • AMBA AHB 32-bit
      • AMBA AXI4 32/64-bit
    • Slave Interface
      • AMBA AHB/AXI4-Lite

  • NEMA®|t supports all major IoT operating systems and middleware, such as RTOS, Linux and Android, and comes with software libraries for 3D graphics APIs. DirectFB support makes it ideal for software development with application and Graphical User Interface (GUI) creation frameworks, such as Qt and GTK+. A bare-metal library of primitive graphics functions enables graphics development for embedded applications.

    The software package also includes a texture/frame-buffer compression scheme emulator.

    • OS support
      • RTOS
      • Bare metal
      • Linux
      • Android
    • Graphics API support
      • NEMA®|GFX-API
      • DirectFB, Qt
      • Industry standard Open Graphics APIs
    • Software Emulators and Tools
      • NEMA®|SHADER-Edit
      • NEMA®|GUI-Builder*
      • NEMA®|PIX-Presso
      • NEMA®|Bits
      • NEMA®|Profiler
      • TSC™FB bit-accurate emulator

    * NEMA®|GUI-Builder (non-commercial version)

    • Graphics API library options:
      • Performance build: logging disabled
      • Debug build: logging enabled

  • NEMA®|t delivers stunning performance per silicon area per clock frequency.

    Tiny 3D GPU                 NEMA®|t-100   NEMA®|t-200   NEMA®|t-400
    GPU cores                   1             2             4
    Silicon area (mm² @ 28nm)   0.258         0.412         0.616
    Core clock (MHz @ 28nm)     400           400           400
    Shader (GOPS)               4.8           9.6           19.2
    Pixel rate (Mpixel/sec)     400           800           1600
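
The figures in the table are consistent with one pixel and twelve shader operations per core per clock cycle. The sketch below reproduces them under that reading and estimates a fill-rate budget for the 80 MHz, 420x420 example mentioned earlier. The linear scaling with clock frequency, the 60 frames-per-second target and the function names are assumptions made for illustration, not vendor specifications; real throughput also depends on memory bandwidth and scene content.

```python
OPS_PER_CORE_PER_CLOCK = 12    # inferred from 4.8 GOPS at 400 MHz on one core
PIXELS_PER_CORE_PER_CLOCK = 1  # inferred from 400 Mpixel/s at 400 MHz on one core


def pixel_rate_mpix(cores: int, clock_mhz: float) -> float:
    """Peak pixel rate in Mpixel/s, assuming linear scaling with cores and clock."""
    return cores * clock_mhz * PIXELS_PER_CORE_PER_CLOCK


def shader_gops(cores: int, clock_mhz: float) -> float:
    """Peak shader throughput in GOPS under the same linear-scaling assumption."""
    return cores * clock_mhz * OPS_PER_CORE_PER_CLOCK / 1000


def overdraw_budget(cores: int, clock_mhz: float,
                    width: int, height: int, fps: float) -> float:
    """How many times each screen pixel could be written per frame at peak rate."""
    pixels_per_second = width * height * fps
    return pixel_rate_mpix(cores, clock_mhz) * 1e6 / pixels_per_second


if __name__ == "__main__":
    for cores in (1, 2, 4):
        print(cores, pixel_rate_mpix(cores, 400), shader_gops(cores, 400))
    # Assumed target: 60 frames per second on a 420x420 display at 80 MHz.
    print(f"{overdraw_budget(1, 80, 420, 420, 60):.2f} writes per pixel per frame")
```

Under these assumptions a single core at 80 MHz could write every pixel of a 420x420 display several times per frame, which leaves room for layered UI elements.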

    Below you can see the performance of NEMA®|t running 3D demos @92MHz:

For additional information, download the NEMA®|t Product Brief.