Network-on-Chip Architecture, TSNoC
The NEMA-GPU series is powered by an adaptive, ultra-low-power network-on-chip architecture (TSNoC) explicitly optimized for the memory traffic and access patterns of NEMA GPUs.
TSNoC efficiently handles:
- the memory transactions generated by the highly multithreaded NEMA GPUs to the attached AXI memory ports (connected directly to a memory controller or to any other AXI-compatible medium)
- the inter-core traffic generated by synchronization operations.
In the network interfaces, the semantics of incoming AXI transactions are translated into a packetized flow-control protocol used inside the TSNoC for fast data delivery and QoS switching. This TSNoC-internal, proprietary protocol adaptively satisfies the bandwidth and latency requirements of NEMA applications and allows our GPUs to scale to an arbitrary number of cores. The TSNoC-internal protocol is key to the success of the NEMA interconnect because it offers an effective means of handling and optimizing the different volumes and types of memory requests generated by our multicore GPU (geometry, z-, and texture data, etc.) in a seamless fashion.
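The general idea of translating an AXI transaction into packets can be illustrated with a minimal sketch. The TSNoC protocol itself is proprietary and is not described here, so everything below is an assumption made for illustration: the `AxiWrite` and `Flit` types, the head/body/tail flit layout, the 8-byte flit size, and the `packetize`/`depacketize` helpers are all hypothetical. The only detail taken from the AXI specification is that the QoS identifier is a 4-bit field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AxiWrite:
    """Simplified AXI write burst (hypothetical subset of the real signals)."""
    addr: int
    data: bytes
    qos: int = 0  # AXI4 AxQOS is a 4-bit value (0..15)


@dataclass
class Flit:
    """Hypothetical flow-control unit; not the actual TSNoC format."""
    kind: str          # "head", "body" or "tail"
    payload: bytes
    dest_port: int = 0
    qos: int = 0


def packetize(txn: AxiWrite, dest_port: int, flit_bytes: int = 8) -> List[Flit]:
    """Split one transaction into a head flit followed by payload flits."""
    # The head flit carries the routing information and the target address.
    flits = [Flit("head", txn.addr.to_bytes(8, "little"), dest_port, txn.qos)]
    chunks = [txn.data[i:i + flit_bytes]
              for i in range(0, len(txn.data), flit_bytes)]
    for i, chunk in enumerate(chunks):
        kind = "tail" if i == len(chunks) - 1 else "body"
        flits.append(Flit(kind, chunk, dest_port, txn.qos))
    return flits


def depacketize(flits: List[Flit]) -> AxiWrite:
    """Rebuild the transaction at the destination network interface."""
    addr = int.from_bytes(flits[0].payload, "little")
    data = b"".join(f.payload for f in flits[1:])
    return AxiWrite(addr, data, flits[0].qos)
```

In this sketch the QoS value travels with every flit, so a switch could prioritize traffic without inspecting the head flit again; whether TSNoC does this is not stated in the source.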
TSNoC is fully configurable at design time and offers multiple customization features:
- High-speed operation at GHz-level clock frequencies with low-cost buffering/pipelining customized for graphics workloads.
- Automatic data width configuration, both at the network's input/output ports and internally, at byte granularity, allowing flexible bandwidth calibration.
- (optional) Integrated clock domain crossing blocks at the memory ports, allowing a NEMA processor to be attached to AXI ports running at any clock frequency.
- Customizable communication latency from the NEMA cores to the memory ports, allowing dynamic calibration of the number of GPU threads.
- Customizable number of outstanding transactions per core.
- Load-balanced routing and switching of the memory traffic to an arbitrary number of memory ports.
- Easily extensible to other, non-AXI protocols by selecting a different network interface architecture.
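Two of the options above, the per-core limit on outstanding transactions and load-balanced routing to several memory ports, can be modeled in a few lines. This is a sketch under stated assumptions, not the TSNoC implementation: the `NocConfig` parameter names are hypothetical, and address interleaving is used here only as one common way to spread traffic across ports, since the source does not say how TSNoC balances load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NocConfig:
    """Hypothetical design-time parameters; names are illustrative only."""
    num_cores: int
    num_mem_ports: int
    max_outstanding_per_core: int
    interleave_bytes: int = 64  # assumed interleaving granularity


def select_port(cfg: NocConfig, addr: int) -> int:
    """Pick a memory port by interleaving consecutive address blocks."""
    return (addr // cfg.interleave_bytes) % cfg.num_mem_ports


class OutstandingLimiter:
    """Tracks in-flight transactions per core against the configured cap."""

    def __init__(self, cfg: NocConfig) -> None:
        self.limit = cfg.max_outstanding_per_core
        self.in_flight = [0] * cfg.num_cores

    def try_issue(self, core: int) -> bool:
        """Return True and count the request if the core is below its cap."""
        if self.in_flight[core] >= self.limit:
            return False  # the core must stall until a response returns
        self.in_flight[core] += 1
        return True

    def complete(self, core: int) -> None:
        """Release one slot when a response for this core arrives."""
        if self.in_flight[core] == 0:
            raise ValueError("no outstanding transaction for this core")
        self.in_flight[core] -= 1
```

With two memory ports and 64-byte interleaving, sequential accesses alternate between the ports, which is the load-spreading effect the sketch is meant to show.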
The tight coupling of NEMA cores with TSNoC simplifies the integration of our IP at system level and eases the design and verification stages. At the physical level, chip-level integration does not need to account for potential routing congestion in a NEMA-based graphics subsystem, since local routing congestion is handled by the TSNoC in a scalable manner.