NeuronFlow™

Leveraging Sparsity and Dataflow for Fastest Edge AI per Watt


Sparsity

With the right architecture, we can additively exploit four kinds of sparsity to save time and energy in computation.

Sparsity in Connectivity

Deep neural networks have huge numbers of connections, but they are not all equal: typically a small fraction of the links carries most of the useful computation. By identifying those key connections and processing only that part of the network, we avoid large numbers of unnecessary computations.
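The idea can be sketched in plain Python. This is an illustrative sketch, not GML's implementation: magnitude pruning keeps only the largest-magnitude weights, and a sparse multiply-accumulate then touches only the surviving connections. The helpers `prune` and `sparse_matvec` are hypothetical names.

```python
def prune(weights, keep_fraction):
    """Keep only the largest-magnitude fraction of weights.

    weights: dense matrix as a list of rows (lists of floats).
    Returns the kept connections as (row, col, weight) triples.
    """
    flat = sorted((abs(w) for row in weights for w in row), reverse=True)
    k = max(1, int(len(flat) * keep_fraction))
    threshold = flat[k - 1]
    return [(i, j, w)
            for i, row in enumerate(weights)
            for j, w in enumerate(row)
            if abs(w) >= threshold]

def sparse_matvec(triples, x, n_rows):
    """Multiply-accumulate over the kept connections only."""
    y = [0.0] * n_rows
    for i, j, w in triples:
        y[i] += w * x[j]
    return y

# Six weights; keeping a third of them leaves only the two strongest links.
W = [[0.9, 0.01, -0.02],
     [0.05, -1.2, 0.03]]
kept = prune(W, keep_fraction=1 / 3)
y = sparse_matvec(kept, [1.0, 1.0, 1.0], n_rows=2)  # 2 multiplies, not 6
```

The multiply count drops in direct proportion to the number of pruned connections, which is the saving the paragraph above describes.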

Sparsity in Space

We need many pixels to give us the high resolution required for truly intelligent vision. However, we don't need, or use, that resolution everywhere in the image, only in the small zones where fine detail matters. By ignoring the mass of pixels that tell us nothing new, we cut the amount of computation by a large factor.
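One way to picture this, as a sketch rather than GML's actual pipeline: split the frame into tiles, use local variance as a crude stand-in for "fine detail", and process only the tiles that exceed a threshold. All names here are illustrative.

```python
def tile_variance(tile):
    """Pixel variance of one tile — a rough proxy for fine detail."""
    flat = [p for row in tile for p in row]
    mean = sum(flat) / len(flat)
    return sum((p - mean) ** 2 for p in flat) / len(flat)

def active_tiles(frame, tile, threshold):
    """Yield (row, col) origins of tiles worth processing at full resolution."""
    h, w = len(frame), len(frame[0])
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            block = [row[c:c + tile] for row in frame[r:r + tile]]
            if tile_variance(block) > threshold:
                yield (r, c)

# A mostly flat 4x4 frame with detail only in the top-left 2x2 tile:
frame = [[0, 9, 5, 5],
         [9, 0, 5, 5],
         [5, 5, 5, 5],
         [5, 5, 5, 5]]
hot = list(active_tiles(frame, tile=2, threshold=1.0))
```

Only one of the four tiles is selected, so the expensive per-pixel computation runs on a quarter of the frame.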

Sparsity in Time

A smart doorbell, even after being woken by a smart trigger, spends more than 95% of its time with nothing to do and no one new to look at. By recognizing this, and not computing when nothing new is happening, we save an enormous amount of power or infer blindingly fast.
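A toy event-driven loop makes the saving concrete. This is illustrative only: the change test here is a simple per-pixel delta, not GML's trigger logic, and `process_stream` is a hypothetical name.

```python
def changed_pixels(prev, cur, delta):
    """Count pixels that differ from the reference frame by more than delta."""
    return sum(1 for a, b in zip(prev, cur) if abs(a - b) > delta)

def process_stream(frames, delta=0.0):
    """Run inference only on frames where something actually changed.

    Returns how many frames triggered computation.
    """
    computed = 0
    prev = None
    for cur in frames:
        if prev is None or changed_pixels(prev, cur, delta) > 0:
            computed += 1   # run the network on this frame
            prev = cur      # update the reference frame
        # otherwise: nothing new happened — skip the inference entirely
    return computed

# 19 identical frames, then one event: roughly the ">95% idle" doorbell.
frames = [[1, 1, 1]] * 19 + [[1, 2, 1]]
events = process_stream(frames)  # only the first frame and the change compute
```

Of twenty frames, only two trigger computation; the rest cost essentially nothing.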

Sparsity in Activation

For any single deep neural network decision, only about 40% of the neurons actually "fire", i.e. produce a non-zero output. If a neuron's output is zero, we skip its effect on the rest of the network, avoiding roughly 60% of the downstream computation and the energy it would have cost.
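In code, activation sparsity amounts to skipping every multiply fed by a zero output. A minimal sketch, assuming a dense layer stored as `weights[j][i]` (input j to output i); the function names are illustrative, not GML's API.

```python
def relu(x):
    """Standard ReLU: negative pre-activations become exact zeros."""
    return [max(0.0, v) for v in x]

def sparse_forward(acts, weights):
    """Next layer's pre-activations, touching only non-zero inputs."""
    n_out = len(weights[0])
    out = [0.0] * n_out
    for j, a in enumerate(acts):
        if a == 0.0:
            continue  # silent neuron: skip all the multiplies it would feed
        for i in range(n_out):
            out[i] += a * weights[j][i]
    return out

acts = relu([1.0, -2.0, 0.5, -0.1])  # half the neurons fire here
W = [[1.0, 0.0],
     [5.0, 5.0],   # this row is never read: its input neuron is silent
     [0.0, 2.0],
     [3.0, 3.0]]   # never read either
out = sparse_forward(acts, W)
```

With 60% of activations at zero, roughly 60% of the layer's multiply-accumulates disappear, which is exactly the saving described above.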

Silicon

GML hardware combines two of the most exciting recent developments in computer architecture – neuromorphic engineering and Dataflow computation – to implement some of the most efficient digital hardware ever developed.

NeuronFlow™

Neuromorphic Engineering

Dataflow Computation

A key feature of GML chips is that computation happens in the same block of silicon where weights and data are stored: computation near memory, with 16-bit floating-point precision. Very little power and time are wasted bringing data and computation together, resulting in highly accurate, low-power inference for your endpoint devices.