Leveraging Sparsity and Dataflow for Fastest Edge AI per Watt


With the right architecture, we can additively exploit four kinds of sparsity to save time and energy in computation.

Sparsity in Connectivity

Deep neural networks have lots of connections, but they are not all equal; usually a small fraction of the connections carries most of the useful computation. By identifying the key connections and processing only those parts of the network, we avoid large numbers of unnecessary computations.
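One common way to expose this kind of sparsity is magnitude-based pruning: keep only the largest weights and zero the rest. The sketch below is illustrative only — the 10% keep fraction and the pruning criterion are assumptions for the example, not GML's actual method.

```python
import numpy as np

def prune_weights(w, keep_fraction=0.1):
    """Zero out all but the largest-magnitude weights in a layer."""
    k = max(1, int(w.size * keep_fraction))
    # the k-th largest absolute value becomes the pruning threshold
    threshold = np.sort(np.abs(w), axis=None)[-k]
    return np.where(np.abs(w) >= threshold, w, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_sparse = prune_weights(w, keep_fraction=0.1)
# roughly 90% of the multiply-accumulates can now be skipped
density = np.count_nonzero(w_sparse) / w_sparse.size
```

In practice the network is usually fine-tuned after pruning so accuracy recovers while the zeroed connections stay zero.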

Sparsity in Space

We need lots of pixels to give us the high resolution required for truly intelligent vision. However, we don’t need, or use, that high resolution everywhere in the image — only in the small zones where fine detail matters. By ignoring the massive number of pixels that don’t tell us anything new, we reduce the amount of computation by a huge factor.
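A minimal sketch of the idea: divide the frame into tiles and process only the tiles whose local detail exceeds a threshold. The tile size and the variance criterion here are illustrative assumptions, not GML's selection logic.

```python
import numpy as np

def active_tiles(frame, tile=16, threshold=10.0):
    """Return (row, col) origins of tiles with enough detail to be worth processing."""
    h, w = frame.shape
    coords = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            # a flat patch tells us nothing new; skip it
            if frame[y:y + tile, x:x + tile].std() > threshold:
                coords.append((y, x))
    return coords

# a mostly-flat 64x64 frame with detail in one 16x16 region
frame = np.zeros((64, 64))
frame[16:32, 16:32] = np.random.default_rng(1).normal(0, 50, (16, 16))
detail = active_tiles(frame)  # only 1 of the 16 tiles needs processing
```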

Sparsity in Time

A smart doorbell, even after being woken by a smart trigger, spends more than 95% of its time with nothing to do and no one new to look at. By recognizing this, and not computing when nothing new is happening, we either save an enormous amount of power or infer blindingly fast.
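In software terms, one way to act on temporal sparsity is to gate inference on a simple frame-difference test. The mean-absolute-difference criterion and threshold below are illustrative assumptions for the sketch:

```python
import numpy as np

class DeltaGate:
    """Run the network only when the scene has changed enough."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.last = None

    def should_process(self, frame):
        if self.last is None or np.abs(frame - self.last).mean() > self.threshold:
            self.last = frame.copy()
            return True
        return False  # nothing new happening: skip inference entirely

gate = DeltaGate()
still = np.zeros((8, 8))
results = [gate.should_process(still),        # first frame: process
           gate.should_process(still),        # unchanged: skip
           gate.should_process(still + 5.0)]  # changed: process
```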

Sparsity in Activation

For any single deep neural network decision, only about 40% of the neurons actually “fire”, i.e. have a non-zero output. A zero output cannot affect the rest of the network, so we skip computing its effect entirely, saving roughly another 60% of energy by avoiding irrelevant computation.
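The principle can be shown with a minimal NumPy sketch: gather the non-zero activations before the next layer's matrix-vector product, so the zero entries never generate a multiply-accumulate. (This models the arithmetic savings only; hardware like GML's realizes the gain without the gather step.)

```python
import numpy as np

def sparse_matvec(W, a):
    """Compute W @ a touching only the non-zero activations.

    With ~40% of entries non-zero, ~60% of the multiply-accumulates
    (and the energy they would cost) are simply never performed.
    """
    nz = np.flatnonzero(a)
    return W[:, nz] @ a[nz]

rng = np.random.default_rng(2)
W = rng.normal(size=(32, 100))
a = np.maximum(rng.normal(size=100), 0.0)  # ReLU output: roughly half zeros
y = sparse_matvec(W, a)                    # identical result, fewer operations
```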


GML hardware combines two of the most exciting recent developments in computer architecture – neuromorphic engineering and dataflow computation – to implement some of the most efficient digital hardware ever developed.


Neuromorphic Engineering

Dataflow Computation

A key feature of GML chips is that computation happens in the same block of silicon where weights and data are stored – computation in memory. This means that very little power and time are wasted in bringing together the data and the computation.

Algorithms and Software

GrAI Flow™, GML’s Software Development Kit, hides all of this complexity from you. You design and train your network in a familiar tool such as TensorFlow or PyTorch, and the GML SDK translates it into code that runs on our super-efficient hardware.


Unleash the lowest latency per Watt for your edge AI devices

Contact Us