There's only one way to save energy in real-world edge computation: perform just the essential computations efficiently, and nothing else. This requires a different way of thinking than the standard approach. The key is sparsity.
Sparsity is the idea that changes in the real world don't happen everywhere, or all at once. By identifying where changes happen and computing only their consequences, we can save up to 95% of the power normally used in processing.
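The core trick can be sketched in a few lines. Suppose a layer computes y = W·x and we cache y. When a single input changes, we can fold just that change into the cached outputs instead of redoing the whole multiply. This is a minimal, illustrative sketch of the idea, not GML's actual implementation:

```python
# Delta (event-driven) computation: when one input changes, update only
# the affected partial sums instead of recomputing y = W @ x from scratch.
# Function names here are illustrative, not part of any GML API.

def full_matvec(W, x):
    """Dense recompute: every multiply-accumulate, every time."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def delta_update(W, y, j, delta):
    """Fold a single changed input, x[j] += delta, into cached outputs y."""
    return [y_i + row[j] * delta for y_i, row in zip(y, W)]

W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
y = full_matvec(W, x)           # [3.0, 7.0]

# Only x[0] changes (by +0.5): we touch 2 weights instead of all 4.
x[0] += 0.5
y = delta_update(W, y, 0, 0.5)  # [3.5, 8.5], same as a full recompute
```

For a layer with a million weights and one changed pixel, the same update touches only one column of W, which is where the power savings come from.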
With the right architecture, we can additively exploit four kinds of sparsity to save time and energy in computation.
Deep neural networks have lots of connections, but not all connections are equal; a small fraction typically carries the core of the computation. By identifying the key connections and processing only those parts of the network, we avoid large numbers of unnecessary computations.
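The idea above is often called weight pruning: discard the insignificant connections, store only what survives, and multiply with just those. A minimal sketch, with a hypothetical magnitude threshold standing in for whatever criterion a real toolchain would use:

```python
# Connection (weight) sparsity: keep only the significant weights and
# skip the rest during the matrix-vector product. Illustrative only.

def prune(W, threshold):
    """Keep weights whose magnitude exceeds `threshold`, as (row, col, value)."""
    return [(i, j, w) for i, row in enumerate(W)
                      for j, w in enumerate(row) if abs(w) > threshold]

def sparse_matvec(nonzeros, x, n_out):
    """Multiply using only the retained connections."""
    y = [0.0] * n_out
    for i, j, w in nonzeros:
        y[i] += w * x[j]
    return y

W = [[0.01, 0.9, 0.0],
     [0.02, 0.0, 1.2]]
nnz = prune(W, threshold=0.1)              # 2 of 6 weights survive
y = sparse_matvec(nnz, [1.0, 1.0, 1.0], n_out=2)
```

In this toy example two thirds of the multiply-accumulates disappear; in real pruned networks the surviving fraction can be far smaller still.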
A smart doorbell, even after being woken by a smart trigger, spends more than 95% of its time with nothing to do and no one new to look at. By recognizing this, and not computing when nothing new is happening, we save an enormous amount of power, or, when something does happen, infer blindingly fast.
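The doorbell behavior can be sketched as a simple gate in front of inference: compare each frame to the last one processed, and only run the network when enough has changed. A hedged sketch, with an arbitrary change threshold chosen for illustration:

```python
# Temporal sparsity: run inference only when the new frame differs
# meaningfully from the last processed one. Illustrative only; the
# threshold and frame format are placeholder assumptions.

def changed(prev, frame, threshold=0.05):
    """True if the mean per-pixel change justifies a new inference."""
    diff = sum(abs(a - b) for a, b in zip(prev, frame)) / len(frame)
    return diff > threshold

def process_stream(frames, infer):
    """Infer on the first frame, then only on frames that changed."""
    prev, results, skipped = None, [], 0
    for frame in frames:
        if prev is None or changed(prev, frame):
            results.append(infer(frame))
            prev = frame
        else:
            skipped += 1  # nothing new: spend no energy here
    return results, skipped
```

For the doorbell, `skipped` would cover well over 95% of frames, which is exactly the power budget this gate recovers.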
GML hardware combines two of the most exciting recent developments in computer architecture – neuromorphic engineering and dataflow computation – to implement some of the most efficient digital hardware ever developed.
A key feature of GML chips is that computation happens in the same block of silicon where weights and data are stored – computation in memory. This means that very little power and time are wasted in bringing together the data and the computation.
As a user, you never see this complexity: GrAI Flow™, GML's Software Development Kit, hides it all. You design and train your network in a familiar tool such as TensorFlow or PyTorch, and the GML SDK translates it into code that runs on our super-efficient hardware.