Submitted on 09 Jun 2020
Automated Design Space Exploration for optimised Deployment of DNN on
Arm Cortex-A CPUs
Miguel de Prado, Andrew Mundy, Rabia Saeed, Maurizio Denna, Nuria Pazos, Luca Benini
The spread of deep learning on embedded devices has prompted the development
of numerous methods to optimise the deployment of deep neural networks (DNN).
Prior work has mainly focused on: i) efficient DNN architectures, ii) network
optimisation techniques such as pruning and quantisation, iii) optimised
algorithms to speed up the execution of the most computationally intensive
layers, and iv) dedicated hardware to accelerate the data flow and computation.
However, cross-level optimisation remains under-researched because the space
of approaches becomes too large to test exhaustively for a globally optimised
solution, which leads to suboptimal deployment in terms of latency, accuracy,
and memory. In this work, we first detail and analyse the methods to improve
the deployment of DNNs across the different levels of software optimisation.
Building on this knowledge, we present an automated exploration framework to
ease the deployment of DNNs. The framework relies on a Reinforcement Learning
search that, combined with a deep learning inference framework, automatically
explores the design space and learns an optimised solution that improves
performance and reduces memory usage on embedded CPU platforms. We present
a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU
platforms, achieving up to 4x improvement in performance and over 2x reduction
in memory with negligible loss in accuracy with respect to the BLAS
floating-point implementation.
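
The search idea summarised above can be illustrated with a minimal sketch. It is not the authors' framework: the abstract does not specify the RL formulation, the action set, or the inference framework, so the example below assumes tabular Q-learning, three hypothetical primitive names, a made-up latency table standing in for on-device measurements, and a made-up penalty for switching primitives between adjacent layers.

```python
import random

# Hypothetical per-layer latencies (ms) for each candidate primitive.
# In practice these would be measured on the target CPU; here they are made up.
LATENCY_MS = [
    {"gemm_fp32": 4.0, "winograd_fp32": 2.5, "gemm_int8": 1.5},
    {"gemm_fp32": 3.0, "winograd_fp32": 3.5, "gemm_int8": 1.0},
    {"gemm_fp32": 2.0, "winograd_fp32": 1.2, "gemm_int8": 1.4},
]
# Hypothetical cost (ms) of converting data when adjacent layers use
# different primitives; this couples the per-layer decisions.
SWITCH_PENALTY_MS = 0.8


def step_cost(layer, prev, choice):
    """Latency of running `layer` with `choice`, plus any conversion penalty."""
    penalty = SWITCH_PENALTY_MS if prev not in (None, choice) else 0.0
    return LATENCY_MS[layer][choice] + penalty


def search(episodes=2000, epsilon=0.2, alpha=0.1, seed=0):
    """Tabular Q-learning over (layer, previous primitive) states."""
    rng = random.Random(seed)
    actions = list(LATENCY_MS[0])
    n_layers = len(LATENCY_MS)
    q = {}  # (layer, prev, action) -> estimated negative remaining latency

    def best(layer, prev):
        return max(actions, key=lambda a: q.get((layer, prev, a), 0.0))

    for _ in range(episodes):
        prev = None
        for layer in range(n_layers):
            # Epsilon-greedy: explore at random, otherwise exploit the table.
            explore = rng.random() < epsilon
            a = rng.choice(actions) if explore else best(layer, prev)
            reward = -step_cost(layer, prev, a)
            future = 0.0
            if layer + 1 < n_layers:
                future = q.get((layer + 1, a, best(layer + 1, a)), 0.0)
            old = q.get((layer, prev, a), 0.0)
            q[(layer, prev, a)] = old + alpha * (reward + future - old)
            prev = a

    # Greedy rollout of the learned table gives the selected deployment plan.
    plan, total, prev = [], 0.0, None
    for layer in range(n_layers):
        a = best(layer, prev)
        total += step_cost(layer, prev, a)
        plan.append(a)
        prev = a
    return plan, total
```

Calling `plan, total = search()` returns one primitive per layer and the summed latency. With these made-up numbers, picking the fastest primitive for each layer in isolation costs 1.5 + 1.0 + 1.2 + 0.8 = 4.5 ms because of the switch before the last layer, whereas using `gemm_int8` throughout costs 3.9 ms, which is why the choices have to be searched jointly rather than layer by layer.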
https://arxiv.org/abs/2006.05181