Neural network models have attracted the attention of the scientific community for
their increasing prediction accuracy and their ability to emulate some human tasks.
This has driven extensive architectural enhancements, producing models with
fast-growing memory and computation requirements. Because devices are limited in
memory and compute capability, the inference of a large neural network model can be
distributed across multiple devices by a partitioning algorithm. The proposed
framework finds the optimal model splits and chooses which device computes each
split so as to minimize inference time and energy. The framework is based on the
PipeEdge algorithm and extends it by not only increasing inference throughput but
also simultaneously minimizing inference energy consumption. Another contribution
of this thesis is the addition of devices based on the emerging compute-in-memory
(CIM) technology to the system. To the best of my knowledge, no prior work has
studied the effect of including CIM devices, specifically as modeled by the
DNN+NeuroSim simulator, in distributed inference. The proposed framework could
partition VGG8 and ResNet152 on ImageNet and achieved a comparable trade-off between
the increase in the slowest inference stage's time and the energy reduction, both
when it was set to reduce inference energy (e.g., a 19% energy reduction with a 34%
time increase) and when CIM devices were added to the system (e.g., a 34% energy
reduction with a 45% time increase).
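
The trade-off described above can be illustrated with a toy sketch. It is not the PipeEdge algorithm or the thesis framework: it searches exhaustively over contiguous layer splits and device assignments, ignores communication and memory limits, and scores each plan as slowest-stage time plus a weighted total energy. The function name `best_partition`, the `energy_weight` parameter, and the device figures are all hypothetical.

```python
from itertools import combinations, permutations


def best_partition(layer_ops, devices, energy_weight=0.0):
    """Brute-force search over contiguous splits and device assignments.

    layer_ops: work per layer, in arbitrary units (assumed values).
    devices:   name -> (speed, power); stage time = work / speed and
               stage energy = time * power (a deliberately simple model).
    Returns (score, slowest_stage_time, total_energy, plan), where plan
    lists (device, first_layer, end_layer_exclusive) for each stage.
    """
    n = len(layer_ops)
    best = None
    for k in range(1, min(n, len(devices)) + 1):        # number of stages
        for cuts in combinations(range(1, n), k - 1):   # split points
            bounds = (0, *cuts, n)
            spans = list(zip(bounds, bounds[1:]))
            work = [sum(layer_ops[a:b]) for a, b in spans]
            for names in permutations(devices, k):      # one device per stage
                times = [w / devices[d][0] for w, d in zip(work, names)]
                energy = sum(t * devices[d][1] for t, d in zip(times, names))
                slowest = max(times)                    # pipeline bottleneck
                score = slowest + energy_weight * energy
                if best is None or score < best[0]:
                    plan = [(d, a, b) for d, (a, b) in zip(names, spans)]
                    best = (score, slowest, energy, plan)
    return best


if __name__ == "__main__":
    layers = [2, 2, 2, 2, 2]                            # made-up workloads
    devs = {"gpu": (4.0, 8.0), "cim": (1.0, 1.0)}       # made-up speed, power
    print(best_partition(layers, devs, energy_weight=0.0))
    print(best_partition(layers, devs, energy_weight=2.0))
```

With `energy_weight=0.0` the search only minimizes the slowest stage and balances work across both devices; raising the weight moves layers onto the low-power device, which lowers total energy at the cost of a slower bottleneck stage.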

| Date of Award | Jul 2023 |
|---|---|
| Original language | English (US) |
| Awarding Institution | Computer, Electrical and Mathematical Sciences and Engineering |
| Supervisor | Ahmed Eltawil (Supervisor) |

- Distributed Inference
- Neural Networks
- Machine Learning
- Compute-in-memory
- Throughput-energy trade-off