Multiple data centers intended for latency minimization using artificial intelligence algorithms
Author(s): Manideep Yenugula, Raghunath Kodam and David He
Abstract: Lightweight deep learning algorithms and data center technology have recently advanced to the point where multiple model inference tasks can be executed simultaneously on limited data center resources. This allows models to work together towards a common goal instead of each focusing on achieving high quality in its individual task. However, multi-model inference is poorly suited to real-time applications because of its high total operating latency. For multi-model deployment, algorithms should be fine-tuned to reduce latency as much as possible without jeopardizing safety-critical scenarios. In this study, we employ an Open Neural Network Exchange (ONNX) execution engine to investigate model inference and develop a real-time job scheduling technique for deploying multiple models. We then propose a container-based application deployment approach in which inference jobs are allocated to containers according to the scheduling technique. Experimental results show that the proposed technique can substantially reduce total running delay in real-time applications.
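The abstract does not specify the scheduling algorithm, but the idea of allocating inference jobs to containers to minimize total running delay can be sketched with a simple greedy heuristic: assign each job to the container that currently finishes earliest. The function name `schedule_jobs` and the greedy strategy are illustrative assumptions, not the authors' actual method.

```python
from heapq import heappush, heappop

def schedule_jobs(job_latencies, num_containers):
    """Greedily assign each inference job to the container that
    currently finishes earliest; return the per-job container
    assignments and the overall makespan (total running delay)."""
    # Min-heap of (current finish time, container id).
    heap = [(0.0, c) for c in range(num_containers)]
    assignment = []
    for latency in job_latencies:
        finish, container = heappop(heap)  # earliest-free container
        finish += latency
        assignment.append(container)
        heappush(heap, (finish, container))
    makespan = max(t for t, _ in heap)
    return assignment, makespan

# Example: four inference jobs spread across two containers.
assignment, makespan = schedule_jobs([3.0, 2.0, 2.0, 1.0], 2)
# Jobs land on containers [0, 1, 1, 0]; both finish at t = 4.0.
```

This earliest-finish-time heuristic is a standard baseline for latency-oriented load balancing; a production scheduler would also account for container startup cost and per-model resource limits, which the paper's technique presumably addresses.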
Manideep Yenugula, Raghunath Kodam, David He. Multiple data centers intended for latency minimization using artificial intelligence algorithms. Int J Comput Artif Intell 2020;1(1):39-45. DOI: 10.33545/27076571.2020.v1.i1a.79