Tensor decomposition is a prominent technique
for analyzing multi-attribute data and is increasingly
used in a range of application areas. Tensor
decomposition methods are computationally intensive and often
involve irregular memory accesses over large-scale sparse data.
Hence, it is critical to optimize the execution of such
data-intensive computations and the associated data movement to
reduce the time-to-solution of data analysis applications. With
the widespread use of advanced high-performance computing
(HPC) systems for data analysis applications, it is becoming
increasingly important to provide fast and scalable implementations
of tensor decompositions that execute efficiently on
modern HPC systems. In this paper, we present
distributed tensor decomposition methods that achieve faster,
memory-efficient, and communication-reduced execution on HPC
systems. We demonstrate that our techniques reduce the overall
communication and execution time of tensor decomposition
methods when they are used to analyze datasets of varied
sizes from real applications. We illustrate our results on an HPE
Superdome Flex server, a high-end modular system offering
large-scale in-memory computing, and on a distributed cluster
of Intel Xeon multi-core nodes.
For information on Reservoir’s technology related to this paper, visit ENSIGN.