## Algorithms

## Fast and Scalable Distributed Tensor Decompositions

September 24, 2019

Tensor decomposition is a prominent technique
for analyzing multi-attribute data and is being increasingly
used for data analysis in different application areas. Tensor
decomposition methods are computationally intense and often
involve irregular memory accesses over large-scale sparse data.
Hence it becomes critical to optimize the execution of such data
intensive computations and associated data movement to reduce
the eventual time-to-solution in data analysis applications. With
the prevalence of using advanced high-performance computing
(HPC) systems for data analysis applications, it is becoming
increasingly important to provide fast and scalable implementation
of tensor decompositions and execute them efficiently on
modern and advanced HPC systems. In this paper, we present
distributed tensor decomposition methods that achieve faster,
memory-efficient, and communication-reduced execution on HPC
systems. We demonstrate that our techniques reduce the overall
communication and execution time of tensor decomposition
methods when they are used for analyzing datasets of varied
size from real application. We illustrate our results on HPE
Superdome Flex server, a high-end modular system offering
large-scale in-memory computing, and on a distributed cluster
of Intel Xeon multi-core nodes.

Article

## Fast Large-Scale Algorithm for Electromagnetic Wave Propagation in 3D Media

September 24, 2019

We present a fast, large-scale algorithm for the
simulation of electromagnetic waves (Maxwell’s equations) in
three-dimensional inhomogeneous media. The algorithm has a
complexity of O(N log(N)) and runs in parallel. Numerical
simulations show the rapid treatment of problems with tens of
millions of unknowns on a small shared-memory cluster ( 16
cores).

Article

## Combinatorial Multigrid: Advanced Preconditioners For Ill-Conditioned Linear Systems

September 24, 2019

The Combinatorial Multigrid (CMG) technique is a
practical and adaptable solver and combinatorial preconditioner
for solving certain classes of large, sparse systems of linear equations.
CMG is similar to Algebraic Multigrid (AMG) but replaces
large groupings of fine-level variables with a single coarse-level
one, resulting in simple and fast interpolation schemes. These
schemes further provide control over the refinement strategies at
different levels of the solver hierarchy depending on the condition
number of the system being solved [1]. While many pre-existing
solvers may be able to solve large, sparse systems with relatively
low complexity, inversion may require O(n2) space; whereas, if
we know that a linear operator has ~n = O(n) nonzero elements,
we desire to use O(n) space in order to reduce communication
as much as possible.
Being able to invert sparse linear systems of equations, asymptotically
as fast as the values can be read from memory, has been
identified by the Defense Advanced Research Projects Agency
(DARPA) and the Department of Energy (DOE) as increasingly
necessary for scalable solvers and energy-efficient algorithms [2],
[3] in scientific computing. Further, as industry and government
agencies move towards exascale, fast solvers and communicationavoidance
will be more necessary [4], [5]. In this paper, we present
an optimized implementation of the Combinatorial Multigrid in
C using Petsc and analyze the solution of various systems using
the CMG approach as a preconditioner on much larger problems
than have been presented thus far. We compare the number of
iterations, setup times and solution times against other popular
preconditioners for such systems, including Incomplete Cholesky
and a Multigrid approach in Petsc against common problems,
further exhibiting superior performance by the CMG. 1 2
Index Terms—combinatorial algorithms, spectral support
solver, linear systems, fast solvers, preconditioners, multigrid,
graph laplacian, benchmarking, iterative solvers

Article

## On the Bottleneck Structure of Positive Linear Programming

May 23, 2019

Positive linear programming (PLP), also known as packing and covering linear programs, is an important class of problems frequently found in fields such as network science, operations research, or economics. In this work we demonstrate that all PLP problems can be represented using a network structure, revealing new key insights that lead to new polynomial-time algorithms.

Google Scholar • Article

## Fast Detection of Elephant Flows with Dirichlet-Categorical Inference

November 11, 2018

The problem of elephant flow detection is a longstanding research area with the goal of quickly identifying flows in a network that are large enough to affect the quality of service of smaller flows. Past work in this field has largely been either domain-specific, based on thresholds for a specific flow size metric, or required several hyperparameters, reducing their ease of adaptation to the great variety of traffic distributions present in real-world networks. In this paper, we present an approach to elephant flow detection that avoids these limitations, utilizing the rigorous framework of Bayesian inference. By observing packets sampled from the network, we use* Dirichlet-Categorical inference* to calculate a posterior distribution that explicitly captures our uncertainty about the sizes of each flow. We then use this posterior distribution to find the most likely subset of elephant flows under this probabilistic model. Our algorithm rapidly converges to the optimal sampling rate at a speed *O(1/n)*, where *n* is the number of packet samples received, and the only hyperparameter required is the targeted detection likelihood, defined as the probability of correctly inferring all the elephant flows. Compared to the state-of-the-art based on static sampling rate, we show a reduction in error rate by a factor of 20 times. The proposed method of Dirichlet-Categorical inference provides a novel, powerful framework to elephant flow detection that is both highly accurate and probabilistically meaningful.

View the related slides presented at INDIS 2018.

Article