On the Bottleneck Structure of Congestion-Controlled Networks

Publication Source: ACM SIGMETRICS 2020, Boston MA; ACM Proceedings on Measurement and Analysis of Computing Systems (POMACS), December 2019

In this paper, we introduce the Theory of Bottleneck Ordering, a mathematical framework that reveals the bottleneck structure of data networks. This theoretical framework provides insights into the inherent topological properties of a network at least in three areas: (1) It identifies the regions of influence of each bottleneck; (2) it reveals the order in which bottlenecks (and flows traversing them) converge to their steady state transmission rates in distributed congestion control algorithms; and (3) it provides key insights to the design of optimized traffic engineering policies. We demonstrate the efficacy of the proposed theory in TCP congestion-controlled networks for two broad classes of algorithms: congestion-based algorithms (BBR) and loss-based additive-increase/multiplicative-decrease algorithms (Cubic and Reno). Among other results, our network experiments show that: (1) Qualitatively, both classes of congestion control algorithms behave as predicted by the bottleneck structure of the network; (2) flows compete for bandwidth only with other flows operating at the same bottleneck level; (3) BBR flows achieve higher performance and fairness than Cubic and Reno flows due to their ability to operate at the right bottleneck level; (4) the bottleneck structure of a network is ever-changing and its levels can be folded due to variations in the flows’ round trip times; and (5) against conventional wisdom, low-hitter flows can have a large impact to the overall performance of a network. This paper has not yet been published. If you would like to receive a copy once available, please contact us or check back here in December.

G2: A Network Optimization Framework for High-Precision Analysis of Bottleneck and Flow Performance

Publication Source: 2019 IEEE/ACM SuperComputing Conference, Innovating the Network for Data-Intensive Science (INDIS) Workshop, Denver, CO

Congestion control algorithms for data networks have been the subject of intense research for the last three decades. While most of the work has focused around the characterization of a flow’s bottleneck link, understanding the interactions amongst links and the ripple effects that perturbations in a link can cause on the rest of the network has remained much less understood. The Theory of Bottleneck Ordering is a recently developed mathematical framework that reveals the bottleneck structure of a network and provides a model to understand such effects. In this paper we present G2, the first operational network optimization framework that utilizes this new theoretical framework to characterize with high-precision the performance of bottlenecks and flows. G2 generates an interactive graph structure that describes how perturbations in links and flows propagate, providing operators new optimization insights and traffic engineering recommendations to help improve network performance. We provide a description of the G2 implementation and a set of experiments using real TCP/IP code to demonstrate its operational efficacy.

This paper has not yet been published. If you would like to receive a copy once available, please contact us or check back here after the 2019 SuperComputing Conference. For more information on G2 technology, visit https://www.reservoir.com/research/tech/networking/.

Combining Tensor Decompositions and Graph Analytics to Provide Cyber Situational Awareness at HPC Scale

Publication Source: IEEE High Performance Extreme Computing Conference (HPEC) 2019, Waltham, MA

This paper describes MADHAT (Multidimensional Anomaly Detection fusing HPC, Analytics, and Tensors), an integrated workflow that demonstrates the applicability of HPC resources to the problem of maintaining cyber situational awareness. MADHAT combines two high-performance packages: ENSIGN for large-scale sparse tensor decompositions and HAGGLE for graph analytics. Tensor decompositions isolate coherent patterns of network behavior in ways that common clustering methods based on distance metrics cannot. Parallelized graph analysis then uses directed queries on a representation that combines the elements of identified patterns with other available information (such as additional log fields, domain knowledge, network topology, whitelists and blacklists, prior feedback, and published alerts) to confirm or reject a threat hypothesis, collect context, and raise alerts. MADHAT was developed using the collaborative HPC Architecture for Cyber Situational Awareness (HACSAW) research environment and evaluated on structured network sensor logs collected from Defense Research and Engineering Network (DREN) sites using HPC resources at the U.S. Army Engineer Research and Development Center DoD Supercomputing Resource Center (ERDC DSRC). To date, MADHAT has analyzed logs with over 650 million entries.


Automatic Parallelization to Asynchronous Task- Based Runtimes Through a Generic Runtime Layer

Publication Source: IEEE High Performance Extreme Computing Conference (HPEC) 2019, Waltham, MA

With the end of Moore’s law, asynchronous taskbased parallelism has seen growing support as a parallel programming paradigm, with the runtime system offering such advantages as dynamic load balancing, locality, and scalability. However, there has been a proliferation of such programming systems in recent years, each of which presents different performance tradeoffs and runtime semantics. Developing applications on top of these systems thus requires not only application expertise but also deep familiarity with the runtime, exacerbating the perennial problems of programmability and portability. This work makes three main contributions to this growing landscape. First, we extend a polyhedral optimizing compiler with techniques to extract task-based parallelism and data management for a broad class of asynchronous task-based runtimes. Second, we introduce a generic runtime layer for asynchronous task-based systems with representations of data and tasks that are sparse and tiled by default, which serves as an abstract target for the compiler backend. Finally, we implement this generic layer using OpenMP and Legion, demonstrating the flexibility and viability of the generic layer and delivering an end-to-end path for automatic parallelization to asynchronous task-based runtimes. Using a wide range of applications from deep learning to scientific kernels, we obtain geometric mean speedups of 23.0 (OpenMP) and 9.5 (Legion) using 64 threads.

Fast and Scalable Distributed Tensor Decompositions

Publication Source: IEEE High Performance Extreme Computing Conference (HPEC) 2019, Waltham, MA

Tensor decomposition is a prominent technique for analyzing multi-attribute data and is being increasingly used for data analysis in different application areas. Tensor decomposition methods are computationally intense and often involve irregular memory accesses over large-scale sparse data. Hence it becomes critical to optimize the execution of such data intensive computations and associated data movement to reduce the eventual time-to-solution in data analysis applications. With the prevalence of using advanced high-performance computing (HPC) systems for data analysis applications, it is becoming increasingly important to provide fast and scalable implementation of tensor decompositions and execute them efficiently on modern and advanced HPC systems. In this paper, we present distributed tensor decomposition methods that achieve faster, memory-efficient, and communication-reduced execution on HPC systems. We demonstrate that our techniques reduce the overall communication and execution time of tensor decomposition methods when they are used for analyzing datasets of varied size from real application. We illustrate our results on HPE Superdome Flex server, a high-end modular system offering large-scale in-memory computing, and on a distributed cluster of Intel Xeon multi-core nodes.

1 2 3 23