November 11, 2018
Elephant flow detection is a longstanding research area whose goal is to quickly identify flows in a network that are large enough to affect the quality of service of smaller flows. Past work in this field has largely been domain-specific, based on thresholds for a specific flow size metric, or dependent on several hyperparameters, which makes it hard to adapt to the great variety of traffic distributions present in real-world networks. In this paper, we present an approach to elephant flow detection that avoids these limitations by using the rigorous framework of Bayesian inference. By observing packets sampled from the network, we use Dirichlet-Categorical inference to calculate a posterior distribution that explicitly captures our uncertainty about the size of each flow. We then use this posterior distribution to find the most likely subset of elephant flows under this probabilistic model. Our algorithm converges to the optimal sampling rate at a rate of O(1/n), where n is the number of packet samples received, and the only hyperparameter required is the targeted detection likelihood, defined as the probability of correctly inferring all the elephant flows. Compared to the state of the art based on a static sampling rate, we show a 20-fold reduction in error rate. The proposed method of Dirichlet-Categorical inference provides a novel, powerful framework for elephant flow detection that is both highly accurate and probabilistically meaningful.
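To make the Dirichlet-Categorical step concrete, here is a minimal sketch, not the paper's algorithm. It assumes a uniform prior of 1 per flow, made-up sampled packet counts, and a Monte Carlo estimate of how often the top-k flows drawn from the posterior match the top-k flows of the posterior mean. All function names and flow labels are hypothetical.

```python
import random

def posterior_params(counts, prior=1.0):
    """Dirichlet posterior parameters: prior pseudo-count plus observed samples."""
    return {flow: prior + c for flow, c in counts.items()}

def posterior_mean(alpha):
    """Expected traffic share of each flow under the Dirichlet posterior."""
    total = sum(alpha.values())
    return {flow: a / total for flow, a in alpha.items()}

def sample_dirichlet(alpha, rng):
    """Draw one share vector: independent Gamma draws, normalized to sum to 1."""
    draws = {flow: rng.gammavariate(a, 1.0) for flow, a in alpha.items()}
    total = sum(draws.values())
    return {flow: g / total for flow, g in draws.items()}

def top_k(shares, k):
    """Set of the k flows with the largest share."""
    return frozenset(sorted(shares, key=shares.get, reverse=True)[:k])

def detection_likelihood(alpha, k, n_draws=2000, seed=0):
    """Monte Carlo estimate of the probability that the posterior-mean
    top-k set is the true top-k set under the posterior."""
    rng = random.Random(seed)
    candidate = top_k(posterior_mean(alpha), k)
    hits = sum(top_k(sample_dirichlet(alpha, rng), k) == candidate
               for _ in range(n_draws))
    return candidate, hits / n_draws

# Hypothetical sampled packet counts per flow (illustrative data only).
counts = {"flow_a": 480, "flow_b": 450, "flow_c": 30, "flow_d": 25, "flow_e": 15}
alpha = posterior_params(counts)
elephants, likelihood = detection_likelihood(alpha, k=2)
print(sorted(elephants), likelihood)
```

In a scheme of this kind, a low estimated likelihood would be a signal to collect more samples before committing to a set of elephant flows.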
View the related slides presented at INDIS 2018.
October 9, 2018
A signal pre-compensation system analyzes one or more properties of a communication medium and, taking advantage of the locality of propagation, uses a sparse fast Fourier transform (sFFT) to generate a sparse kernel based on those properties. The system models propagation of data signals through the medium as a fixed-point iteration based on the sparse kernel, and determines initial amplitudes for the data symbol(s) to be transmitted using different communication medium modes. Fixed-point iterations are performed using the sparse kernel to iteratively update the initial amplitudes. If the iterations converge, a subset of the final updated amplitudes is used as launch amplitudes for the data symbol(s). The data symbol(s) can be modulated using these launch amplitudes so that, after the pre-compensated data symbol(s) propagate through the communication medium, they resemble the original data symbols at the receiver despite any distortion and/or cross-mode interference in the medium.
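The fixed-point idea can be illustrated with a toy linear model. The sketch below is an assumption for illustration only, not the system described above: the medium is modeled as received = x + Kx for a small, hand-written sparse coupling kernel K (no sFFT is involved), and the launch amplitudes x are found by iterating x <- target - Kx. All names and values are hypothetical.

```python
def apply_sparse(kernel, x):
    """Multiply a sparse kernel {(row, col): weight} by the vector x."""
    y = [0.0] * len(x)
    for (i, j), w in kernel.items():
        y[i] += w * x[j]
    return y

def precompensate(kernel, target, tol=1e-10, max_iter=200):
    """Solve x + K x = target with the fixed-point iteration x <- target - K x.

    Returns (x, converged). The iteration converges when the coupling
    described by K is weak enough (spectral radius below 1)."""
    x = list(target)  # initial amplitudes: the uncompensated symbols
    for _ in range(max_iter):
        kx = apply_sparse(kernel, x)
        x_new = [t - k for t, k in zip(target, kx)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, True
        x = x_new
    return x, False

# Toy medium with 4 modes: each mode leaks 10% into its nearest neighbours.
n_modes = 4
kernel = {}
for i in range(n_modes):
    for j in (i - 1, i + 1):
        if 0 <= j < n_modes:
            kernel[(i, j)] = 0.1

target = [1.0, -1.0, 1.0, 1.0]           # symbols wanted at the receiver
launch, converged = precompensate(kernel, target)

# Propagate the pre-compensated amplitudes through the toy medium.
received = [x + k for x, k in zip(launch, apply_sparse(kernel, launch))]
print(converged, [round(r, 6) for r in received])
```

Storing the kernel as a dictionary of nonzero entries keeps each iteration proportional to the number of couplings rather than to the square of the number of modes, which is the benefit a sparse kernel is meant to provide.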
Computationally Efficient CP Tensor Decomposition Update Framework for Emerging Component Discovery in Streaming Data
September 25, 2018
We present streaming CP update, an algorithmic framework for updating CP tensor decompositions that possesses the capability of identifying emerging components and can produce decompositions of large, sparse tensors streaming along multiple modes at a low computational cost. We discuss a large-scale implementation of the proposed scheme integrated within the ENSIGN tensor analysis package, and we evaluate and demonstrate the performance of the framework, in terms of computational efficiency and capability to discover emerging components, on a real cyber dataset.
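As a rough illustration of how an emerging component might show up, the NumPy sketch below fits a new slice of a third-order tensor against existing CP factors by least squares and reports the relative residual. This is a simplified assumption, not the streaming CP update or the ENSIGN implementation; the factor matrices, data, and function names are hypothetical.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column r equals kron(A[:, r], B[:, r])."""
    I, R = A.shape
    J, _ = B.shape
    return np.einsum("ir,jr->ijr", A, B).reshape(I * J, R)

def fit_new_slice(A, B, X_new):
    """Least-squares weights of the existing components for a new slice,
    plus the relative residual left unexplained by those components."""
    design = khatri_rao(A, B)                      # shape (I*J, R)
    x = X_new.reshape(-1)
    weights, *_ = np.linalg.lstsq(design, x, rcond=None)
    residual = x - design @ weights
    return weights, np.linalg.norm(residual) / np.linalg.norm(x)

# Existing rank-2 factors for the two non-streaming modes (toy values).
A = np.eye(5)[:, :2]
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])

# A slice fully explained by the existing components.
X_old = 2.0 * np.outer(A[:, 0], B[:, 0]) + 3.0 * np.outer(A[:, 1], B[:, 1])
w_old, rel_old = fit_new_slice(A, B, X_old)

# The same slice plus a new rank-1 pattern the factors cannot represent.
X_emerging = X_old + 4.0 * np.outer(np.eye(5)[:, 2], np.ones(4))
w_new, rel_new = fit_new_slice(A, B, X_emerging)

print(rel_old, rel_new)
```

A large relative residual on incoming data is one plausible trigger for adding a component to the decomposition; the actual criterion used by the framework is not described in this abstract.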
September 25, 2018
Multiresolution priority queues are data structures recently discovered by Reservoir Labs that reduce the entropy of some critical graph algorithms—such as Dijkstra’s or Prim’s algorithms—and deliver new lower computational complexity bounds. These new data structures are capable of exploiting the multiresolution properties of discrete algorithms, a characteristic that has been otherwise overlooked in the field of graph algorithms. Similar to the concept of resolution found in signal processing—by which a signal can be undersampled while information loss is zero or very small—graphs’ entropy tends to be concentrated in regions that can be efficiently exploited by multiresolution data structures. In this approach, a small controllable bounded discrete error is introduced in a way that entropy is substantially reduced, resulting in new lower computational complexity algorithms.
While the fastest currently known graph algorithms provide exact solutions at a high computational cost, a multiresolution graph algorithm is capable of softening graph problems and breaking their current information-theoretic barriers by introducing a small amount of controlled error that reduces the problem's entropy. This enables a new class of higher-performance graph algorithms that can address problems previously deemed intractable by identifying solutions that are close to optimal and within a known bounded error.
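The trade of bounded error for lower cost can be sketched with a simple bucketed queue. The code below is an illustrative assumption, not the multiresolution priority queue described above: priorities are quantized into buckets of width `resolution`, elements within a bucket leave in insertion order, and the out-of-order error is therefore smaller than `resolution`. The class and parameter names are hypothetical, and priorities are assumed to lie in [p_min, p_max].

```python
from collections import deque

class BucketedPriorityQueue:
    """Approximate min-priority queue with error bounded by `resolution`."""

    def __init__(self, p_min, p_max, resolution):
        self.p_min = p_min
        self.resolution = resolution
        n_buckets = int((p_max - p_min) / resolution) + 1
        self.buckets = [deque() for _ in range(n_buckets)]
        self.lowest = n_buckets   # lower bound on the first non-empty bucket
        self.size = 0

    def insert(self, priority, item):
        # Quantize the priority to a bucket index: O(1), no comparisons.
        idx = int((priority - self.p_min) / self.resolution)
        self.buckets[idx].append((priority, item))
        self.lowest = min(self.lowest, idx)
        self.size += 1

    def extract_min(self):
        if self.size == 0:
            raise IndexError("extract_min from an empty queue")
        # Skip forward over buckets that have been emptied.
        while not self.buckets[self.lowest]:
            self.lowest += 1
        self.size -= 1
        return self.buckets[self.lowest].popleft()

queue = BucketedPriorityQueue(p_min=0, p_max=100, resolution=10)
for priority, item in [(42, "a"), (7, "b"), (3, "c"), (45, "d")]:
    queue.insert(priority, item)

extracted = [queue.extract_min() for _ in range(4)]
print(extracted)
```

Here 7 is extracted before 3 because both fall in the same bucket, so the ordering is only approximate, but no element is ever out of place by as much as the chosen resolution.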
June 19, 2018
In this paper we introduce a new framework to detect elephant flows at very high traffic rates and under uncertainty. The framework provides exact mathematical formulas to compute the detection likelihood and introduces a new flow reconstruction lemma under partial information. These theoretical results lead to the design of BubbleCache, a new elephant flow detection algorithm designed to operate near the optimal tradeoff between computational scalability and accuracy by dynamically tracking the traffic's natural cutoff sampling rate. We demonstrate on a real-world 100 Gbps network that the BubbleCache algorithm helps reduce the computational cost by a factor of 1000 and the memory requirements by a factor of 100 while detecting the top flows on the network with very high probability.
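The intuition that heavily skewed traffic tolerates aggressive sampling can be shown with a small simulation. This is a sketch under stated assumptions, not BubbleCache: the sampling rate is fixed rather than tracked dynamically, and the traffic mix, flow names, and function name are made up for illustration.

```python
import random
from collections import Counter

def sampled_top_flows(packets, rate, k, seed=0):
    """Inspect each packet with probability `rate` and return the k flows
    with the most sampled packets, plus the number of packets inspected."""
    rng = random.Random(seed)
    counts = Counter()
    inspected = 0
    for flow_id in packets:
        if rng.random() < rate:
            counts[flow_id] += 1
            inspected += 1
    top = {flow for flow, _ in counts.most_common(k)}
    return top, inspected

# Synthetic traffic: 2 elephant flows and 50 mouse flows (illustrative only).
packets = ["elephant_1"] * 50_000 + ["elephant_2"] * 50_000
for i in range(50):
    packets += [f"mouse_{i}"] * 100
random.Random(1).shuffle(packets)

top, inspected = sampled_top_flows(packets, rate=0.01, k=2)
print(sorted(top), inspected, len(packets))
```

With a 1% sampling rate, only about one packet in a hundred is inspected, yet the two large flows still dominate the sampled counts. How low the rate can go before detection degrades depends on how skewed the traffic is, which is the quantity the abstract refers to as the cutoff sampling rate.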