April 8, 2019
Full packet capture (FPC) consists in capturing all packets and storing them into permanent storage to enable offline forensic analysis. FPC however suffers from a scalability issue: at today's normal traffic speed rates of 10Gbps or above, it either becomes intractable or requires highly expensive hardware both in processing and storage, which rapidly decreases the economic viability of the technology.
The first good news is that for many practical cases, full packet capture is not necessary. This rationale stems from the well-known law of heavy tailed traffic: from an analysis standpoint, most of the interesting features found in network traffic—such as a network attack, although not limited to it—are found in a very small fraction of it. Further, in some cases full packet capture is not only unnecessary but could represent a liability as sensitive information is kept in non-ephemeral storage. The second good news is that all the heavy lifting done by Zeek in processing network traffic can be leveraged to overcome both the intractability and the liability problems. Indeed, Zeek can be brought into the loop to perform selective packet capture (SPC), a process by which the Zeek workers themselves decide which traffic must be stored into disk in a selective and fine granular manner.
In this talk Reservoir Labs will present a workflow to perform selective packet capture using the Zeek sensor at very high speed rates. The workflow allows Zeek scripts to directly trigger packet captures based on the real time analysis of the traffic itself. We will describe key data structures needed to efficiently perform this task and introduce several Zeek scripts and use cases illustrating how SPC can be used to capture just the necessary packets to enable meaningful forensic analysis while minimizing the exposure to the liability risk.
February 25, 2019
The increasing size, variety, rate of growth and change, and complexity of network data has warranted advanced network analysis and services. Tools that provide automated analysis through traditional or advanced signature-based systems or machine learning classifiers suffer from practical difficulties. These tools fail to provide comprehensive and contextual insights into the network when put to practical use in operational cyber security. In this paper, we present an effective tool for network security and traffic analysis that uses high-performance data analytics based on a class of unsupervised learning algorithms called tensor decompositions. The tool aims to provide a scalable analysis of the network traffic data and also reduce the cognitive load of network analysts and be network-expert-friendly by presenting clear and actionable insights into the network.
In this paper, we demonstrate the successful use of the tool in two completely diverse operational cyber security environments, namely, (1) security operations center (SOC) for the SCinet network at SC16 - The International Conference for High Performance Computing, Networking, Storage and Analysis and (2) Reservoir Labs’ Local Area Network (LAN). In each of these environments, we produce actionable results for cyber security specialists including (but not limited to) (1) finding malicious network traffic involving internal and external attackers using port scans, SSH brute forcing, and NTP amplification attacks, (2) uncovering obfuscated network threats such as data exfiltration using DNS port and using ICMP traffic, and (3) finding network misconfiguration and performance degradation patterns.
Google Scholar • Article
Computationally Efficient CP Tensor Decomposition Update Framework for Emerging Component Discovery in Streaming Data
September 25, 2018
We present streaming CP update, an algorithmic framework for updating CP tensor decompositions that possesses the capability of identifying emerging components and can produce decompositions of large, sparse tensors streaming along multiple modes at a low computational cost. We discuss a large-scale implementation of the proposed scheme integrated within the ENSIGN tensor analysis package, and we evaluate and demonstrate the performance of the framework, in terms of computational efficiency and capability to discover emerging components, on a real cyber dataset.
September 25, 2018
Multiresolution priority queues are data structures recently discovered by Reservoir Labs that reduce the entropy of some critical graph algorithms—such as Dijkstra’s or Prim’s algorithms—and deliver new lower computational complexity bounds. These new data structures are capable of exploiting the multiresolution properties of discrete algorithms, a characteristic that has been otherwise overlooked in the field of graph algorithms. Similar to the concept of resolution found in signal processing—by which a signal can be undersampled while information loss is zero or very small—graphs’ entropy tends to be concentrated in regions that can be efficiently exploited by multiresolution data structures. In this approach, a small controllable bounded discrete error is introduced in a way that entropy is substantially reduced, resulting in new lower computational complexity algorithms.
While the fastest currently known graph algorithms provide exact solutions at the expense of incurring high computational costs, a multiresolution graph algorithm is capable of softening graph problems and breaking their current information theoretic barriers, introducing a small amount of controlled error in a way that the problem’s entropy is reduced. As a result, a new class of higher performance graph algorithms is enabled, enabling the solution of previously deemed intractable problems by identifying solutions that are close to optimal and within a known bounded error.
Google Scholar • Article
June 19, 2018
In this paper we introduce a new framework to detect elephant flows at very high speed rates and under uncertainty. The framework provides exact mathematical formulas to compute the detection likelihood and introduces a new flow reconstruction lemma under partial information. These theoretical results lead to the design of BubbleCache, a new elephant flow detection algorithm designed to operate near the optimal tradeoff between computational scalability and accuracy by dynamically tracking the traffic’s natural cutoff sampling rate. We demonstrate on a real world 100 Gbps network that the BubbleCache algorithm helps reduce the computational cost by a factor of 1000 and the memory requirements by a factor of 100 while detecting the top flows on the network with very high probability.