Polyhedral Tensor Schedulers

Publication Source: 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland

Compiler optimizations based on the polyhedral model are able to automatically parallelize and optimize loopbased code. We acknowledge that while polyhedral techniques can represent a broad set of program transformations, important classes of programs could be parallelized just as well using less general but more tractable techniques. We apply this general idea to the polyhedral scheduling phase, which is one of the typical performance bottlenecks of polyhedral compilation. We focus on a class of programs in which enough parallelism is already exposed in the source program, and which includes Deep Learning layers and combinations thereof, as well as multilinear algebra kernels. We call these programs ”tensor codes”, and consequently call ”tensor schedulers” the tractable polyhedral scheduling techniques presented here. The general idea is that we can significantly speed up polyhedral scheduling by restricting the set of transformations considered. As an extra benefit, having a small search space allows us to introduce non-linear cost models, which fills a gap in polyhedral cost models.

On the Bottleneck Structure of Positive Linear Programming

Publication Source: 2019 SIAM Workshop on Network Science

Positive linear programming (PLP), also known as packing and covering linear programs, is an important class of problems frequently found in fields such as network science, operations research, or economics. In this work we demonstrate that all PLP problems can be represented using a network structure, revealing new key insights that lead to new polynomial-time algorithms.  

Google Scholar    Article

Selective Packet Capture at High Speed Rates

Publication Source: Zeek (Bro) Workshop 2019, CERN, Switzerland

Full packet capture (FPC) consists in capturing all packets and storing them into permanent storage to enable offline forensic analysis. FPC however suffers from a scalability issue: at today's normal traffic speed rates of 10Gbps or above, it either becomes intractable or requires highly expensive hardware both in processing and storage, which rapidly decreases the economic viability of the technology.

The first good news is that for many practical cases, full packet capture is not necessary. This rationale stems from the well-known law of heavy tailed traffic: from an analysis standpoint, most of the interesting features found in network traffic—such as a network attack, although not limited to it—are found in a very small fraction of it. Further, in some cases full packet capture is not only unnecessary but could represent a liability as sensitive information is kept in non-ephemeral storage. The second good news is that all the heavy lifting done by Zeek in processing network traffic can be leveraged to overcome both the intractability and the liability problems. Indeed, Zeek can be brought into the loop to perform selective packet capture (SPC), a process by which the Zeek workers themselves decide which traffic must be stored into disk in a selective and fine granular manner.

In this talk Reservoir Labs will present a workflow to perform selective packet capture using the Zeek sensor at very high speed rates. The workflow allows Zeek scripts to directly trigger packet captures based on the real time analysis of the traffic itself. We will describe key data structures needed to efficiently perform this task and introduce several Zeek scripts and use cases illustrating how SPC can be used to capture just the necessary packets to enable meaningful forensic analysis while minimizing the exposure to the liability risk.

Contact us to receive a copy of this presentation or for a demonstration.

Enhancing Network Visibility and Security through Tensor Analysis

Publication Source: Elsevier, Future Generation Computer Systems Volume 96, July 2019, Pages 207-215

The increasing size, variety, rate of growth and change, and complexity of network data has warranted advanced network analysis and services. Tools that provide automated analysis through traditional or advanced signature-based systems or machine learning classifiers suffer from practical difficulties. These tools fail to provide comprehensive and contextual insights into the network when put to practical use in operational cyber security. In this paper, we present an effective tool for network security and traffic analysis that uses high-performance data analytics based on a class of unsupervised learning algorithms called tensor decompositions. The tool aims to provide a scalable analysis of the network traffic data and also reduce the cognitive load of network analysts and be network-expert-friendly by presenting clear and actionable insights into the network.

In this paper, we demonstrate the successful use of the tool in two completely diverse operational cyber security environments, namely, (1) security operations center (SOC) for the SCinet network at SC16 - The International Conference for High Performance Computing, Networking, Storage and Analysis and (2) Reservoir Labs’ Local Area Network (LAN). In each of these environments, we produce actionable results for cyber security specialists including (but not limited to) (1) finding malicious network traffic involving internal and external attackers using port scans, SSH brute forcing, and NTP amplification attacks, (2) uncovering obfuscated network threats such as data exfiltration using DNS port and using ICMP traffic, and (3) finding network misconfiguration and performance degradation patterns.

Google Scholar    Article

Analysis of Explicit vs. Implicit Tasking in OpenMP using Kripke

Publication Source: 2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)

Dynamic task-based parallelism has become a widely-accepted paradigm in the quest for exascale computing. In this work, we deliver a non-trivial demonstration of the advantages of explicit over  implicit tasking in OpenMP 4.5 in terms of both expressiveness and performance. We target the Kripke benchmark, a mini-application used to test the performance of discrete particle codes, and find that the dependence structure of the core “sweep” kernel is well-suited for dynamic task-based systems. Our results show that explicit tasking delivers a 31.7% and 8.1% speedup over a pure implicit implementation for a small and large problem, respectively, while a hybrid variant also underperforms the explicit variant by 13.1% and 5.8%, respectively.
Google Scholar    Article

1 2 3 21