Pierre-David Letourneau, Muthu Baskaran, Tom Henretty, James Ezick, Richard Lethin September 25, 2018
Publication Source: 2018 IEEE High Performance Extreme Computing Conference (HPEC '18), Waltham, MA, USA [Best Paper Award]
We present streaming CP update, an algorithmic framework for updating CP tensor decompositions that can identify emerging components and produce decompositions of large, sparse tensors streaming along multiple modes at a low computational cost. We discuss a large-scale implementation of the proposed scheme integrated within the ENSIGN tensor analysis package, and we evaluate the framework's computational efficiency and its ability to discover emerging components on a real cyber dataset.
Aditya Gudibanda, Tom Henretty, Muthu Baskaran, James Ezick, Richard Lethin September 25, 2018
Publication Source: 2018 IEEE High Performance Extreme Computing Conference (HPEC '18), Waltham, MA, USA
As the scale of unlabeled data rises, it becomes increasingly valuable to perform scalable, unsupervised data analysis. Tensor decompositions, which have been empirically successful at finding meaningful cross-dimensional patterns in multidimensional data, are a natural candidate to test for scalability and meaningful pattern discovery in these massive real-world datasets. Furthermore, the production of big data of different types necessitates the ability to mine patterns across disparate sources. The coupled tensor decomposition framework captures this idea by decomposing several tensors from different data sources together. We present a scalable implementation of coupled tensor decomposition on Apache Spark. We introduce nonnegativity and sparsity constraints, and perform all-at-once quasi-Newton optimization of all factor matrix parameters. We present results showing the billion-scale scalability of this novel implementation and also demonstrate the high level of interpretability in the components produced, suggesting that coupled, all-at-once tensor decompositions on Apache Spark represent a promising framework for large-scale, unsupervised pattern discovery.
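The following is a minimal sketch of what a coupled, all-at-once objective can look like, not the Spark implementation the abstract describes. It assumes one 3-mode tensor X and one matrix Y that share their first-mode factor A, a squared-error fit term, and an L1 penalty as a stand-in for the sparsity constraint. The names `unpack`, `coupled_loss`, `shapes`, and `l1` are hypothetical. "All-at-once" here means every factor matrix lives in a single parameter vector `theta`, so one optimizer step can update all of them together.

```python
import numpy as np

def unpack(theta, shapes):
    """Split one flat parameter vector into factor matrices of the given shapes."""
    mats, start = [], 0
    for rows, cols in shapes:
        size = rows * cols
        mats.append(theta[start:start + size].reshape(rows, cols))
        start += size
    return mats

def coupled_loss(theta, X, Y, shapes, l1=0.0):
    """Joint fit of a tensor X ~ [[A, B, C]] and a matrix Y ~ A D^T.

    The shared factor A couples the two data sources (an assumed coupling).
    The l1 term is an assumed sparsity penalty, not taken from the paper.
    """
    A, B, C, D = unpack(theta, shapes)
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)  # CP model of the tensor
    Y_hat = A @ D.T                              # low-rank model of the matrix
    fit = 0.5 * np.sum((X - X_hat) ** 2) + 0.5 * np.sum((Y - Y_hat) ** 2)
    return fit + l1 * np.sum(np.abs(theta))

# Toy data generated from known nonnegative factors (hypothetical sizes).
rng = np.random.default_rng(0)
rank = 2
shapes = [(4, rank), (3, rank), (5, rank), (6, rank)]  # A, B, C, D
true_theta = rng.random(sum(r * c for r, c in shapes))
A, B, C, D = unpack(true_theta, shapes)
X = np.einsum("ir,jr,kr->ijk", A, B, C)
Y = A @ D.T

loss_at_truth = coupled_loss(true_theta, X, Y, shapes)
loss_perturbed = coupled_loss(true_theta + 0.1, X, Y, shapes)
```

A function of this shape could be handed to a bound-constrained quasi-Newton solver, with lower bounds of zero enforcing nonnegativity. That pairing is one possible design, not a statement of the authors' method.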
Thomas Henretty, M. Harper Langston, Muthu Baskaran, James Ezick, Richard Lethin May 9, 2018
Publication Source: SPIE Proceedings Volume 10652, Disruptive Technologies in Information Sciences; 1065208 (2018), doi: 10.1117/12.2306933
Tensor decompositions are a class of algorithms used for unsupervised pattern discovery. Structured, multidimensional datasets are encoded as tensors and decomposed into discrete, coherent patterns captured as weighted collections of high-dimensional vectors known as components. Tensor decompositions have recently shown promising results when addressing problems related to data comprehension and anomaly discovery in cybersecurity and intelligence analysis. However, analysis of Big Data tensor decompositions is currently a critical bottleneck owing to the volume and variety of unlabeled patterns that are produced. We present an approach to automated component clustering and classification based on the Latent Dirichlet Allocation (LDA) topic modeling technique and show example applications to representative cybersecurity and geospatial datasets.
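To make "weighted collections of high-dimensional vectors" concrete, here is a small sketch of a CP-style model: each component is a weight plus one vector per tensor mode, and the tensor is approximated by the sum of the weighted outer products. The helper names `cp_reconstruct` and `top_entries` and the toy values are illustrative assumptions. The LDA-based clustering step from the abstract is not shown.

```python
import numpy as np

def cp_reconstruct(weights, factors):
    """Sum over components r of weights[r] * outer(A[:, r], B[:, r], C[:, r])."""
    A, B, C = factors
    return np.einsum("r,ir,jr,kr->ijk", weights, A, B, C)

def top_entries(factor_column, k=2):
    """Indices of the k largest-magnitude entries in one component vector."""
    return np.argsort(-np.abs(factor_column))[:k]

# Two components over a 3 x 2 x 4 tensor (hypothetical toy values).
weights = np.array([3.0, 1.0])
A = np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]])              # mode 0
B = np.array([[1.0, 0.0], [0.0, 1.0]])                          # mode 1
C = np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.0], [0.0, 0.0]])  # mode 2

X = cp_reconstruct(weights, (A, B, C))

# The dominant indices of a component vector are what an analyst
# (or an automated classifier) would inspect to interpret the pattern.
dominant_mode0 = top_entries(A[:, 0], k=2)
```

Each column triple (A[:, r], B[:, r], C[:, r]) is one pattern. With many components across many decompositions, inspecting these vectors by hand becomes the bottleneck that the abstract refers to.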
Copyright 2018 Society of Photo-Optical Instrumentation Engineers (SPIE). One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.
Muthu Baskaran, David Bruns-Smith, Thomas Henretty, James Ezick, Richard Lethin October 13, 2017
Publication Source: The 4th International Workshop on Innovating the Network for Data Intensive Science (INDIS) 2017, Denver, CO, USA.
The increasing size, variety, rate of growth and change, and complexity of network data have warranted advanced network analysis and services. Tools that provide automated analysis through traditional or advanced signature-based systems or machine learning classifiers suffer from practical difficulties. These tools fail to provide comprehensive and contextual insights into the network when put to practical use in operational cyber security. In this paper, we present an effective tool for network security and traffic analysis that uses high-performance data analytics based on a class of unsupervised learning algorithms called tensor decompositions. The tool aims to provide scalable analysis of network traffic data, reduce the cognitive load on network analysts, and be network-expert-friendly by presenting clear and actionable insights into the network.
We demonstrate the successful use of the tool in two very different operational cyber security environments, namely (1) the security operations center (SOC) for the SCinet network at SC16 - The International Conference for High Performance Computing, Networking, Storage and Analysis and (2) Reservoir Labs’ Local Area Network (LAN). In each of these environments, we produce actionable results for cyber security specialists, including (but not limited to) (1) finding malicious network traffic involving internal and external attackers using port scans, SSH brute forcing, and NTP amplification attacks, (2) uncovering obfuscated network threats such as data exfiltration over the DNS port and over ICMP traffic, and (3) finding network misconfiguration and performance degradation patterns.
Muthu Baskaran, Tom Henretty, Benoit Pradelle, M. Harper Langston, David Bruns-Smith, James Ezick, Richard Lethin June 27, 2017
Publication Source: 2017 IEEE High Performance Extreme Computing Conference (HPEC '17), Waltham, MA, USA. [Best Paper Award Winner]
Tensor decompositions are a powerful technique for enabling comprehensive and complete analysis of real-world data. Data analysis through tensor decompositions involves intensive computations over large-scale irregular sparse data. Optimizing the execution of such data intensive computations is key to reducing the time-to-solution (or response time) in real-world data analysis applications. As high-performance computing (HPC) systems are increasingly used for data analysis applications, it is becoming increasingly important to optimize sparse tensor computations and execute them efficiently on modern and advanced HPC systems. In addition to utilizing the large processing capability of HPC systems, it is crucial to improve memory performance (memory usage, communication, synchronization, memory reuse, and data locality) in HPC systems.
In this paper, we present multiple optimizations targeted at faster and more memory-efficient execution of large-scale tensor analysis on HPC systems. We demonstrate that our techniques reduce the memory usage and execution time of tensor decomposition methods when applied to multiple datasets of varied size and structure from different application domains. We achieve up to an 11x reduction in memory usage and up to a 7x improvement in performance. More importantly, we enable the application of large tensor decompositions to some important datasets on a multi-core system, which would not have been feasible without our optimizations.