An All–at–Once CP Decomposition Method for Count Tensors

CANDECOMP/PARAFAC (CP) tensor decomposition is a popular method for detecting latent behaviors in real– world data sets. As data sets grow larger and more elaborate, more sophisticated CP decomposition algorithms are required to enable these discoveries. Data sets from many applications can be represented as count tensors. To decompose count tensors, one should minimize the sum of the generalized Kullback– Leibler divergences from each tensor entry to each corresponding decomposition entry. Most often, this is done using the algorithm CP–APR (CP–Alternating Poisson Regression). In view of the fact that all–at–once optimization algorithms for related CP decomposition problems often achieve better decomposition accuracy than alternating algorithms like CP–APR, we develop CP–POPT– GDGN, an all–at–once algorithm for count tensor decomposition that utilizes a generalized damped Gauss-Newton method. We then implement a highly efficient version of CP–POPT–GDGN into the tensor package ENSIGN. After decomposing several tensors formed from real–world data sets, many of which are related to network traffic, we see that CP–POPT–GDGN typically outperforms CP–APR, both in terms of decomposition accuracy and latent behavior detection.

For information on Reservoir’s technology related to this paper, visit ENSIGN.