Scalable Tensor Factorizations with Missing Data  Evrim Acar, Danny M. Dunlavy, Tamara G. Kolda, Morten Mørup
 Abstract  The problem of missing data is ubiquitous in domains such as biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networksall domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multiway arrays (i.e., tensors) in the presence of missing data. We focus on one of the most wellknown tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CPWOPT (CP Weighted OPTimization) using a firstorder optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the realworld usefulness of CPWOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes.  Type  Conference paper [With referee]  Conference  Proceedings of the 2010 SIAM International Conference on Data Mining  Year  2010 pp. 701712  BibTeX data  [bibtex]  IMM Group(s)  Intelligent Signal Processing 
Back :: IMM Publications
