*The SIAM International Conference on Data Mining (SDM18) will be held May 3-5, 2018, in San Diego, CA. Click here for more information.
The tutorial will be held on Friday, May 4th, in two parts:
1:15-3:15pm and 3:30-5:10pm.
*

**Slides and Labs:** The PDF slides (8 MB) are now available here.
All the labs can be downloaded from Google Drive.
We have split out some of the data files for convenience. If you download the gas files separately, place them in the labs/data folder.
So, you can download **one** of the following:

- labs_without_gas_data.zip (6 MB) + gas3.mat (75 MB) + gas10.mat (231 MB)
- labs_without_gas10.zip (80 MB) + gas10.mat (231 MB)
- labs_with_gas_data.zip (306 MB)

**IMPORTANT:** This tutorial features interactive lab exercises. Please sign up with the organizers in advance so that we can send you additional information on software installation and data download before the workshop. You are expected to

- bring your own laptop for the exercises,
- have MATLAB pre-installed (free license available),
- have the Tensor Toolbox installed, and
- have the datasets downloaded.

Free licenses are available for MATLAB, but you must sign up with the organizers in advance (no later than May 1st if you need MATLAB).

**Description:** Multi-dimensional or multi-way datasets are
becoming increasingly common in science and engineering applications.
Data structures that live in three or more dimensions often exhibit
informative hidden structures that can be discovered and understood
through tensor decompositions. The purpose of this tutorial is to
dive deep into the canonical polyadic tensor decomposition (also known
as CANDECOMP, PARAFAC, or just CP), giving attendees the mathematical
and algorithmic tools to understand existing methods and have a strong
foundation for developing their own tools. The tutorial begins with
the basics and builds up to very recent developments. It is
appropriate for anyone at the graduate school level or higher with a
basic understanding of numerical methods. A unique feature of our
proposed tutorial will be hands-on exercises using the Tensor Toolbox
for MATLAB to apply tensor decompositions to real-world open source
datasets. Through these exercises, we hope to give attendees a
glimpse into the application of these methods and the open problems
that still exist (like choosing the rank of the tensor decomposition).
We expect that most attendees will already have access to MATLAB
through their universities, but we also intend to work with MathWorks
to get temporary licenses for participants. We will work with one
dataset that is nearly 2 GB, so we will invite participants to
download the datasets ahead of time.
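
For orientation, the CP model named above can be written out for a three-way tensor as follows. This is the standard textbook form, not something taken from the tutorial materials, and the notation is chosen here for illustration: $\mathcal{X}$ is the data tensor, $\mathbf{a}_r$, $\mathbf{b}_r$, $\mathbf{c}_r$ are the factor vectors, and $R$ is the number of components.

```latex
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad
x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}
```

Here $\circ$ denotes the vector outer product, so each term is a rank-one tensor. The number of terms $R$ is the rank that has to be chosen when fitting the decomposition.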

**Instructors:**

- Tamara G. Kolda, Sandia National Labs, Livermore, CA, tgkolda@sandia.gov
- Daniel M. Dunlavy, Sandia National Labs, Albuquerque, NM, dmdunla@sandia.gov

**Length:** The course consists of two two-hour segments, for a
total of four hours. The first two hours focus on mathematical
background that is generally known only to people already working on
tensor decompositions. This lays the groundwork for the second two
hours, which focus on more advanced situations, such as missing data,
alternative decompositions, larger data sets, and advanced
algorithms. Attendees will benefit from a detailed review of the
mathematical background that is never presented in ordinary talks.

**What background will be required of the audience?** Students
are expected to have a very basic familiarity with numerical
algorithms. Experience with numerical linear algebra and optimization
is helpful but not required; all definitions will be presented during
the tutorial. This is generally intended for a broad audience of
scientists and engineers without prior experience in tensor
analysis.

**Why is this topic important/interesting to the SIAM data mining
community?** Tensor decompositions are ubiquitous in data mining,
but there are few books available on the topic.

**What is the benefit to participants?**
Tutorial attendees should expect to get the following: (1) experience in applying the CP
tensor decomposition to interesting data sets, (2) an understanding of the mathematical
formulation of the CP decomposition and the algorithms to compute it, and (3) ideas for open problems to solve.

The tutorial will be divided into two two-hour parts, but the exact division of the content will be adapted to the participants as needed. Here is an outline of the tutorial topics:

- Introducing the CP Decomposition & Background
- Computing the CP Decomposition
- Details on the CP Decomposition
- Lab 1
- Missing Data
- Generalized CP Decomposition (*)
- Poisson Tensor Factorization (*)
- Randomized Least Squares for the CP Decomposition (*)
- Summary and Discussion
- Lab 2

Items marked with an asterisk cover material that comes primarily from our own research, though they will also introduce related general concepts (statistical likelihood calculations, matrix sketching, etc.). The labs will involve real-world data sets from chemometrics, gas monitoring systems, and so on. We also include a very brief MATLAB primer for students who are unfamiliar with it.

**Tamara G. Kolda** is a Distinguished Member of Technical Staff at Sandia National Laboratories. Her research interests include multilinear algebra and tensor decompositions, data mining, network/graph algorithms and analysis, numerical optimization, parallel computing, and the design of scientific software. Dr. Kolda is a SIAM Fellow, an ACM Distinguished Member, and a recipient of several awards including three best paper prizes and a 2003 Presidential Early Career Award for Scientists and Engineers (PECASE). Dr. Kolda wrote one of the key review papers, *Tensor Decompositions and Applications* (SIAM Review, 2009), which has been cited over 3600 times. She frequently gives keynote and invited talks on tensor decompositions, including an invited talk at MLConf in San Francisco and the SIAM Invited Address at the 2018 Joint Mathematics Meetings in San Diego. See www.kolda.net for more.

**Daniel M. Dunlavy** is a Principal Member of Technical Staff in the Center for Computing Research at Sandia National Laboratories in Albuquerque, NM. His research interests include tensor decompositions, numerical optimization, numerical linear algebra, machine learning, data mining, text analysis, parallel computing, and cyber security.

*Many thanks to Jed Duersch and Kina Kincher-Winoto for major contributions to this tutorial.*