Identification of metabotypes in complex biological data using tensor decomposition

Abstract

Motivation

Differences in the physiological response to treatment, such as dietary intervention, have led to the development of precision approaches in nutrition and medicine that tailor treatment for improved benefit to the individual. One such approach is to identify metabotypes, i.e., groups of individuals with similar metabolic profiles and/or regulation. Metabotyping has previously been performed using, e.g., principal component analysis (PCA) on matrix data. However, metabotyping methods suitable for more complex experimental designs, such as repeated-measures or crossover studies, are needed.

Results

We have developed a metabotyping method for tensor data, based on CANDECOMP/PARAFAC (CP) tensor decomposition. Metabotypes are inferred from CP scores using k-means clustering, and robustness is evaluated by bootstrapping of metabolites. As a proof of concept, we identified metabotypes from metabolomics data in which 79 metabolites were measured at 8 postprandial time points in 17 overweight men who underwent a three-arm dietary crossover intervention. Two metabotypes were found, characterized by differences in amino acid metabolite concentrations, that were differentially associated with baseline plasma creatinine (p = 0.007) and with the baseline metabolome (p = 0.004). These results suggest that CP decomposition provides a viable approach for identifying metabotypes directly from complex, high-dimensional data, with improved biological interpretation compared with the simpler PCA approach. A simulation study, together with results from measured data, indicated that several preprocessing methods should be considered for CP-based metabotyping of complex tensor data.
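
The workflow described above (CP decomposition, k-means clustering of the CP scores, and bootstrapping of metabolites) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a simplified 3-way tensor (individuals x metabolites x time points, with the diet mode omitted), a plain alternating-least-squares CP fit, a basic Lloyd's k-means with farthest-point initialization, and a simple label-agreement measure of stability that is only valid for two clusters. All function names, the rank, and the simulated data are hypothetical and chosen for illustration only.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a 3-way tensor along `mode` (rows index that mode)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(B, C):
    """Column-wise Kronecker product, ordered to match `unfold`."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

def cp_scores(X, rank, n_iter=200, seed=0):
    """Fit a CP model by plain ALS and return the mode-0 (individual) scores."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            gram = (first.T @ first) * (second.T @ second)
            kr = khatri_rao(first, second)
            factors[mode] = unfold(X, mode) @ kr @ np.linalg.pinv(gram)
    # Move all scale into the individual mode so scores reflect component size.
    scale = np.linalg.norm(factors[1], axis=0) * np.linalg.norm(factors[2], axis=0)
    return factors[0] * scale

def kmeans(Z, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

def metabotypes(X, rank=2, k=2):
    """Cluster individuals on their CP scores."""
    return kmeans(cp_scores(X, rank), k)

def bootstrap_stability(X, ref_labels, n_boot=20, seed=1):
    """Resample metabolites with replacement and compare cluster labels."""
    rng = np.random.default_rng(seed)
    n_metabolites = X.shape[1]
    agreement = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_metabolites, size=n_metabolites)
        boot = metabotypes(X[:, idx, :])
        m = np.mean(boot == ref_labels)
        agreement.append(max(m, 1.0 - m))  # label names are arbitrary (k = 2)
    return float(np.mean(agreement))

# Simulated data only: 17 individuals, 79 metabolites, 8 time points,
# with two groups that differ in the second component's individual score.
rng = np.random.default_rng(42)
true_groups = np.array([0] * 8 + [1] * 9)
A = np.column_stack([np.ones(17), 2.0 * true_groups - 1.0])
A = A + 0.1 * rng.standard_normal((17, 2))
B = rng.standard_normal((79, 2))
C = rng.standard_normal((8, 2))
X = np.einsum("ir,jr,kr->ijk", A, B, C) + 0.1 * rng.standard_normal((17, 79, 8))

labels = metabotypes(X)
match = np.mean(labels == true_groups)
stability = bootstrap_stability(X, labels)
print("agreement with simulated groups:", max(match, 1.0 - match))
print("mean bootstrap agreement:", stability)
```

In practice, a maintained tensor library and a clustering routine with multiple restarts would likely be preferable to these hand-written helpers, and the choice of rank, number of clusters, and preprocessing (e.g., centering and scaling) would need to be made for the data at hand.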