Longitudinal data analysis using matrix completion


In clinical practice and biomedical research, measurements are often collected sparsely and irregularly in time while the data acquisition is expensive and inconvenient. Examples include measurements of spine bone mineral density, cancer growth through mammography or biopsy, a progression of defect of vision, or assessment of gait in patients with neurological disorders. Since the data collection is often costly and inconvenient, estimation of progression from sparse observations is of great interest for practitioners.From the statistical standpoint, such data is often analyzed in the context of a mixed-effect model where time is treated as both random and fixed effect. Alternatively, researchers analyze Gaussian processes or functional data where observations are assumed to be drawn from a certain distribution of processes. These models are flexible but rely on probabilistic assumptions and require very careful implementation. In this study, we propose an alternative elementary framework for analyzing longitudinal data, relying on matrix completion. Our method yields point estimates of progression curves by iterative application of the SVD. Our framework covers multivariate longitudinal data, regression and can be easily extended to other settings. We apply our methods to understand trends of progression of motor impairment in children with Cerebral Palsy. Our model approximates individual progression curves and explains 30% of the variability. Low-rank representation of progression trends enables discovering that subtypes of Cerebral Palsy exhibit different progression trends.

Under review