Long-range dependence in time series data is a widely observed phenomenon with significant practical and theoretical consequences. In statistics and econometrics, this property is often formalized through the notion of fractional integration. We present two contributions that extend this classical view to current problems in the modeling of multivariate, long-range dependent data.
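
For reference, the standard univariate formalization (a textbook fact, not specific to this work) defines a fractionally integrated process $X_t \sim I(d)$ by

$$(1 - B)^d X_t = \varepsilon_t, \qquad (1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k,$$

where $B$ is the backshift operator and $\varepsilon_t$ is white noise. For $0 < d < 1/2$ the process is stationary with long memory: the spectral density diverges at the origin, $f(\lambda) \sim c\,\lambda^{-2d}$ as $\lambda \to 0^{+}$, and the autocovariances decay hyperbolically, $\gamma(h) \sim c'\,h^{2d-1}$, rather than exponentially.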

First, we introduce a statistical criterion for long memory in the learned representations of recurrent deep neural networks. In contrast to the heuristic tools commonly used to evaluate “memory” in such models, this criterion targets a mathematically well-defined feature of the data-generating process and permits statistically principled investigation via hypothesis testing. Experiments reveal the presence of long memory across a broad selection of language and music data, but suggest that benchmark recurrent models for natural language may fail to capture this property.
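
To make the testing idea concrete, the sketch below implements one standard semiparametric test from the long-memory literature, the Geweke–Porter-Hudak (GPH) log-periodogram regression, which estimates the memory parameter d and tests the null hypothesis d = 0. It is offered as an illustration of the general approach, not as the specific criterion developed here.

import numpy as np

def gph_test(x, m=None):
    # Geweke--Porter-Hudak log-periodogram regression (illustrative).
    # Returns the estimated memory parameter d_hat and the z-statistic
    # for the null hypothesis d = 0 (no long memory).
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(np.sqrt(n))                        # common bandwidth choice, m ~ n^(1/2)
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n    # low Fourier frequencies
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    I = np.abs(dft) ** 2 / (2.0 * np.pi * n)       # periodogram ordinates
    X = np.log(4.0 * np.sin(lam / 2.0) ** 2)       # GPH regressor
    Xc = X - X.mean()
    d_hat = -np.sum(Xc * np.log(I)) / np.sum(Xc ** 2)  # OLS slope, sign flipped
    se = np.sqrt(np.pi ** 2 / (24.0 * m))          # asymptotic standard error
    return d_hat, d_hat / se

Applied coordinate-wise to, say, the hidden-state sequence of a trained recurrent network, a large positive z-statistic is evidence that the representation exhibits long memory (d > 0); the function and variable names here are illustrative only.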

Second, we propose a frequency-domain model for multivariate time series that flexibly captures both long-range dependent and high-frequency behavior in a stationary sequence. We establish the equivalence of this model class to a broad class of long-range dependent stochastic processes, define a penalized estimation procedure motivated by a natural decomposition of the multivariate fractional spectrum, and discuss resampling approaches to inference based on recent theoretical developments for the multivariate block bootstrap. We illustrate the method on simulated and financial data.
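
For orientation, one common form of such a decomposition (standard in the multivariate long-memory literature; the exact parameterization used here may differ) factors the spectral density matrix of a $p$-variate long-range dependent process near the origin as

$$f(\lambda) \sim \Lambda(\lambda)\, G\, \Lambda(\lambda)^{*}, \qquad \Lambda(\lambda) = \operatorname{diag}\!\left(\lambda^{-d_1}, \ldots, \lambda^{-d_p}\right), \qquad \lambda \to 0^{+},$$

where the $d_k$ are coordinate-wise memory parameters and $G$ is a Hermitian, positive semidefinite matrix capturing short-run cross-dependence. A factorization of this form separates the long-memory parameters from the short-run component, which is the kind of structure a penalty can exploit.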