Latent Variable Models for Indirectly or Imprecisely Measured Networks
In the social sciences, social networks are important structures that represent the relationships and interactions between actors in a population of study. The most common methods for measuring networks are surveying study participants about their connections and collecting records of interaction activity between pairs of actors. However, directly measuring the exact network of interest can be challenging. Depending on the context, edges in the network may represent past behavior, social and geographic distance, or the potential for contact in the future given certain sets of circumstances. In the context of surveys, participants do not always provide accurate accounts of their connections, which can result in mismeasurement of the network. In the context of logged activity data, interactions do not directly quantify relationships between individuals or their propensity to interact in the future. We broadly conceptualize the observed data as manifestations of a latent network of interest, and we seek to use the former to infer the structure of the latter. In this talk, we apply this conceptual framework to two common problems: recovering the dynamic relational network of interest from temporal interaction data and mitigating the influence of mismeasurement on inference.
First, we develop a real-time anomaly detection algorithm for directed activity on large, sparse networks. The expected propensity for future interactions is conceptualized using a dynamic logistic model with interaction terms for sender- and receiver-specific latent factors in addition to sender- and receiver-specific popularity scores; deviations from this underlying model constitute potential anomalies. Latent nodal attributes are estimated via a variational Bayesian approach and may change over time, representing natural shifts in network activity. Estimation is augmented with a case-control approximation that takes advantage of the sparsity of the network and reduces computational complexity from $O(N^2)$ to $O(E)$, where $N$ is the number of nodes and $E$ is the number of edges. We run our algorithm on network event records collected from an enterprise network of over 25,000 computers in order to identify potential cybersecurity attacks.
In the second part of this talk, we propose a point-process model for inferring a network of social relations from interaction data that preserves the data’s continuous-time nature in the inferred network. We model interactions between actors with inhomogeneous Poisson processes whose intensities depend on time, covariates, and the dynamic latent network. Interactions can be spurious and not inherently indicative of an underlying connection; rather, latent connections are characterized by consistent deviations from expected, baseline behavior. We explore networks inferred by our method in the contexts of college students and barn swallows.
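The case-control idea from the first part of the talk can be illustrated with a minimal sketch. It is a simplification and not the talk's actual algorithm: the model here is static rather than dynamic, there is no variational Bayesian updating, and the sampling scheme (uniform sampling of non-edges with inverse-probability weighting) is an assumption. All names (`a`, `b`, `U`, `V`, `full_loglik`, `case_control_loglik`) are hypothetical. The sketch compares the exact logistic log-likelihood, which touches all $N(N-1)$ ordered pairs, with an estimate that uses every observed edge plus a weighted sample of non-edges.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -np.logaddexp(0.0, -x)


def logits(a, b, U, V, senders, receivers):
    # eta_ij = sender popularity + receiver popularity + latent factor interaction.
    return a[senders] + b[receivers] + np.sum(U[senders] * V[receivers], axis=1)


def full_loglik(a, b, U, V, edges):
    # Exact log-likelihood over all ordered pairs i != j: O(N^2) terms.
    n = len(a)
    eta = a[:, None] + b[None, :] + U @ V.T
    y = np.zeros((n, n), dtype=bool)
    y[edges[:, 0], edges[:, 1]] = True
    terms = np.where(y, log_sigmoid(eta), log_sigmoid(-eta))
    return terms[~np.eye(n, dtype=bool)].sum()


def case_control_loglik(a, b, U, V, edges, n_controls, rng):
    # Assumed scheme: keep every edge ("cases"), sample non-edges ("controls")
    # uniformly with replacement, and reweight them to stand in for all non-edges.
    n = len(a)
    edge_set = {(int(i), int(j)) for i, j in edges}
    ll_cases = log_sigmoid(logits(a, b, U, V, edges[:, 0], edges[:, 1])).sum()

    controls = []
    while len(controls) < n_controls:
        i, j = (int(x) for x in rng.integers(0, n, size=2))
        if i != j and (i, j) not in edge_set:
            controls.append((i, j))
    controls = np.array(controls)

    n_nonedges = n * (n - 1) - len(edge_set)
    weight = n_nonedges / n_controls
    eta_controls = logits(a, b, U, V, controls[:, 0], controls[:, 1])
    ll_controls = weight * log_sigmoid(-eta_controls).sum()
    return ll_cases + ll_controls
```

Each evaluation of the approximate log-likelihood costs time proportional to the number of edges plus the number of sampled controls, so choosing the control sample size proportional to $E$ keeps the cost at $O(E)$ for a sparse network.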
Lastly, we consider an issue that may arise when using networks measured through surveys: the impact of missing links in experiments on networks. In these experiments, individuals are influenced not only by their own treatment assignments but also by those of their peers. These treatment spillovers are often of direct scientific interest. Through simulations, we show that missing links can bias estimates of these spillovers. We develop a mixture model that provides consistent estimators in this setting and use this model to study weather insurance adoption among farmers in rural China.
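The effect of missing links on spillover estimates can be illustrated with a small simulation. This is a sketch under stated assumptions, not the simulation design or the mixture model from the talk: the network is an Erdős-Rényi graph, links go missing independently at random, exposure is the number of treated neighbors, outcomes are linear and noise-free, and the estimator is ordinary least squares. The function name and all parameter values are hypothetical.

```python
import numpy as np


def simulate_spillover_estimates(n=200, p_edge=0.05, p_missing=0.3,
                                 spillover=0.5, seed=0):
    rng = np.random.default_rng(seed)

    # True undirected network (assumed Erdos-Renyi for illustration).
    upper = np.triu(rng.random((n, n)) < p_edge, k=1)
    A_true = (upper | upper.T).astype(float)

    # Observed network: each true link is independently unreported.
    keep = np.triu(rng.random((n, n)) >= p_missing, k=1)
    keep = keep | keep.T
    A_obs = A_true * keep

    # Random treatment assignment and exposure (number of treated neighbors).
    z = rng.integers(0, 2, size=n).astype(float)
    exposure_true = A_true @ z
    exposure_obs = A_obs @ z

    # Noise-free outcome driven by the TRUE network, to isolate mismeasurement.
    y = 1.0 + 2.0 * z + spillover * exposure_true

    def ols_spillover(exposure):
        # Regress the outcome on an intercept, own treatment, and exposure.
        X = np.column_stack([np.ones(n), z, exposure])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef[2]

    return ols_spillover(exposure_true), ols_spillover(exposure_obs)


est_true, est_obs = simulate_spillover_estimates()
print(f"spillover estimate with true network:     {est_true:.3f}")
print(f"spillover estimate with observed network: {est_obs:.3f}")
```

Because the outcome contains no noise, the regression on the true network recovers the spillover coefficient exactly, so any gap in the second estimate is attributable to the missing links alone.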