Nonparametric Identified Methods to Handle Nonignorable Missing Data
Update 4/25/2019: Location of this seminar has been moved to SMI 211.
Bayesian hierarchical modeling is a powerful tool for demography and climate science. In this talk we will focus on its use for accounting for uncertainty about past demographic quantities in population projections. Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. However, in 2015, for the first time, in a major advance, the United Nations issued official probabilistic population projections for all countries based on Bayesian hierarchical models for total fertility and life expectancy.
A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation (MLE) may be implemented using numerical integration, the approach becomes computationally intensive. In contrast, the score matching method of Hyvärinen (2005) avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions on the m-dimensional Euclidean space R^m.
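For intuition, here is a minimal sketch, not from the talk itself, of how score matching gives a closed-form estimate in the simplest exponential-family case: a zero-mean Gaussian with unknown precision theta (the sample size and seed are arbitrary choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
x = rng.normal(0.0, sigma, size=200_000)

# Model: log p(x) = -theta * x^2 / 2 + const, so the score is
#   psi(x) = d/dx log p(x) = -theta * x, with psi'(x) = -theta.
# Hyvarinen's objective J(theta) = E[ psi'(x) + psi(x)^2 / 2 ]
#                               = E[ -theta + theta^2 * x^2 / 2 ]
# is minimized in closed form, with no normalizing constant needed:
theta_hat = 1.0 / np.mean(x**2)
print(theta_hat)  # close to the true precision 1/sigma^2 = 4
```

The normalizing constant of the Gaussian never appears: only derivatives of the log-density enter the objective.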
Green Dot is a movement, a program, and an action. The aim of Green Dot is to prevent and reduce sexual assault & relationship violence at UW by engaging students as leaders and active bystanders who step in, speak up, and interrupt potential acts of violence. The Green Dot movement is about gaining a critical mass of students, staff and faculty who are willing to do their small part to actively and visibly reduce power-based personal violence at UW.
Hawkes processes have been a popular point process model for capturing mutual excitation of discrete events. In the network setting, this can capture the mutual influence between nodes, with a wide range of applications in neuroscience, social networks, and crime data analysis. In this talk, I will present a statistical change-point detection framework to detect, in real time, a change in the influence using streaming discrete events.
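As a generic illustration of the self-exciting intensity described above (not the speaker's detection framework), a univariate Hawkes process with exponential kernel can be simulated by Ogata's thinning algorithm; all parameter values below are invented for the example.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate a univariate Hawkes process with conditional intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    via Ogata's thinning algorithm."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < horizon:
        # Intensity just after t bounds the intensity until the next event,
        # since the exponential kernel only decays between events.
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar:   # accept the candidate point
            events.append(t)
    return np.array(events)

# alpha/beta < 1 keeps the process stable (subcritical branching ratio)
events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=100.0)
```

A change in the network influence corresponds to a change in the excitation parameters, which is what a change-point procedure would monitor in the event stream.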
The celebrated Grenander (1956) estimator is the maximum likelihood estimator of a decreasing density function. In contrast to alternative nonparametric density estimators, the Grenander estimator does not require any smoothing parameters and is often viewed as a fully automatic procedure. However, the monotonic density assumption might be questionable. While testing qualitative constraints such as monotonicity is difficult in general, we show that a likelihood ratio test statistic Kₙ has an incredibly simple asymptotic null distribution: n¹
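To make the construction concrete, here is a hedged sketch (assuming a density supported on [0, ∞), with an exponential sample standing in for real data) of the Grenander estimator as the left derivative of the least concave majorant (LCM) of the empirical CDF:

```python
import numpy as np

def grenander(x):
    """Grenander MLE of a nonincreasing density on [0, inf):
    the left derivative of the least concave majorant of the ECDF.
    Returns the LCM knots and the constant density value per segment."""
    t = np.concatenate(([0.0], np.sort(x)))
    F = np.arange(len(t)) / float(len(x))        # ECDF values 0, 1/n, ..., 1
    hull = [(t[0], F[0])]
    for px, py in zip(t[1:], F[1:]):
        # pop the last vertex while it lies on or below the chord to (px, py)
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (px - x1) <= (py - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((px, py))
    knots = np.array(hull)
    slopes = np.diff(knots[:, 1]) / np.diff(knots[:, 0])
    return knots[:, 0], slopes

rng = np.random.default_rng(1)
knots, dens = grenander(rng.exponential(size=500))
# dens is nonincreasing by construction and integrates to one on [0, max(x)]
```

No bandwidth appears anywhere, which is the sense in which the procedure is fully automatic.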
Randomization is a basis for inferring treatment effects with minimal additional assumptions. Appropriately using covariates in randomized experiments can further yield more precise estimators. In his seminal work Design of Experiments, R. A. Fisher suggested blocking on discrete covariates in the design stage and conducting the analysis of covariance (ANCOVA) in the analysis stage. In fact, blocking can be embedded into a wider class of experimental designs called rerandomization, and the classical ANCOVA can be extended to more general regression-adjusted estimators.
At Amazon’s Inventory Planning and Control Laboratory (IPC Lab) we run randomized controlled trials (RCTs) that evaluate the efficacy of in-production buying and supply chain policies on important business metrics. Our customers are leading supply chain researchers and business managers within Amazon, and our mission is to help them best answer the question, ‘Should I roll out my policy?’ In this talk we discuss how we navigate multiple obstacles to fulfilling our mission.
Deep neural nets have become in recent years a widespread practical technology, with impressive performance in computer vision, speech recognition, natural language processing and many other applications. Deploying deep nets in mobile phones, robots, sensors and IoT devices is of great interest. However, state-of-the-art deep nets for tasks such as object recognition are too large to be deployed in these devices, given their limits on CPU speed, memory, bandwidth, battery life and energy consumption.
Causal inference is a challenging problem because causation cannot be established from observational data alone. Researchers typically rely on additional sources of information to infer causation from association. Such information may come from powerful designs such as randomization, or background knowledge such as information on all confounders. However, perfect designs or background knowledge required for establishing causality may not always be available in practice.
The identification of new rare signals in data, the detection of a sudden change in a trend, and the selection of competing models are among the most challenging problems in statistical practice.
Room: 211

Manifold Data Analysis with Applications to High-Resolution 3D Imaging
Speaker: Matthew Reimherr
Many scientific areas are faced with the challenge of extracting information from large, complex, and highly structured data sets. A great deal of modern statistical work focuses on developing tools for handling such data. In this work we present a new subfield of functional data analysis, FDA, which we call Manifold Data Analysis, or MDA. MDA is concerned with the statistical analysis of samples where one or more variables measured on each unit is a manifold, thus resulting in as many manifolds as we have units.
Room: 211

Querying Probabilistic Data
Speaker: Dan Suciu
A major challenge in data management is how to manage uncertain data. Many reasons for the uncertainty exist: the data may be extracted automatically from text, it may be derived from the physical world such as RFID data, it may be integrated using fuzzy matches, or it may be the result of complex stochastic models. Whatever the reason for the uncertainty, a data management system needs to offer predictable performance to queries over large instances of uncertain data.
Room: 304

Gini Association and the Pseudo-Lorenz Curve
Speaker: Somesh Das Gupta
We were motivated by the problem of assessing how inequality in income is influenced by the corresponding inequality in some other related variable (say, the number of years of formal education completed). More generally, consider the pseudo-Lorenz curve of a nonnegative r.v. Y relative to (i.e., with respect to the ordering of) another related nonnegative r.v. X. It is shown that this pseudo-Lorenz curve L(Y/X) always lies above the Lorenz curve L(Y) of Y.
Room: 205

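The ordering result is easy to check numerically. The sketch below uses invented variables standing in for education (X) and income (Y); the Lorenz and pseudo-Lorenz curves are evaluated on a common grid of population shares.

```python
import numpy as np

def lorenz(y, order_by=None):
    """Lorenz curve of a nonnegative variable y. If order_by is given,
    y is cumulated in the order induced by that other variable, giving
    the pseudo-Lorenz curve of y relative to it."""
    key = y if order_by is None else order_by
    y_sorted = y[np.argsort(key, kind="stable")]
    # cumulative income share at population shares 0, 1/n, ..., 1
    return np.concatenate(([0.0], np.cumsum(y_sorted))) / y_sorted.sum()

rng = np.random.default_rng(2)
x = rng.gamma(2.0, size=1000)                  # e.g. years of education
y = np.exp(0.5 * x + rng.normal(size=1000))    # income, related to x

L_y = lorenz(y)                 # ordinary Lorenz curve L(Y)
L_yx = lorenz(y, order_by=x)    # pseudo-Lorenz curve of Y relative to X
# Ordering by Y itself minimizes every partial sum, so L_yx >= L_y pointwise
```

Cumulating Y in any order other than its own ascending order can only raise the partial sums, which is the intuition behind the pseudo-Lorenz curve lying above L(Y).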
Nonparametric Estimation and Comparison for Networks
Speaker: Cosma Shalizi
Scientific questions about networks are often comparative: we want to know whether the difference between two networks is just noise, and, if not, how their structures differ. I'll describe a general framework for network comparison, based on testing whether the distance between models estimated from separate networks exceeds what we'd expect based on a pooled estimate.
Room: 211

Point Process Models for Astronomy: Quasars, Coronal Mass Ejections, and Solar Flares
I will be presenting a talk on my dissertation research which consisted of the statistical analysis of two interesting astronomical applications involving point process data.
Room: 304

Local Discriminant Bases and Their Applications
Speaker: Naoki Saito
For signal and image classification problems, such as the ones in medical or geophysical diagnostics and military applications, extracting relevant features is one of the most important tasks. As an attempt to automate the feature extraction procedure and to understand what the critical features for classification are, we developed the so-called local discriminant basis (LDB) method which rapidly selects an orthonormal basis suitable for signal/image classification problems from a large collection of orthonormal bases (e.g., wavelet packets and local trigonometric bases).
Room: 102

Impacts of Climate Change on Species Distributions: Empirical and Statistical Challenges
Speaker: Janneke Hille Ris Lambers
One of the greatest challenges ecologists face is predicting how climate change will affect the organisms with which we share our planet. Ecological theory predicts that species' current distributions are determined by their climatic niches (i.e. fitness as a function of climate). Statistical models relating species' geographic distributions to climate (SDMs – species distribution models) are therefore used to predict shifts in species distributions with climate change.
Room: 211

De Finetti's Ultimate Failure
Speaker: Krzysztof Burdzy
The most scientific and least controversial claim of de Finetti's subjective philosophy of probability is that the rules of Bayesian inference can be derived from a system of axioms for rational decision making that does not presuppose the existence of probability. In fact, de Finetti's argument is fatally flawed. The error is irreparable. The slides in PowerPoint and PDF are available at http://www.math.washington.edu/~burdzy/Philosophy/.
Room: 304

Regular Variation and Extremes in Atmospheric Science
Speaker: Daniel S Cooley
Dependence in the tail of the distribution can differ from that in the bulk of the distribution. A basic tenet of a univariate extreme value analysis is to discard the bulk of the data and only analyze the data considered to be extreme. This is true for multivariate problems as well. We will first introduce a framework for describing tail dependence: the probabilistic framework of regular variation, which has strong ties to classical extreme value theory.
Room: 211

Statistical Factor Models and Predictive Approaches for Problems of Molecular Characterisation
Speaker: Mike West
I will discuss aspects of data analysis and modelling arising from a number of clinical studies that aim to integrate gene expression, and other forms of molecular data, into predictive modelling of clinical outcomes and disease states. Some of our work on empirical and model-based approaches to defining underlying factor structure in large-scale expression data, and the use of estimated factors in predictive regression and classification tree models, will be reviewed.
Room: 102

Nonstationary Time Series Modeling and Estimation with Applications in Oceanography
Speaker: Adam Sykulski
This talk will focus on nonstationary time series, from both a methodological and applied perspective. On the methodology side, I will discuss new stochastic models for capturing structure in bivariate data, by representing the series as complex-valued. This representation allows for novel ways of capturing features that are multiscale, anisotropic and/or nonstationary. I will also present new methodology and theory for maximum likelihood inference in the frequency domain, specifically by providing a method for removing estimation error from the Whittle likelihood.
Room: 211

UPS Delivers Optimal Phase Diagram for High Dimensional Variable
Speaker: Jiashun Jin
Consider a linear regression model Y = Xβ + z, z ~ N(0, I_n), X = X_{n,p}, where both p and n are large but p > n. The vector β is unknown but is sparse in the sense that only a small proportion of its coordinates is nonzero, and we are interested in identifying these nonzero ones. We model the coordinates of β as samples from a two-component mixture (1 − ϵ)υ₀ + ϵπ, and the rows of X as samples from N(0, (1/n)Ω), where υ₀ is the point mass at 0, π is a distribution, and Ω is a p by p correlation matrix which is unknown but is presumably sparse.
Room: 304

Nonhomogeneous Hidden Markov Models for Downscaling Synoptic Atmospheric Patterns to Precipitation Amounts
Speaker: Enrica Bellone
Advisors: Peter Guttorp & Jim Hughes
Room: 407

Using Radical Environmentalist Texts to Uncover Network Structure and Network Features
Speaker: Zack Almquist
In their efforts to call attention to environmental problems, communicate with like-minded groups, and mobilize support for their activities, radical environmentalist organizations produce an enormous amount of text. These texts, like radical environmental groups themselves, are often (i) densely connected and (ii) highly variable in advocated protest activities. Given a corpus of radical environmentalist texts, can one uncover the underlying network structure of environmental (and related leftist) groups?
Room: 211

Statistics at Google
Speakers: Jake D. Brutlag and Mike Meyer
This presentation will describe some of the problems faced and methods used by statisticians at Google:
• A primary dimension of search quality is the relevance of search results to the search query. Preference rank allows us to convert pairwise comparisons into a ranking of search results.
• Through the AdSense program, Google delivers targeted advertising on third-party web sites, which we refer to as publishers. Publisher scores are a method of ranking publishers by their effectiveness as an ad delivery platform.
Room: 304

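The abstract does not spell out the preference-rank algorithm; as an elementary illustration only (a Copeland-style net-win score, not necessarily the method used at Google), pairwise comparisons can be turned into a ranking like this:

```python
from collections import defaultdict

def rank_from_pairs(comparisons):
    """Rank items by net wins across pairwise preferences
    (winner, loser). A minimal Copeland-style illustration."""
    score = defaultdict(int)
    for winner, loser in comparisons:
        score[winner] += 1   # a win raises an item's score
        score[loser] -= 1    # a loss lowers it
    return sorted(score, key=score.get, reverse=True)

prefs = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
print(rank_from_pairs(prefs))  # ['A', 'B', 'C']
```

More refined approaches (e.g. Bradley-Terry models) fit win probabilities rather than counting net wins, but the input/output shape is the same: pairwise comparisons in, a ranking out.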
MS Thesis Presentation - Simple Transformation Techniques for Improved Nonparametric Regression
Speaker: George E. Barnidge
In this paper, the authors propose and investigate two new methods for achieving less bias in nonparametric regression. Using simulations, they compare the bias, variance, and mean squared error of the second and preferred of these two methods to those of the local constant, local linear, and local cubic nonparametric regression estimators. The two new methods have bias of order h^4, where h is the estimator's smoothing parameter, in contrast to the basic kernel estimator's bias of order h^2.
Room: 305

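For reference, the basic local constant (Nadaraya-Watson) estimator whose O(h^2) bias the thesis improves upon can be sketched as follows; the test function, noise level, and bandwidth below are arbitrary choices, not the thesis setup.

```python
import numpy as np

def nw(x0, x, y, h):
    """Nadaraya-Watson (local constant) kernel regression with a
    Gaussian kernel; its bias is of order h^2."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 400)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=400)

grid = np.linspace(0.1, 0.9, 9)   # interior points, away from boundary bias
fit = nw(grid, x, y, h=0.05)
```

Transformation-based corrections of the kind studied in the thesis modify this fit so the leading h^2 bias term cancels, leaving a bias of order h^4.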
From Big Data to Precision Oncology using Machine Learning
Speaker: Su-In Lee
While targeting key drivers of tumor progression (e.g., BCR/ABL, HER2, and BRAFV600E) has had a major impact in oncology, most patients with advanced cancer continue to receive drugs that do not work in concert with their specific biology. This is exemplified by acute myeloid leukemia (AML), a disease for which treatments and cure rates (in the range of 20%) have remained stagnant. Effectively deploying an ever-expanding array of cancer therapeutics holds great promise for improving these rates but requires methods to identify how drugs will affect specific patients.
Room: 211

Identification of Minimal Sets of Covariates for Matching Estimators
Speaker: Xavier de Luna
The availability of large observational databases allows empirical scientists to consider estimating treatment effects without conducting costly and/or unethical experiments where the treatment would be randomized. The Neyman-Rubin model (potential outcome framework) and the associated matching estimators have become increasingly popular, because they allow for the nonparametric estimation of average treatment effects.
Room: 304

Bayesian Models for Integrative Genomics
Speaker: Marina Vannucci
Novel methodological questions are being generated in the biological sciences, requiring the integration of different concepts, methods, tools and data types. Bayesian methods that employ variable selection have been particularly successful for genomic applications, as they can handle situations where the amount of measured variables can be much greater than the number of observations. In this talk I will focus on models that integrate experimental data from different platforms together with prior knowledge.
Room: 211

Markov Random Fields and Issues of Computation
Speaker: Brian Lucena
Markov Random Fields are extremely useful and generally applicable for probabilistic modelling of a wide range of systems. We'll review methods for performing inference calculations (most likely configuration and marginal probabilities) on MRFs. Unfortunately, for many tasks, these basic calculations are computationally infeasible. We'll discuss the limitations of standard computation methods and the graph-theoretic properties related to computational complexity.
Room: 102

Prior Adjusted Default Bayes Factors for Testing (In)Equality Constrained Hypotheses
Speaker: Joris Mulder
Bayes factors have proven very useful when testing statistical hypotheses with inequality (or order) constraints and/or equality constraints between the parameters of interest. Two useful properties of the Bayes factor are its intuitive interpretation as the relative evidence in the data between two hypotheses and the fact that it can straightforwardly be used for testing multiple hypotheses. The choice of the prior, which reflects one's knowledge about the unknown parameters before observing the data, has a substantial effect on the Bayes factor.
Room: 211

Probabilistic Weather Forecasting Using Bayesian Model Averaging
Speaker: J. McLean Sloughter
Probabilistic forecasts of wind vectors are becoming critical as interest grows in wind as a clean and renewable source of energy, in addition to a wide range of other uses, from aviation to recreational boating. Unlike other common forecasting problems, which deal with univariate quantities, statistical approaches to wind vector forecasting must be based on bivariate distributions. The prevailing paradigm in weather forecasting is to issue deterministic forecasts based on numerical weather prediction models.
Room: 105

Ergodic Limit Laws for Stochastic Optimization Problems
Speaker: Lisa Korf
Department of Mathematics Optimization Seminar
Solution procedures for stochastic programming problems, statistical estimation problems (constrained or not), stochastic optimal control problems and other stochastic optimization problems often rely on sampling. The justification for such an approach passes through 'consistency.' A comprehensive, satisfying and powerful technique is to obtain the consistency of the optimal solutions, statistical estimators, controls, etc., as a consequence of the consistency of the stochastic optimization problems themselves.
Room: 205

Latent-variable graphical modeling via convex optimization
Speaker: Venkat Chandrasekaran
Suppose we have a graphical model with sample observations of only a subset of the variables. Can we separate the extra correlations induced due to marginalization over the unobserved, hidden variables from the structure among the observed variables? In other words, is it still possible to consistently perform model selection despite the unobserved, latent variables?
Room: 211

Nonparametric Estimation of a Convex Bathtub-Shaped Hazard Function
Speaker: Hanna Jankowski
In the analysis of lifetime data, a key object of interest is the hazard function, or instantaneous failure rate. One natural assumption is that the hazard is bathtub- or U-shaped (i.e. first decreasing, then increasing). In particular, this is often the case in reliability engineering or human mortality.
Room: 304

MS Thesis Presentation - Hierarchical Mixture of Experts and Applications
Speaker: Donatello Telesca
HME (Hierarchical Mixture of Experts) is a tree-structured architecture for supervised learning. It is characterized by soft multiway probabilistic splits, generally based on linear functions of input values, and by linear or logistic fits at the terminal nodes (called experts in the HME literature) rather than the constant functions used in CART. The statistical model underlying HME is a hierarchical mixture model, which allows for maximum likelihood estimation of the parameters using EM methods.
Room: 305

Estimation of a Two-Component Mixture Model with Applications to Multiple Testing
Speaker: Bodhi Sen
We consider estimation and inference in a two component mixture model where the distribution of one component is completely unknown. We develop methods for estimating the mixing proportion and the unknown distribution nonparametrically, given i.i.d. data from the mixture model. We use ideas from shape restricted function estimation and develop "tuning parameter free" estimators that are easily implementable and have good finite sample performance. We establish the consistency of our procedures.
Room: 211

Bootstrap and Subsampling for Non-Stationary Spatial Data
Speaker: Sara Sjöstedt de Luna
Subsampling and bootstrap methods have been suggested in the literature to nonparametrically estimate the variance and distribution of statistics computed from spatial data. Usually stationary data are required to ensure that the methods work. However, in empirical applications the assumption of stationarity often must be rejected. This talk presents consistent bootstrap and subsampling methods to estimate the variance and distributions of statistics based on nonstationary spatial lattice data. Applications to forestry are also discussed.
Room: 304

Controlling False Discovery Rate via Knockoffs
Speaker: Rina Foygel Barber
In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are associated with the response, while controlling the false discovery rate (FDR) to ensure that our results are reliable and replicable. The knockoff filter is a variable selection procedure for linear regression, proven to control FDR exactly under any type of correlation structure in the regime where n > p (sample size > number of variables).
Room: 211

Point Process Transformations and Applications to Wildfire Data
Speaker: Frederic (Rick) Paik Schoenberg
This talk will review some ways of transforming point processes, including smoothing, thinning, superposition, rescaling, and tessellation. Ways in which each of these may be used in the analysis of point process data will be examined, especially in relation to the problem of estimating wildfire hazard. We will explore in particular an important computational geometry problem involving tessellations, namely the estimation of point locations from piecewise constant image data via Dirichlet tessellation inversion.
Room: 205

Flexible, Reliable, and Scalable Nonparametric Learning
Speaker: Erik Sudderth
Applications of statistical machine learning increasingly involve datasets with rich hierarchical, temporal, spatial, or relational structure. Bayesian nonparametric models offer the promise of effective learning from big datasets, but standard inference algorithms often fail in subtle and hard-to-diagnose ways. We explore this issue via variants of a popular and general model family, the hierarchical Dirichlet process.
Room: 211

Estimation of the Relative Risk and Risk Difference
Speaker: Thomas S. Richardson
I will first review well-known differences between odds ratios, relative risks and risk differences. These results motivate the development of methods, analogous to logistic regression, for estimating the latter two quantities. I will then describe simple parametrizations that facilitate maximum likelihood estimation of the relative risk and risk difference. Further, these parametrizations allow for doubly robust g-estimation of both quantities. (Joint work with James Robins, Harvard School of Public Health)
Room: 304

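To fix the definitions being compared, here is a minimal sketch, with an invented 2x2 table, of the three quantities:

```python
def two_by_two(a, b, c, d):
    """Risk measures from a 2x2 table:
                  outcome   no outcome
       exposed       a          b
       unexposed     c          d
    """
    p1 = a / (a + b)   # risk among the exposed
    p0 = c / (c + d)   # risk among the unexposed
    return {
        "risk_difference": p1 - p0,
        "relative_risk": p1 / p0,
        "odds_ratio": (p1 / (1 - p1)) / (p0 / (1 - p0)),
    }

m = two_by_two(30, 70, 10, 90)
# p1 = 0.30, p0 = 0.10: RD = 0.20, RR = 3.0, OR = (0.3/0.7)/(0.1/0.9)
```

Note how the odds ratio (about 3.86 here) exceeds the relative risk whenever the outcome is common, one of the well-known differences the talk reviews.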
Curve Fitting and Neuron Firing Patterns
Speaker: Robert E. Kass
Reversible-jump Markov chain Monte Carlo may be used to fit scatterplot data with cubic splines having unknown numbers of knots and knot locations. Key features of the implementation my colleagues and I have investigated are (i) a fully Bayesian formulation that puts priors on the spline coefficients and (ii) Metropolis-Hastings proposal densities that attempt to place knots close to one another. Simulation results indicate this methodology can produce fitted curves with substantially smaller mean squared error than competing methods.
Room: 205

Modeling hierarchical variance with Kronecker structure, with application to quality measures in Medicare Advantage
Speaker: Laura Hatfield
Studying covariance matrices in hierarchical models can reveal meaningful relationships among variables, but these become difficult to interpret as the number of variables grows. Conventional factor analysis reduces the dimension by mapping onto a set of one-dimensional factors, but does not accommodate variables with a cross-classified layout. For such applications, we develop hierarchical models with Kronecker-product (separable) covariance structure at the second level.
Room: 211

Survey of Generalized Inverses and Their Use in Stochastic Modelling
Speaker: Jeff Hunter
In many stochastic models, in particular Markov chains in discrete or continuous time and Markov renewal processes, a Markov chain is present either directly or indirectly through some form of embedding. The analysis of many problems of interest associated with these models, e.g. stationary distributions, moments of first passage time distributions and moments of occupation time random variables, often concerns the solution of a system of linear equations involving I − P, where P is the transition matrix of a finite, irreducible, discrete time Markov chain.
Room: 304

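As a small worked example with an invented 3-state chain: since I − P is singular, quantities like the stationary distribution and mean first passage times are obtained through constructions such as the fundamental matrix Z = (I − P + 1πᵀ)⁻¹, one of the generalized-inverse-style devices talks in this area survey.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
n = P.shape[0]

# Stationary distribution: pi (I - P) = 0 is singular, so replace one
# redundant equation with the normalization sum(pi) = 1.
A = np.vstack([(np.eye(n) - P).T[:-1], np.ones(n)])
b = np.concatenate([np.zeros(n - 1), [1.0]])
pi = np.linalg.solve(A, b)

# Fundamental matrix (Kemeny-Snell form): Z = (I - P + 1 pi^T)^{-1}
Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))

# Mean first passage times: m_ij = (z_jj - z_ij) / pi_j for i != j,
# and mean recurrence time m_jj = 1 / pi_j on the diagonal.
M = (np.diag(Z)[None, :] - Z) / pi[None, :]
np.fill_diagonal(M, 1.0 / pi)
```

The resulting M satisfies the first-passage recursion m_ij = 1 + Σ_{k≠j} p_ik m_kj, which is a useful sanity check on any such computation.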
MS Thesis Presentation - A Non-Parametric Approach for Handling Repeated Measures in Cancer Experiments
Speaker: Justin T. Okano
In longitudinal studies, the usual modeling assumptions for multivariate analyses don't always hold. One way to treat this is to use nonparametric approaches. In the paper I will be presenting on, the authors analyzed tumor volume in rats as a function of lipids in their diet. The data were highly heteroscedastic and strongly correlated with time. To compare lipid diets, randomization F-tests were used. Then, local polynomial smoothing was used to create tumor growth curves for each diet, as well as confidence intervals that account for the serially correlated data.
Room: 305

Novel Approaches to Snowball / Respondent-Driven Sampling That Circumvent the Critical Threshold
Speaker: Karl Rohe
Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network-driven sampling techniques that are popular when it is difficult to contact individuals in the population of interest. This talk will first review previous research which has shown that if participants refer too many other participants, then under the standard Markov model in the RDS literature, the standard approaches do not provide "square root n" consistent estimators. In fact, there is a critical threshold where the design effect of network sampling grows with the sample size.
Room: 211

Non-Stationary Analysis and Radial Localisation in 2D
Speaker: Sofia Olhede
Image analysis has in the last decade experienced a revolution via the development of new tools for the representation and analysis of local image features. At the heart of these developments is the construction of suitable local representations of structure, via decompositions in a set of localized functions. The chosen decomposition then forms the setting for further analysis and/or estimation methods. In particular, compression of a given representation ensures that most decomposition coefficients are of negligible magnitude, and this often simplifies the analysis considerably.
Room: 304

Clustering Based on Non-Parametric Density Estimation: A Proposal
Speaker: Adelchi Azzalini
Cluster analysis based on nonparametric density estimation represents an approach to the clustering problem whose roots date back several decades, but it is only in recent times that this approach could actually be developed. The talk presents one proposal within this approach, among the few that have been brought to the operational stage.
Room: 211

Overdetermined Estimating Equations with Applications to Panel Data
Speaker: Dylan Small
Panel data has important advantages over purely cross-sectional or time-series data in studying many economic problems, because it contains information about both the intertemporal dynamics and the individuality of the entities being investigated. A commonly used class of models for panel studies identifies the parameters of interest through an overdetermined system of estimating equations. Two important problems that arise in such models are the following: (1) It may not be clear a priori whether certain estimating equations are valid.
Room: 211

Optimal Design of Experiments in the Presence of Network Interference
Speaker: Edo Airoldi
Causal inference research in statistics has been largely concerned with estimating the effect of treatment (e.g. personalized tutoring) on outcomes (e.g., test scores) under the assumption of "lack of interference"; that is, the assumption that the outcome of an individual does not depend on the treatment assigned to others. Moreover, whenever its relevance is acknowledged (e.g., study groups), interference is typically dealt with as an uninteresting source of variation in the data.
Room: 211

Two Related Problems Involving Gaussian Markov Random Fields
Speaker: Håvard Rue
Gaussian Markov Random Fields (GMRFs) have been around for a long time; however, it is only in recent years that their computational benefits in Bayesian inference have become clear. In this talk, I'll discuss two related problems which involve GMRFs. The first is the problem of constructing Gaussian fields on triangulated manifolds. By viewing this as finding the solution of a stochastic partial differential equation (SPDE), the GMRFs appear as the solutions when solving the SPDE using the "finite element" approach.
Room: 304

Computationally Intensive Inference in Molecular Population Genetics
Speaker: Matthew Stephens
Modern molecular genetics generates extensive data which document the genetic variation in natural populations. Such data give rise to challenging statistical inference problems both for the underlying evolutionary parameters and for the demographic history of the population. These problems are of considerable practical importance and have attracted recent attention, with the development of algorithms based on importance sampling (IS) and Markov chain Monte Carlo (MCMC).
Room: 205

Low rank tensor completion
Speaker: Ming Yuan
Many problems can be formulated as recovering a low-rank tensor. Although an increasingly common task, tensor recovery remains a challenging problem because of the delicacy associated with the decomposition of higher order tensors. We investigate several convex optimization approaches to low rank tensor completion.
Room: 211

Assessment of Scaling in High Frequency Data: Convex Rearrangements in the Wavelet Domain
Speaker: Brani Vidakovic
We overview the notion of regular scaling in data and estimators of this regular scaling on several examples involving high frequency measurements. Next we discuss the importance of wavelet domains and the ability of wavelets to precisely estimate regular
Room: 102

Statistical Problems in Large Networks
Speaker: Persi Diaconis
Natural modeling of large networks leads to exponential models with sufficient statistics being such things as the number of triangles or the degree sequence. These look like standard problems but some surprises have emerged. For some models, it is possible to estimate n parameters based on a sample of size one. For other models, with two parameters, maximum likelihood is inconsistent. Many of these models show phase transitions. The new tools required include the emerging theory of graph limits. This is joint work with Sourav Chatterjee and Allan Sly.
Room: 120

A SMART Stochastic Algorithm for Nonconvex Optimization
Speaker
Aleksandr Y. Aravkin
We show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set and (2) simultaneously fits the trimmed model on the remaining uncontaminated data. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through nonsmooth regularization.
Room
211
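The trimming idea in the abstract above can be illustrated in a deliberately simple setting, least squares through the origin, rather than the stochastic proximal-gradient algorithm of the talk; the sample size, contamination level and trimming fraction below are made up.

```python
import numpy as np

# Toy illustration of trimming: alternately fit a slope on the points
# currently believed clean, then keep the fraction of points with the
# smallest residuals under that fit.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + 0.1 * rng.normal(size=n)   # true slope 2, small noise
y[:20] += 10.0                           # contaminate 10% of the data

h = int(0.85 * n)                        # number of points trusted
keep = np.arange(n)                      # start by trusting everything
for _ in range(20):
    slope = np.sum(x[keep] * y[keep]) / np.sum(x[keep] ** 2)
    keep = np.argsort(np.abs(y - slope * x))[:h]
# The 20 contaminated points end up outside `keep`, and the final slope
# is computed from clean points only.
```

After a couple of iterations the contaminated points have the largest residuals, are dropped from the trusted set, and the slope estimate returns to the value fitted on clean data.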
Random Effects Graphical Regression Models for Biological Monitoring Data
Speaker
Devin S. Johnson
An emerging area of research in ecology is the analysis of functional species assemblages. In essence, the analysis of functional assemblages is concerned with determining and predicting the composition of individuals categorized using different life history traits instead of strict taxa names. We propose a state-space model for the analysis of multiple trait compositions along with site-specific covariate information. A site-specific random effects term allows for modeling extra variability, including spatial variability, in trait compositions.
Room
102
Computational Considerations on Neuroengineering
Speaker
Bing Brunton
Neuroengineering is an emerging interdisciplinary field with the goal of developing effective, robust devices that interact with the nervous system. These devices may act in closed loop with the nervous system to augment, repair, or even replace aspects of its basic function. Neuroengineering presents a set of interesting computational challenges that may require diverse solutions. For instance, how do we perform efficient computations on large quantities of neural data with severely limited computing resources?
Room
211
The Covariance Structure of Circular Ranks
Speaker
Marlos Viana
The linear representation of order statistics is a random permutation matrix which can be applied to obtain the usual covariance structure of ranks and other induced order statistics. In this talk, the algebraic structure of the standard case will be identified and extended to the ordering of observations indexed by circular, uniformly spaced, coordinates. These data are characteristic, for example, of corneal curvature maps used to assess regular astigmatism in the optics of the human eye.
Room
205
Causal Discovery with Confidence Using Invariance Principles
Speaker
Nicolai Meinshausen
What is interesting about causal inference? One of the most compelling aspects is that any prediction under a causal model is valid in environments that are possibly very different to the environment used for inference. For example, variables can be actively changed and predictions will still be valid and useful. This invariance is very useful but still leaves open the difficult question of inference. We propose to turn this invariance principle around and exploit the invariance for inference.
Room
211
Estimating Common Functional Principal Components in a Linear Mixed Effects Model Framework
Speaker
Kevin Hayes
The emerging area of statistical science known as functional data analysis is concerned with evaluating information on curves or functions. In recent years much of the research emphasis has focused on extending statistical methods from classical settings into the functional domain. For example, functional principal component analysis (FPCA) is analogous to the traditional PCA, except that the observed data are entire functions rather than multivariate vectors.
Room
304
Constrained Nonparametric Estimation via Mixtures, with an Application in Cancer Genetics
Speaker
Peter D Hoff
We discuss modeling probability measures constrained to a convex set. We represent measures in such sets as mixtures of simple, known extreme measures, and so the problem of estimating a constrained measure becomes one of estimating an unconstrained mixing measure. Such convex constraints arise in many modeling situations, such as empirical likelihood and modeling under stochastic ordering constraints.
Room
205
Using Single-Cell Transcriptome Sequencing to Infer Olfactory Stem Cell Fate Trajectories
Speaker
Sandrine Dudoit
Single-cell transcriptome sequencing (scRNA-Seq), which combines high-throughput single-cell extraction and sequencing capabilities, enables the transcriptome of large numbers of individual cells to be assayed efficiently.
Room
211
Robust Inference Using Higher Order Influence Functions
Speaker
Lingling Li
Suppose we obtain $n$ i.i.d. copies of a random vector $O$ with unknown distribution $F(\theta)$, $\theta \in \Theta$. Our goal is to construct honest $100(1-\alpha)\%$ asymptotic confidence intervals (CIs), whose width shrinks to zero with increasing $n$ at the fastest possible rate, through higher order influence functions, for a functional $\psi(\theta)$ in a model that places no restrictions on $F$ other than, perhaps, bounds on both the $L_p$ norms and the roughness (more generally, the complexity) of certain density and conditional expectation functions.
Room
102
“Insurance” Against Incorrect Inference after Variable Selection
Speaker
Larry Brown
Among statisticians, variable selection is a common and very dangerous activity. This talk will survey the dangers and then propose two forms of insurance to guarantee against the damages from this activity.
Room
211
Graph-Structured Signal Processing
Speaker
James Sharpnack
Signal processing on graphs is a framework for nonparametric function estimation and hypothesis testing that generalizes spatial signal processing to heterogeneous domains. I will discuss the history of this line of research, highlighting common themes and major advances. I will introduce various graph wavelet algorithms, and highlight any known approximation theoretic guarantees. Recently, it has been determined that the fused lasso is theoretically competitive with wavelet thresholding under some conditions, meaning that the fused lasso is also a locally adaptive smoothing procedure.
Room
211
Robust Covariance Matrix Estimation with Applications in Finance
Speaker
R. Douglas Martin
This talk provides an introduction to robust estimation of covariance matrices, covering both theoretical and computational aspects, and indicating what we believe to be the best choice of estimator at the present time. We begin with a brief introduction to the main concepts of robustness, focusing primarily on minimizing maximum bias for a class of standard multivariate mixture outlier generating models, while maintaining high efficiency at the nominal model.
Room
102
Statistical Modeling in Disease Screening and Progression: Case Studies in Prostate Cancer
Speaker
Lurdes Y.T. Inoue
Many prognostic models for cancer use biomarkers that have utility in early detection. For example, in prostate cancer, models predicting disease-specific survival use serum prostate-specific antigen (PSA) levels. These models are typically interpreted as indicating that detecting disease at a lower threshold of the biomarker is likely to generate a survival benefit. However, lowering the threshold of the biomarker is tantamount to early detection. It is not known whether the existing prognostic models imply a survival benefit under early detection once lead time has been accounted for.
Room
211
TBD
Speaker
R. Douglas Martin
This talk is a personalized account of John Tukey's contributions to robust statistics, as well as a summary of the maturation of robustness theory and practice to date. I begin by fondly recalling the way in which Tukey and I became acquainted, how he gave me my start in Statistics at Princeton and Bell Laboratories, and the very stimulating research environment of the Mathematics and Statistics Research Center at Bell Laboratories in the 1970s and 1980s.
Room
205
Inference for Point and Partially Identified Semi-Nonparametric Conditional Moment Models
Speaker
Jing Tao
This paper considers semi-nonparametric conditional moment models where the parameters of interest include both finite-dimensional parameters and unknown functions. We mainly focus on two inferential problems in this framework. First, we provide new methods of uniform inference for the estimates of both finite- and infinite-dimensional components of the parameters and functionals of the parameters. Based on these results, we can, for instance, construct uniform confidence bands for the unknown functions and the partial derivatives of the unknown functions.
Room
211
Spatial Data Assimilation for Regional Environmental Exposure Studies
Speaker
Kate Calder
Characterizing variation in human exposure to toxic substances over large populations often requires an understanding of the geographic variation in environmental levels of toxicants. This knowledge is essential when the primary routes of exposure are through interactions with environmental media, as opposed to more individual-specific exposure routes (e.g., occupational exposure). In this study, we focus on modeling the spatial variation in the concentration of arsenic, a toxic heavy metal, in air, soil, and water across the state of Arizona.
Room
304
The Relationship Between Count-Location and Stationary Renewal Models for the Chiasma Process
Speaker
Sharon Browning
It is often convenient to define models for the process of chiasma formation at meiosis as stationary renewal models. However, count-location models are also useful, particularly to capture the biological requirement of at least one chiasma per chromosome. The Sturt model and truncated Poisson model are both count-location models with this feature. We show that the truncated Poisson model can also be expressed as a stationary renewal model, while the Sturt model cannot.
Room
205
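The truncated Poisson count distribution mentioned above, an ordinary Poisson conditioned on producing at least one chiasma, is easy to sample by rejection; here is a small sketch in which the rate parameter is an illustrative choice.

```python
import math
import random

def zero_truncated_poisson(lam, rng):
    """Draw a count from Poisson(lam) conditioned on being >= 1 --
    the 'at least one chiasma per chromosome' requirement."""
    while True:
        # inverse-CDF draw from an ordinary Poisson(lam)
        u, k = rng.random(), 0
        p = cdf = math.exp(-lam)
        while u > cdf and k < 1000:   # cap is only a numerical safeguard
            k += 1
            p *= lam / k
            cdf += p
        if k >= 1:                    # reject the zero counts
            return k

rng = random.Random(0)
draws = [zero_truncated_poisson(2.0, rng) for _ in range(50000)]
# the sample mean approaches lam / (1 - exp(-lam)) = 2 / (1 - e^-2)
```

Rejecting the zero counts is exactly conditioning on the event {count >= 1}, which is what shifts the mean up from lam to lam / (1 - exp(-lam)).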
On generalizations of the log-linear model
Speaker
Tamas Rudas
Relational models generalize log-linear models for multivariate categorical data in three aspects. The sample space does not have to be a Cartesian product of the ranges of the variables, the effects allowed in the model do not have to be associated with cylinder sets, and the existence of an overall effect present in every cell is not assumed. After discussing examples which motivate these generalizations, the talk will consider estimation and testing in relational models.
Room
211
Random Tomography and Structural Biology
Speaker
Victor M. Panaretos
Single particle electron microscopy is a powerful method that biophysicists employ to learn about the structure of biological macromolecules. In contrast to the more traditional crystallographic methods, this method images “unconstrained” particles, thus posing a variety of statistical problems. We formulate and study such a problem, one that is essentially of a random tomographic nature, where a structural model for a biological particle is to be constructed given random projections of its Coulomb potential density, observed through the electron microscope.
Room
102
Topological Inference
Speaker
Larry Wasserman
I will discuss three related topics: estimating manifolds, estimating ridges and estimating persistent homology. All three are aimed at the problem of extracting topological information from point clouds. This is joint work with many people.
Room
211
Yule's "Nonsense Correlation" Solved!
Speaker
Philip Ernst
In this talk, I will discuss how I recently resolved a longstanding open statistical problem. The problem, formulated by the British statistician Udny Yule in 1926, is to mathematically prove Yule's 1926 empirical finding of “nonsense correlation.” We solve the problem by analytically determining the second moment of the empirical correlation coefficient of two independent Wiener processes. Using tools from Fredholm integral equation theory, we calculate the second moment of the empirical correlation to obtain a value for the standard deviation of the empirical correlation of nearly 0.5.
Room
211
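Yule's empirical finding described above is easy to reproduce by simulation: correlate two independent Gaussian random walks (discretized Wiener processes) many times and examine the spread of the resulting correlations. The path length and replication count below are illustrative.

```python
import numpy as np

# Reproduce Yule's "nonsense correlation": the empirical correlation of
# two independent Wiener processes (here, Gaussian random walks) is
# wildly dispersed around zero, even though the processes share nothing.
rng = np.random.default_rng(0)

def empirical_corr(n_steps):
    x = np.cumsum(rng.standard_normal(n_steps))
    y = np.cumsum(rng.standard_normal(n_steps))
    return np.corrcoef(x, y)[0, 1]

corrs = np.array([empirical_corr(1000) for _ in range(4000)])
# corrs.mean() is near 0, but corrs.std() is near the value of roughly
# 0.5 referenced in the abstract, so large spurious correlations between
# independent walks are routine.
```

The point of the exercise: any single pair of independent walks frequently shows a correlation of 0.7 or more in magnitude, which is why Yule called such correlations "nonsense."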
Bayesian Survival Modeling of the Time-Dependent Effect of a Time-Dependent Covariate
Speaker
Sebastien Haneuse
Patients undergoing organ transplantation are often administered drugs that suppress their immune system, to avoid rejection of the new organ. A consequence of this is that the risk of a variety of conditions is elevated until the drugs are eliminated. In this research we seek to characterize the risk of post-transplant lymphoma among kidney transplant recipients. Of key interest is the possibly time-varying effect of a time-dependent covariate: transplant status while on the waiting list.
Room
102
On Standard Inference for GMM with Seeming Local Identification Failure
Speaker
Jihyung Lee
This paper studies the Generalized Method of Moments (GMM) estimation and inference problem that occurs when the Jacobian of the moment conditions is degenerate. Dovonon and Renault (2013, Econometrica) recently raised a local identification issue stemming from this degenerate Jacobian. The local identification issue leads to a slow rate of convergence of the GMM estimator and a nonstandard asymptotic distribution of the overidentification tests. We show that the degenerate Jacobian matrix may contain nontrivial information about the economic model.
Room
211
State Space Mixed Models for Longitudinal Observations with Binary and Binomial Responses
Speaker
Claudia Czado
A new class of state space models for longitudinal discrete response data is proposed, in which the observation equation is specified in an additive form involving both deterministic and dynamic components. These models allow us to explicitly address the effects of trend, seasonal or other time-varying covariates, while preserving the power of state space models in modeling the dynamic pattern of the data. Different Markov chain Monte Carlo algorithms to carry out statistical inference for models with binary and binomial responses are developed.
Room
102
Why Should We Perfect Simulate?
Speaker
Nancy Lopes Garcia
Perfect simulation, or exact sampling, refers to a recently developed set of techniques designed to produce a sequence of independent random quantities whose distribution is guaranteed to follow a given probability law. These techniques are particularly useful in the context of Markov Chain Monte Carlo iterations, but the range of their applicability is growing rapidly. Perfect simulation algorithms provide samples with the desired exact distribution and also explicitly determine how many steps are necessary in the Markov Chain to achieve the desired outcome.
Room
211
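The "explicitly determine how many steps are necessary" property described above is exactly what coupling from the past (Propp-Wilson) delivers: run coupled copies of the chain from every starting state at an ever-earlier time in the past, reusing the same randomness, and return the common value once the copies coalesce. A minimal sketch on a two-state chain whose transition matrix is a made-up example:

```python
import random

# Coupling from the past on a two-state chain with a made-up transition
# matrix P. Every copy of the chain is driven by the SAME uniforms, so
# once the copies agree at time 0 they agree forever.
P = [[0.7, 0.3],
     [0.4, 0.6]]

def step(state, u):
    return 0 if u < P[state][0] else 1

def cftp(rng):
    us = []        # uniforms for times -T, ..., -1 (us[0] is earliest)
    T = 1
    while True:
        # extend further into the past, reusing the more recent draws
        us = [rng.random() for _ in range(T - len(us))] + us
        a, b = 0, 1  # run the chain from both possible starting states
        for u in us:
            a, b = step(a, u), step(b, u)
        if a == b:
            return a  # coalesced: an exact stationary draw
        T *= 2        # not yet: go twice as far back in time

rng = random.Random(0)
draws = [cftp(rng) for _ in range(20000)]
# the long-run frequency of state 1 matches pi(1) = 3/7 exactly in law
```

Each call decides for itself how far back it must start (here, rarely more than a few steps), and the returned value is an exact draw from the stationary distribution, not an approximate one.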
Probabilistic Rainfall Forecasting
Speaker
Max Little
Rain is vital to life yet potentially extremely destructive, and forecasting is critical to water management. Rain is a difficult atmospheric variable to predict, and traditional deterministic “point” forecasts of rainfall misrepresent the uncertainty associated with the methods by which rainfall is measured, modelled and predicted. There is a recognition amongst meteorologists that probabilistic forecasts, that is, issuing a probability density as a forecast rather than a deterministic point value, are desirable.
Room
304
Semiparametric Methods for Missing Data Problems and Their Applications to Multi-Phase Designs
Speaker
Nilanjan Chatterjee
Advisors: Jon Wellner & Norman Breslow
Room
105
A Bayesian information criterion for singular models
Speaker
Mathias Drton
We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher information matrices may fail to be invertible along other competing submodels. Such singular models do not obey the regularity conditions underlying the derivation of Schwarz's Bayesian information criterion (BIC), and the penalty structure in BIC generally does not reflect the frequentist large-sample behavior of their marginal likelihood.
Room
211
Statistical Modeling in Setting Air Quality Standards
Speaker
Jim V. Zidek
The earth's atmosphere is a stochastic complex system which includes, amongst other things, pollution fields, some of which derive from anthropogenic sources. Because of their negative health impacts, these fields are now subject to regulation. However, setting the air quality standards needed to regulate them is itself a complex business, one that leads to a need for good models for these fields and for predicting human exposures to them. This talk, drawing on my recent experience and research connected with ozone, will describe:
Room
102
From Data to Decisions
Speaker
Eric Horvitz
I will present directions for harnessing predictive models to guide decision making. I will first discuss methods for using machine learning to ideally couple human and computational effort, focusing on several illustrative efforts, including spoken dialog systems and citizen science. Then I will turn to challenges in healthcare and describe work to field statistical models in real-world clinical settings, focusing on the opportunity to join predictions about outcomes with utility models to guide intervention.
Room
211
From safe screening rules to working sets for faster Lasso-type solvers
Speaker
Joseph Salmon
Convex sparsity-promoting regularizations are now ubiquitous for regularizing inverse problems in statistics, in signal processing and in machine learning. By construction, they yield solutions with few nonzero coefficients. This point is particularly appealing for Working Set (WS) strategies, an optimization technique that solves simpler problems by handling small subsets of variables, whose indices form the WS. Such methods involve two nested iterations: the outer loop corresponds to the definition of the WS and the inner loop calls a solver for the subproblems.
Room
211
Spatial Statistical Models that Use Flow and Stream Distance
Speaker
Jay Ver Hoef
We develop spatial statistical models for stream networks that can estimate relationships between a response variable and other covariates, make predictions at unsampled locations, and predict an average or total for a stream or a stream segment. There have been very few attempts to develop valid spatial covariance models that incorporate flow, stream distance, or both. The application of typical spatial autocovariance functions based on Euclidean distance, such as the spherical covariance model, is not valid when using stream distance.
Room
102
Statistical Methods for Ambulance Fleet Management
Speaker
Dawn Woodard
We introduce statistical methods to address two forecasting problems arising in the management of ambulance fleets: (1) predicting the time it takes an ambulance to drive to the scene of an emergency; and (2) space-time forecasting of ambulance demand. These predictions are used for deciding how many ambulances should be deployed at a given time and where they should be stationed, which ambulance should be dispatched to an emergency, and whether and how to schedule ambulances for non-urgent patient transfers.
Room
211
Nonparametric Estimation of the Time to the Discovery of a New Species
Speaker
Nicolas Hengartner
Species inventories that list all species present in a given area are an important tool for both the study of biodiversity and conservation biology. These lists are typically obtained from field studies in which biologists record all the species they can observe over a finite time period. Because of the possible presence of rare, and thus hard-to-observe, species, the completeness of such lists can never be guaranteed, regardless of the amount of time and energy spent in compiling them.
Room
205
Structured Probabilistic Topic Models
Speaker
John Paisley
Advances in scalable machine learning have made it possible to learn highly structured models on large data sets. In this talk, I will discuss some of our recent work in this direction. I will first briefly review scalable probabilistic topic modeling with stochastic variational inference. I will then discuss two structured developments of the LDA model, in the form of tree-structured topic models and graph-structured topic models. I will present our recent work in each of these areas.
Room
211
Hypothesis Testing in Algebraic Statistical Models
Speaker
Mathias Drton
Many statistical models are defined in terms of polynomial constraints, or in terms of polynomial or rational parametrizations. Such algebraic models include, for instance, factor analysis and instrumental variable models, latent class models, and more generally, discrete and Gaussian graphical models with hidden variables. Statistical inference in hidden variable models is complicated by the fact that the models' parameter spaces are typically not smooth. This is the motivation for this talk, which considers testing a null hypothesis with singularities in algebraic models.
Room
304
Speed Bumps on the Road to Meritocracy: Occupational Mobility of Women and Men in the U.S., 1972-1994
Speaker
Michael Hout
After 25 years of improvement, opportunity through social mobility has levelled off in the United States. The association between occupational origins and destinations did not change between the first half of the 1980s and the first half of the 1990s. Detailed mobility tables from the General Social Survey show that the effect of socioeconomic origins on the socioeconomic status of women's and men's occupations in 1991-4 is at the same level found in the early 1980s.
Room
120
Assessing Spatial Heterogeneity of Evolutionary Processes: Smoothing with Markov Fields and Jumping on Markov Chains
Speaker
Vladimir Minin
Signatures of spatial variation, left by evolutionary processes in genomic sequences, provide important information about the function and structure of genomic regions. I discuss statistical methods for detection of such signatures in a Bayesian framework. I start with phylogenetic analysis of recombination in the HIV genome. I present a recombination detection method that allows accurate estimation of recombination breakpoints from a molecular sequence alignment.
Room
205
The Blessing of Transitivity in Sparse and Stochastic Networks
Speaker
Karl Rohe
The interaction between transitivity and sparsity, two common features in empirical networks, implies that there are local regions of large sparse networks that are dense. We call this the blessing of transitivity, and it has consequences for both modeling and inference. Extant research suggests that statistical inference for the Stochastic Blockmodel is more difficult when the edges are sparse. However, this conclusion is confounded by the fact that the asymptotic limit in all of the previous studies is not merely sparse, but also non-transitive.
Room
211
Confidence Sets for Phylogenetic Trees
Speaker
Amy Willis
