
Estimating and Forecasting the Smoking-attributable Mortality Fraction for Both Sexes Jointly in 69 Countries

Speaker: Yicheng Li

Smoking is one of the leading preventable threats to human health and is a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking-attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortality and life expectancy projections.

Room: 409

Statistical Inference for the Mean Outcome Under a Possibly Non-Unique Optimal Treatment Strategy

Speaker: Alex Luedtke

An individualized treatment rule (ITR) is a treatment rule which assigns treatments to individuals based on (a subset of) their measured covariates. An optimal ITR is the ITR which maximizes the population mean outcome. In any given problem, there is no guarantee that the optimal ITR will outperform standard practice. The utility of personalization can be explored using a confidence interval for the mean outcome under the optimal rule.

Room: 409
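The contrast between the optimal ITR and a one-size-fits-all policy can be illustrated with a toy simulation (the covariate, effect sizes, and sample size below are made up for illustration, not taken from the talk): with one binary covariate, the estimated rule treats only the subgroup in which treatment appears beneficial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical randomized trial: treatment (a) helps when x = 1, hurts when x = 0.
n = 100_000
x = rng.integers(0, 2, size=n)              # binary covariate
a = rng.integers(0, 2, size=n)              # randomized treatment assignment
y = 0.2 * a * x - 0.1 * a * (1 - x) + rng.normal(0.0, 1.0, size=n)

def mean_outcome(xv, av):
    """Empirical E[Y | X = xv, A = av]."""
    m = (x == xv) & (a == av)
    return y[m].mean()

# Estimated optimal ITR: treat a subgroup only if treatment looks beneficial there.
rule = {xv: int(mean_outcome(xv, 1) > mean_outcome(xv, 0)) for xv in (0, 1)}

# Value of the estimated rule vs. the "treat everyone" policy.
value_rule = sum(mean_outcome(xv, rule[xv]) * np.mean(x == xv) for xv in (0, 1))
value_all = sum(mean_outcome(xv, 1) * np.mean(x == xv) for xv in (0, 1))
```

In this toy setting the rule's advantage is clear-cut; the talk's point is that in real problems the advantage may be small or zero, which is exactly what a confidence interval for the mean outcome under the optimal rule can reveal.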

Scalable Manifold Learning

Speaker: James Michael McQueen

Advisor: Marina Meila

This talk investigates the methodology and scalability of non-linear dimension reduction techniques. With data being observed in increasingly higher dimensions and on a larger scale than before, the demand for non-linear dimension reduction is growing. There is very little consensus, however, on how non-linear dimension reduction should be performed. The goal of Manifold Learning (ML) is to embed the data into s-dimensional Euclidean space (where manifold dimension < s < observed dimension) without distorting the geometry. Existing ML algorithms (e.g.

Room: 409

The PRISM Approach to Climate Mapping in Complex Regions

Speaker: Christopher Daly

The PRISM Group (formerly known as the Spatial Climate Analysis Service) at Oregon State University is the de facto climate mapping center for the United States, and a leader in the emerging discipline of geospatial climatology. Under funding from the USDA-NRCS, NOAA, NPS, USFS, and other agencies, the PRISM Group has mapped the long-term mean climate on a monthly basis for all US states and possessions. These maps are the official climate data sets of the USDA and have been used in thousands of applications worldwide.

Room: 249

Geographical Analysis and Ethical Dilemmas in the Study of Childhood Leukemias in Great Britain

Speaker: Julian Besag

Besag and Newell (1991) provide what might be viewed as a statistician's version of Stan Openshaw's Geographical Analysis Machine (GAM), with the specific aim of identifying spatially localized anomalies ("clusters") in a database comparing the addresses at the time of diagnosis of all registered cases of childhood leukemias in Great Britain between 1966 and 1983 with the nominal populations at risk in more than 100,000 census enumeration districts (EDs).

Room: 209

Linear Structural Equation Models with Non-Gaussian Errors

Speaker: Yu-Hsuan S. Wang

In this talk, we consider structural equation models represented by a mixed graph, which encodes both direct causal relationships and latent confounding. First, we use an empirical likelihood approach to fit structural equation models without explicitly assuming a distributional form for the errors. Through simulations, we show that when the errors are skewed, the empirical likelihood approach may provide a more efficient estimator than methods assuming a Gaussian likelihood.

Room: 140

The Multivariate Skew Normal Distribution and Some Extensions

Speaker: Antonella Capitanio

The multivariate skew normal distribution extends the class of normal distributions by the addition of a shape parameter. It allows modeling of phenomena whose empirical outcomes behave in a non-normal fashion but still retain some similarity with the normal distribution. It was introduced in Azzalini & Dalla Valle (1996), and further probabilistic properties as well as statistical aspects were explored in Azzalini & Capitanio (1999).

Room: 249

Why Do Model Ensembles Work?

Speaker: Pedro Domingos

Learning an ensemble of models instead of a single one can be a remarkably effective way to reduce predictive error. Ensemble methods include bagging, boosting, stacking, error-correcting output codes, and others. But how can we explain the amazing success of these methods (and hopefully design better ones as a result)? Many different explanations have been proposed, using concepts like the bias-variance tradeoff, margins, and Bayesian model averaging. However, each of these explanations has significant shortcomings.

Room: 209
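One ingredient of the variance-reduction explanation can be seen in a tiny Monte Carlo sketch (not from the talk): averaging many unbiased but noisy predictors shrinks squared error roughly in proportion to the ensemble size, provided their errors are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

TARGET = 1.0  # the quantity every model tries to predict

def one_model():
    """An unbiased but high-variance predictor (error ~ N(0, 1))."""
    return TARGET + rng.normal(0.0, 1.0)

def ensemble(n_models):
    """Average the predictions of n_models independent models."""
    return np.mean([one_model() for _ in range(n_models)])

# Monte Carlo mean squared error: a single model vs. a 100-model ensemble.
mse_single = np.mean([(one_model() - TARGET) ** 2 for _ in range(2000)])
mse_ensemble = np.mean([(ensemble(100) - TARGET) ** 2 for _ in range(2000)])
```

In practice model errors are correlated, so the reduction is smaller than this idealized 1/B rate, which is part of why the competing explanations surveyed in the talk (margins, Bayesian model averaging, and so on) are needed.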

The Marriage Model: A Two Sided Model of Opportunity and Choice

Speaker: Peter D Hoff

We consider a parametric version of the two-sided matching model described in Roth and Sotomayor (1990). We develop the model for the analysis of matching data, where the data consist of pairs of individuals, with one individual from each of two distinct populations (for example, employers and employees, or men and women). Individuals agree to form pairs based on utilities they have for one another, resulting in a stable set of matches between the two populations.

Room: 209

Middles: Means, Medians, Metrics, and Other Things That Start With M

Speaker: J. McLean Sloughter

One of the first topics to come up in an introductory statistics course is means and medians. But why do we have more than one way of measuring the "middle" of a set of data? This talk will show how different metrics (ways of defining distance) give rise to different measures of "middle", as well as looking at some of the practical reasons we might pick one measure over another.

Room: 409
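The metric-to-middle correspondence can be checked numerically (a small made-up dataset, for illustration): minimizing total squared distance recovers the mean, while minimizing total absolute distance recovers the median.

```python
import numpy as np

# A small dataset with an outlier, so mean and median disagree sharply.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Candidate "middles" on a fine grid.
grid = np.linspace(0.0, 100.0, 100001)

# Total squared (L2) distance from each candidate to the data: minimized by the mean.
l2_cost = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
middle_l2 = grid[l2_cost.argmin()]

# Total absolute (L1) distance: minimized by the median.
l1_cost = np.abs(x[:, None] - grid[None, :]).sum(axis=0)
middle_l1 = grid[l1_cost.argmin()]
```

The outlier drags the L2 middle (the mean, 22.0) far from the bulk of the data, while the L1 middle (the median, 3.0) stays put, which is the usual practical argument for the median with heavy-tailed data.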

Reliability Reloaded

Speaker: Sallie Keller-McNulty

In this age of exponential growth in science, engineering, and technology, the capability to evaluate the performance, reliability, and safety of complex systems presents new challenges. Today's methodology must respond to the ever-increasing demands for such evaluations to provide key information for decision and policy makers at all levels of government and industry on problems ranging from national security to space exploration.

Room: 249

Estimating the Influence of Social Networks on Migration Decisions

Speaker: Alberto Palloni

This talk is about the influence of kinship on first migration. Event history data collected in a special survey of several Mexican communities are used to show that the migration decisions of individuals are affected by what fathers and brothers do or have done in the past. Instead of simple individual models, the models proposed are designed to retrieve fixed and time-dependent effects on the joint migration risks of two members of a pair while simultaneously reducing or eliminating the impact of unmeasured common conditions shared by the members of the pairs.

Room: 209

Joint Modeling for Longitudinal and Time-To-Event Data: An Application in Nephrology

Speaker: Theresa R. Smith

Many medical studies collect both repeated measures data and survival data. In this talk, I discuss jointly modeling these two kinds of data in a study of patients with chronic kidney disease in which longitudinal biomarkers of kidney function and time to cardiovascular events were recorded. Jointly modeling these processes is important because the measurements of kidney function are error-prone. Ignoring this error (e.g., using a simpler time-varying covariates model) can give biased estimates of the effect of kidney function on the risk of cardiovascular events.

Room: 409

Uncertainty and Evidence in Latent Variable Problems

Speaker: Elizabeth A Thompson

In many areas of science, models involve unseen latent variables. Often these variables are such that, were we able to observe them, the testing of scientific hypotheses would be straightforward. A classical example is that of Bernoulli trials (tosses of a fair coin) observed with error. If the number of successes (heads) is observed, testing that the coin is fair is straightforward, but how should uncertainty in observation be taken into account?

Room: 249
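The coin example can be made concrete with a short likelihood calculation (a sketch with made-up counts and a hypothetical known error rate, not the talk's method): if each toss is misrecorded with probability eps, the observed success probability is a blend of the true p, and the maximum likelihood estimate of p shifts accordingly.

```python
import math

def loglik(p, heads, n, eps):
    """Log-likelihood of true success prob p when each trial is
    independently misrecorded with probability eps."""
    q = p * (1 - eps) + (1 - p) * eps   # probability of *observing* a head
    return heads * math.log(q) + (n - heads) * math.log(1 - q)

heads, n = 60, 100                      # hypothetical observed counts
grid = [i / 1000 for i in range(1, 1000)]

def p_hat(eps):
    """Maximum likelihood estimate of p over a fine grid."""
    return max(grid, key=lambda p: loglik(p, heads, n, eps))
```

With eps = 0 the MLE is the raw frequency 0.6; with eps = 0.2 the same 60 observed heads point to a more extreme true p, and the likelihood becomes flatter in p, so uncertainty about the hypothesis grows even though the data look the same.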

Networks and HIV: The Structure of Transmission

Speaker: Martina Morris

The dramatic variations in HIV spread among different populations of the world have led to much research on the causal mechanisms of transmission. Some researchers focus on bio-medical factors, such as the enhanced infectivity caused by other sexually transmitted diseases; others focus on the sexual behavior and network patterns that spread the pathogen. This talk will explore several examples from the network paradigm, with data from the United States, Uganda, and Thailand.

Room: 209

Harnessing Network Science to Reveal Our Digital Footprints: Social Networks and Communities

Speaker: Jukka-Pekka Onnela

Network science is an interdisciplinary endeavor, with methods and applications drawn from across the natural, social, and information sciences. In addition to theoretical developments, electronic databases currently provide detailed records of human communication and interaction patterns, offering novel avenues to map and explore the structure of social networks. I will talk about the structure of a social network based on the cell phone communication patterns of millions of individuals, and what implications it has for diffusion processes on social networks.

Room: 409

Bayesian Mixed Models for Functional Data

Speaker: Jeffrey Morris

Many studies yield functional data, in which the ideal units of observation are curves and the observed data are sampled on a fine grid. These curves frequently have irregular features requiring spatially adaptive nonparametric representations. I will discuss new methods for modeling these data using functional mixed models, which treat the curves as responses and relate them to covariates using nonparametric fixed and random effect functions.

Room: 249

Randomized Polya Trees: Bayesian Nonparametrics for Exploration of Multidimensional Structures

Speaker: Susan Paddock

By comparison with modern parametric Bayesian statistics, practicable and robust methods for exploration and data analysis in nonparametric settings are underdeveloped. The rapid development of non-Bayesian methods and ranges of ad-hoc non-parametric tools for data mining reflect the need for a non-parametric Bayesian approach to exploring and managing data sets in even moderate dimensional problems. I will address this issue by presenting multivariate Polya tree based methods for modeling multidimensional probability distributions.

Room: 249

Probabilistic Forecasts, Calibration and Sharpness

Speaker: Tilmann Gneiting

Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations, and is a joint property of the predictions and the events that materialize.

Room: 249
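Calibration can be checked empirically with the probability integral transform (PIT): if observations really come from the forecast distribution, the PIT values are uniform on (0, 1). A minimal sketch on simulated data (the distributions below are illustrative, not from the article):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def norm_cdf(y, mu, sigma):
    """CDF of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((y - mu) / (sigma * sqrt(2.0))))

obs = rng.normal(0.0, 1.0, size=5000)   # truth: standard normal observations

# PIT = predictive CDF evaluated at each observation.
pit_calibrated = np.array([norm_cdf(y, 0.0, 1.0) for y in obs])
pit_overdispersed = np.array([norm_cdf(y, 0.0, 3.0) for y in obs])  # too wide

# A calibrated forecast gives PIT variance near 1/12 (the Uniform(0,1) variance);
# an overdispersed forecast piles PIT values around 0.5, shrinking the variance.
var_cal = pit_calibrated.var()
var_over = pit_overdispersed.var()
```

The overdispersed forecaster is not penalized by calibration checks alone in every respect, which is why the paradigm pairs calibration with sharpness: among calibrated forecasts, the most concentrated predictive distributions win.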

Dependence Orderings and Their Applications

Speaker: Subhash Kochar

Suppose we have two bivariate random vectors and we want to compare them according to the degree of dependence between their component random variables. Rather than using a single summary statistic like the correlation coefficient, it is more useful and informative to compare the whole distributions or some aspects of them. For this purpose, several partial orders have been introduced in the literature under the assumption that the marginal distributions are identical. But in many problems of practical interest, this is not the case.

Room: 249

Modelling Claim Frequency and Claim Size in Insurance Including Spatial Effects

Speaker: Claudia Czado

In this talk models for claim frequency and average claim size in non-life insurance are considered. Both covariates and spatial random effects are included allowing the modelling of a spatial dependency pattern. We assume a Poisson model for the number of claims, while claim size is modelled using a Gamma distribution. However, in contrast to the usual compound Poisson model going back to Lundberg (1903), we allow for dependencies between claim size and claim frequency. A fully Bayesian approach is followed; parameters are estimated using Markov Chain Monte Carlo (MCMC).

Room: 249
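The classical independent compound model that the talk generalizes can be simulated in a few lines (the parameter values below are arbitrary). Under independence, the expected aggregate claim factors as E[N] times the mean claim size, which is the identity the simulation recovers.

```python
import numpy as np

rng = np.random.default_rng(0)

lam = 3.0                  # Poisson mean number of claims per period
shape, scale = 2.0, 500.0  # Gamma claim-size parameters (mean = shape * scale)

def total_claims():
    """One period's aggregate claim: Poisson count, independent Gamma sizes."""
    n = rng.poisson(lam)
    return rng.gamma(shape, scale, size=n).sum() if n > 0 else 0.0

totals = np.array([total_claims() for _ in range(20000)])

# Classical compound Poisson identity: E[total] = lam * shape * scale.
expected_total = lam * shape * scale
```

The talk's point is precisely that claim frequency and claim size may be dependent (and spatially structured), in which case this simple factorization no longer holds.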

A Theory for Texture Modeling and Random Field Approximation

Speaker: Yingnian Wu

Texture is a powerful cue in visual perception, and texture analysis and synthesis has been an active research area in computer vision. We present a statistical theory for texture modeling and random field approximation, which combines multi-channel filtering and random field modeling via the maximum entropy principle. Our theory characterizes a texture by a random field, the modeling of which consists of two steps.

Room: 249

Probabilistic Forecasting in Meteorology

Speaker: Barbara Brown

Expressing weather forecasts as probabilities has been a regular (though small) part of operational meteorological forecasting in the U.S. since at least 1965, when the Weather Bureau produced its first probability of precipitation forecasts. In fact, the concept that weather forecasts are uncertain has been understood since the early days of weather forecasting (e.g., in the late 1800s, Cleveland Abbe, the “Father” of weather forecasting in the U.S., called his forecasts “probabilities”).

Room: 249

Bayesian Component Selection in High Dimensional Models

Speaker: Michael Smith

This presentation outlines Bayesian selection methodology for semiparametric components in two different scenarios. The first is in models that involve additive semiparametric function estimation, while the second is in time-space varying coefficient models. While these statistical models may differ, the approach to modeling the flexible components involved is similar. It entails the adoption of proper shrinkage priors, coupled with a point mass at zero. In effect, this corresponds to adopting both traditional Bayesian regularization and model averaging simultaneously.

Room: 249

Prediction, Intervention and Discovery

Speaker: Christopher Meek

This talk will focus on topics related to Bayesian networks, a type of graphical model. In general, a graphical model has a qualitative part, a graph over a set of variables, and a quantitative part, a joint distribution over the set of variables. The qualitative part represents a set of independence constraints true of the joint distribution. In the case of a Bayesian network the graph is a directed acyclic graph.

Room: 239

Lévy Noise Induced Transitions Between Meta-Stable States in Stochastic (Partial) Differential Equations

Speaker: Peter Imkeller

A spectral analysis of the time series representing average temperatures during the last ice age, featuring the Dansgaard-Oeschger events, reveals an α-stable noise component with α ≈ 1.78. Based on this observation, papers in the physics literature attempted a qualitative interpretation by studying diffusion equations that describe simple dynamical systems perturbed by small Lévy noise. We study exit and transition problems for solutions of stochastic differential equations and stochastic reaction-diffusion equations derived from this prototype.

Room: 249

In Search of the Magic Lasso: The Truth About the Polygraph

Speaker: Stephen E. Fienberg

Tens of thousands of individuals undergo polygraph security screening examinations in the U.S. every year. How good is the polygraph in detecting deception in such settings? Is there a scientific underpinning for the detection of deception? Are there suitable alternatives to the polygraph for security screening? Two years ago, the NAS-NRC Committee to Review the Scientific Evidence on the Polygraph released its report, “The Polygraph and Lie Detection,” addressing these issues.

Room: 209

Bayesian is converted into frequentist by reversing the sign of the data length

Speaker: Hidetoshi Shimodaira

The observed frequency of a particular outcome in data-based simulation, known as bootstrap probability (BP) of Felsenstein (1985), is very useful as a confidence level of data analysis with discrete outcomes such as estimating the phylogenetic tree from aligned DNA sequences or identifying the clusters from microarray expression profiles. We argue that the length of simulated data sets should be

Room: 156

A Bayesian Approach to Map Multiple QTL in Pedigreed Plant Breeding Populations

Speaker: Marco Bink

The availability of molecular genetic markers enables the dissection of a quantitative trait into quantitative trait loci (QTL), i.e., chromosomal regions that show strong association with the observed phenotypic trait variance. In plants, the first QTL experiments were targeted to a single mapping population that was derived from crossing two extreme, often fully-inbred, individuals. This simple design allowed regression and Maximum Likelihood methods for data analysis. However, the success of the identified QTL was hampered by several factors.

Room: 249

Robust Betas and Alphas

Speaker: Heiko Manfred Bailer

The Capital Asset Pricing Model (CAPM) is today's most important financial model for estimating cost of capital and asset allocation. Its centerpiece is a pair of parameters, commonly called beta and alpha, estimated using ordinary least squares (OLS) regression. Since financial returns typically have an asymmetric and heavy-tailed distribution, OLS estimates can be severely biased. In this talk we will introduce robust regression estimates with zero bias in beta and low bias in alpha (even under asymmetric distributions) but 99% asymptotic efficiency at the Gaussian model.

Room: 341
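The OLS version of the CAPM regression that the talk starts from looks like this on simulated excess returns (all parameter values are hypothetical); adding a single gross outlier at the end hints at the bias problem robust estimators are designed to fix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily excess returns with known alpha and beta.
true_alpha, true_beta = 0.001, 1.2
market = rng.normal(0.0, 0.02, size=1000)
asset = true_alpha + true_beta * market + rng.normal(0.0, 0.005, size=1000)

def ols_alpha_beta(m, r):
    """OLS fit of the CAPM regression r = alpha + beta * m + error."""
    X = np.column_stack([np.ones_like(m), m])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    return coef  # (alpha_hat, beta_hat)

alpha_hat, beta_hat = ols_alpha_beta(market, asset)

# One extreme return pair is enough to move the OLS estimates noticeably.
m2 = np.append(market, 0.10)
r2 = np.append(asset, -0.50)
alpha_bad, beta_bad = ols_alpha_beta(m2, r2)
```

The contaminated fit illustrates why heavy-tailed return distributions make plain OLS betas unreliable; the robust estimators in the talk aim to keep the clean-data efficiency while resisting exactly this kind of distortion.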

Statistical Inference on Covariance Structure and Sparse Discriminant Analysis

Speaker: Tony Cai

Covariance structure is of fundamental importance in many areas of statistical inference and a wide range of applications, including genomics, fMRI analysis, risk management, and web search problems. In the high dimensional setting where the dimension p can be much larger than the sample size n, classical methods and results based on fixed p and large n are no longer applicable. In this talk, I will discuss some recent results on optimal estimation of covariance/precision matrices as well as sparse linear discriminant analysis with high-dimensional data.

Room: 260
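One standard idea in this literature, entrywise thresholding of the sample covariance when the truth is sparse, can be sketched quickly (the sqrt(log p / n) threshold rate is the usual recipe; the constant 2 and all dimensions below are arbitrary choices, not the talk's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Truth: identity covariance (very sparse), dimension comparable to sample size.
p, n = 40, 50
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)            # sample covariance: noisy off-diagonals
t = 2.0 * np.sqrt(np.log(p) / n)       # threshold at the sqrt(log p / n) rate
S_thr = np.where(np.abs(S) >= t, S, 0.0)
np.fill_diagonal(S_thr, np.diag(S))    # never threshold the diagonal

# Operator-norm error against the true covariance, the identity.
err_sample = np.linalg.norm(S - np.eye(p), 2)
err_thr = np.linalg.norm(S_thr - np.eye(p), 2)
```

With p of the same order as n, the raw sample covariance is badly behaved in operator norm, while zeroing the small off-diagonal noise recovers most of the sparse structure, which is the phenomenon the optimality theory in the talk makes precise.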

Wishart Distributions for Decomposable Graphs

Speaker: Helene Massam

When considering a graphical Gaussian model N_G, Markov with respect to a decomposable graph G, the parameter space of interest for the precision parameter is the cone P_G of positive definite matrices with fixed zeros corresponding to the missing edges of G. The parameter space for the scale parameter of N_G is the cone Q_G, dual to P_G, of incomplete matrices with submatrices corresponding to the cliques of G being positive definite. We construct on the cones Q_G and P_G two families of Wishart distributions, namely the type I and type II Wisharts.

Room: 249

Learning from Big Data in biology

Speaker: Marc Suchard

Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. Likewise, fusion of real-time satellite data with in situ sea surface temperature measurements for ecological modeling remains taxing for probabilistic spatial-temporal models on a global scale.

Room: 260

BART: Bayesian Additive Regression Trees

Speaker: Hugh Chipman

We develop a Bayesian “sum-of-trees” model, named BART, where each tree is constrained by a prior to be a weak learner. Fitting and inference are accomplished via an iterative backfitting MCMC algorithm. This model is motivated by ensemble methods in general, and boosting algorithms in particular. Like boosting, each weak learner (i.e., each weak tree) contributes a small amount to the overall model. However, our procedure is defined by a statistical model: a prior and a likelihood, while boosting is defined by an algorithm.

Room: 249
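A non-Bayesian caricature of the sum-of-trees idea, tiny stumps refit by backfitting on partial residuals with shrinkage keeping each learner weak, can show the mechanics (this is a sketch of the general sum-of-weak-learners structure, not BART's prior-plus-MCMC procedure; all constants are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d regression problem.
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=200)

def fit_stump(x, y):
    """Best single-split stump (a deliberately weak learner)."""
    best = None
    for s in np.linspace(0.05, 0.95, 19):
        left, right = y[x <= s].mean(), y[x > s].mean()
        sse = ((y - np.where(x <= s, left, right)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left, right)
    return best[1:]

def stump_pred(params):
    s, left, right = params
    return np.where(x <= s, left, right)

M, SHRINK = 50, 0.3
stumps = [(0.5, 0.0, 0.0)] * M

# Backfitting: refit each weak learner to the residual left by all the others,
# then shrink it so no single learner can dominate the fit.
for _ in range(3):
    for j in range(M):
        others = sum(stump_pred(p) for k, p in enumerate(stumps) if k != j)
        s, left, right = fit_stump(x, y - others)
        stumps[j] = (s, SHRINK * left, SHRINK * right)

fit = sum(stump_pred(p) for p in stumps)
mse = ((y - fit) ** 2).mean()
```

Each shrunk stump contributes only a small piece, yet the sum tracks the smooth signal; BART replaces this ad-hoc loop with draws from a posterior, so the same backfitting structure also yields uncertainty statements.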

A nonparametric Bayesian model for legislative voting

Speaker: Abel Rodriguez

Legislative voting records are widely used in political science to characterize revealed preferences among the members of a deliberative assembly. In this context, item-response models (a class of latent factor models) such as NOMINATE and IDEAL are the preeminent quantitative tools used for analysis. This class of models assumes that members' choices can be explained by continuous latent features, often called ideal points. For unidimensional latent spaces, this often results in a ranking of members along the liberal-conservative spectrum.

Room: 260

Semiparametric Hierarchical Bayes Analysis of Discrete Panel Data with State Dependence

Speaker: Ivan Jeliazkov

In this paper we consider the analysis of semiparametric models for binary panel data with state dependence. A hierarchical modeling approach is used for dealing with the initial conditions problem, for addressing heterogeneity, and for incorporating correlation between the covariates and the random effects. We consider a semiparametric model in which a Markov process prior is used to model an unknown regression function.

Room: 209

Statistical Machine Learning and Big-p Data

Speaker: Pradeep Ravikumar

With modern "Big Data" settings, off-the-shelf statistical machine learning methods are frequently proving insufficient. A key challenge posed by these modern settings is that the data might have a large number of features, in what we will call "Big-p" data, to denote the fact that the dimension "p" of the data is large, potentially even larger than the number of samples.

Room: 409

Priors and Predictive Performance: An Integrated Procedure to Identify Robust Growth Determinants

Speaker: Theo S Eicher

Model uncertainty is central to economics, where researchers attempt to discriminate among alternative theories in robustness analyses. Bayesian Model Averaging (BMA) is an approach designed to address model uncertainty as part of the empirical strategy. Applications of BMA to economics are widespread; however, it is often unclear whether subtle differences in the choice of parameter and model priors affect inference. We present an integrated procedure, based on 12 popular, noninformative parameter priors and any given model prior, to conduct sensitivity analysis in BMA.

Room: 249

Apportionment Methods in Proportional Representation: A Majorization Representation

Speaker: Ingram Olkin

From the inception of the proportional representation movement, it has been an issue whether larger parties are favored at the expense of smaller parties in one apportionment of seats as compared to another. A number of methods have been proposed and are used in countries with a proportional representation system. These methods exhibit a regularity of order that captures the preferential treatment of larger versus smaller parties. This order, namely majorization, permits the comparison of seat allocations in two apportionments.

Room: 209
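The size bias that the majorization order formalizes already shows up in a comparison of two classical divisor methods (the vote totals below are made up): D'Hondt, with divisors 1, 2, 3, ..., awards the largest party at least as many seats as Sainte-Laguë, with divisors 1, 3, 5, ....

```python
def divisor_apportion(votes, seats, divisor):
    """Generic divisor method: each seat goes to the party with the
    largest current quotient votes / divisor(seats_already_won)."""
    alloc = {party: 0 for party in votes}
    for _ in range(seats):
        winner = max(votes, key=lambda p: votes[p] / divisor(alloc[p]))
        alloc[winner] += 1
    return alloc

votes = {"A": 5300, "B": 2450, "C": 1400, "D": 900}  # hypothetical vote totals

dhondt = divisor_apportion(votes, 10, lambda s: s + 1)            # 1, 2, 3, ...
sainte_lague = divisor_apportion(votes, 10, lambda s: 2 * s + 1)  # 1, 3, 5, ...
```

With these totals D'Hondt gives party A six of the ten seats while Sainte-Laguë gives it five, handing the extra seat to a smaller party: exactly the kind of pairwise comparison that majorization orders systematically.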

Shaping Social Activity by Incentivizing Users

Speaker: Le Song

Events in an online social network can be categorized roughly into endogenous events, where users just respond to the actions of their neighbors within the network, or exogenous events, where users take actions due to drives external to the network. How much external drive should be provided to each user, such that the network activity can be steered towards a target state?

Room: 264

Environmental Standards from a Statistical Point of View

Speaker: Peter Guttorp

The most common way for governments to protect the population from environmental insults, such as air or water pollution, is to set a standard. Most standards consist of two parts: a cutoff value beyond which health risks are deemed unacceptable, and an implementation rule, specifying how compliance with the standard will be ascertained. We illustrate the concepts with two US environmental standards, one for air pollution and one for water pollution. From a statistical point of view, the US EPA implementation rules in these examples have poor performance characteristics.

Room: 209

Lord's Paradox and Targeted Interventions: The Case of Special Education

Speaker: Roderick M. Theobald

Advisors: Thomas Richardson and Dan Goldhaber

Lord (1967) describes a hypothetical “paradox” in which two statisticians, analyzing the same dataset using different but defensible methods, come to very different conclusions about the effects of an intervention on student outcomes.

Room: 409