PDL

Functional Estimation in Nonparametric Regression

Start Time
Speaker
Yandi Shen

Consider the heteroscedastic nonparametric regression model with random design $Y_i = f(X_i) + V^{1/2}(X_i)\varepsilon_i, \quad i=1,2,\ldots,n$, with $f(\cdot)$ and $V(\cdot)$ $\alpha$- and $\beta$-Hölder smooth, respectively. We show that the minimax rate of estimating $V(\cdot)$ under both local and global squared risks is of the order $n^{-\frac{8\alpha\beta}{4\alpha\beta + 2\alpha + \beta}} \vee n^{-\frac{2\beta}{2\beta+1}}$, where $a\vee b := \max\{a,b\}$ for any two real numbers $a,b$.
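
As a rough illustration of the rate stated above, the following sketch computes the exponent of $n$ for given smoothness levels. The function name `variance_rate_exponent` is hypothetical and introduced only for this example. Because the rate is the larger of two negative powers of $n$, the governing exponent is the smaller of the two.

```python
def variance_rate_exponent(alpha: float, beta: float) -> float:
    """Return r such that the stated minimax rate is of order n**(-r)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    # Term depending on both smoothness levels: n^{-8ab / (4ab + 2a + b)}.
    joint_term = 8 * alpha * beta / (4 * alpha * beta + 2 * alpha + beta)
    # Term depending on beta only: n^{-2b / (2b + 1)}.
    beta_term = 2 * beta / (2 * beta + 1)
    # For n > 1, max(n^{-x}, n^{-y}) = n^{-min(x, y)}.
    return min(joint_term, beta_term)
```

For instance, with $\alpha = \beta = 1$ the second term governs and the exponent is $2/3$, while with $\alpha = 0.1$ and $\beta = 1$ the first term governs and the exponent is $0.5$.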

Building
Room
C-301

Flexible spatial models for household survey data in low and middle income countries

Start Time
Speaker
John Leonard Paige

The need for rigorous and timely health and demographic summaries has led to an explosion in geographic studies, particularly in low and middle income countries. While household surveys are a major source of data in this context, they present challenges for statistical modeling. These challenges include biases due to oversampling certain population segments, nonlinear interactions between covariates, and multiple scales of prediction. However, many common statistical methods have never been tested rigorously in these settings.

Building
Room
C-301

Faculty Meeting - Monday, October 21, 2019

Start Time

The regular meeting of the faculty of the Department of Statistics was held in C-301 Padelford Hall at 12:30pm, October 21st, 2019. Daniel Pollack, Interim Chair, presided at the meeting. Kristine Chan was recording secretary.

The meeting began with approval of the previous meeting’s minutes from October 7, 2019.

Chair’s Remarks
Daniel Pollack announced Vickie Graybeal’s twenty years of service award.

Building
Room
C-301

Faculty Meeting - Monday, October 7, 2019

Start Time

The regular meeting of the faculty of the Department of Statistics was held in C-301 Padelford Hall at 12:30pm, October 7th, 2019. Daniel Pollack, Interim Chair, presided at the meeting. Kristine Chan was recording secretary.

The meeting began with approval of the previous meeting’s minutes from September 23, 2019.

Chair’s Remarks
Daniel Pollack went over existing department policies that were up for renewal. Of these, the delegation of authority, merit review process, and retention consultation policies were reviewed, voted on, and approved.

Building
Room
C-301

Faculty Meeting - Monday, September 23, 2019

Start Time

The regular meeting of the faculty of the Department of Statistics was held in C-301 Padelford Hall at 12:30pm, September 23rd, 2019. Daniel Pollack, Interim Chair, presided at the meeting. Kristine Chan was recording secretary.

Chair’s Remarks
Daniel Pollack reported there will be no faculty retreat this Autumn quarter. Planning for the retreat will be revisited in the Spring.

He also provided updates and timelines on the two ongoing searches: Full Professor & Chair and Assistant Professor. 

Building
Room
C-301

Latent Variable Models for Indirectly or Imprecisely Measured Networks

Start Time
Speaker
Wesley T Lee

In the social sciences, social networks are important structures which represent the relationships and interactions between actors in a population of study. The most common methods for measuring networks are to survey study participants about who their connections are and to collect interaction activity between pairs of actors. However, directly measuring the exact network of interest can be challenging.

Building
Room
C-301

Estimation and testing under shape constraints

Start Time
Speaker
Nilanjana Laha

Over the last few decades, shape-constrained methods have gained importance in statistical inference as attractive alternatives to traditional nonparametric methods, which often require tuning parameters and restrictive smoothness assumptions. This talk focuses on the application of shape constraints such as unimodality and log-concavity in comparing the outcomes of two HIV vaccine trials. To this end, we develop shape-constrained tests of stochastic dominance and a shape-constrained plug-in estimator of the Hellinger distance between two densities.

Building
Room
C-301

Realized genome sharing in random effects models for quantitative genetic traits

Start Time
Speaker
Bowen Wang

DNA copies inherited from the same ancestral copy by related individuals are said to be identical by descent (IBD). IBD gives rise to genetic similarities between related individuals. In quantitative genetics, two fundamental problems are heritability estimation and gene mapping for genetic traits. IBD plays a critical role in the study of both problems. When working with population-based samples where pedigree information is unavailable, it is essential to estimate IBD accurately from genetic marker data using pedigree-free methods.

Building
Room
C-14A

Inferring Network Structure From Partially Observed Graphs

Start Time
Speaker
Mengjie Pan

Collecting social network data is notoriously difficult, meaning that indirectly observed or missing observations are very common. In this talk, we address two such scenarios: inference on network measures without network observations and inference of regression coefficients when actors in the network have latent block memberships.

Building
Room
C-301

High-dimensional independence testing with maxima of rank correlations

Start Time
Speaker
Hongjian Shi

Testing mutual independence for high-dimensional observations is a fundamental statistical challenge. Popular tests based on linear and simple rank correlations are known to be incapable of detecting non-linear, non-monotone relationships, calling for methods that can account for such dependences. To address this challenge, we propose a family of tests that are constructed using maxima of pairwise rank correlations that permit consistent assessment of pairwise independence.

Building
Room
C-301

Recursive Inversion Models for Partially Ranked Data

Start Time
Speaker
Annelise Mavis Wagner

Can we do exact and tractable inference in Mallows-like models for incomplete data? I will show that the answer is yes for the most general form of Mallows-type model and a large class of partial orders known as partial rankings (including special cases like top-t rankings). I will also demonstrate that, despite partial rankings lacking a sufficient statistic, exact inference is possible with overhead that is at most polynomial in O(nN) and that, in practice, the overhead per data point is negligible.

Building
Room
C-301

Fitting Stochastic Epidemic Models to Multiple Data Types

Start Time
Speaker
Mingwei Tang

Traditional infectious disease epidemiology focuses on fitting deterministic and stochastic epidemic models to surveillance case count data. Recently, researchers began to make use of infectious disease agent genetic data to complement statistical analyses of case count data. Such genetic analyses rely on the field of phylodynamics, a set of population genetics tools that aim at reconstructing the demographic history of a population based on molecular sequences of individuals sampled from the population of interest.

Building
Room
C-301

Large-Scale B Cell Receptor Sequence Analysis Using Phylogenetics and Machine Learning

Start Time
Speaker
Amrit Dhar

The adaptive immune system synthesizes antibodies, the soluble form of B cell receptors (BCRs), to bind to and neutralize pathogens that enter our body. B cells are able to generate a diverse set of high affinity antibodies through the affinity maturation process. During maturation, "naive" BCR sequences first accumulate mutations according to a neutral evolutionary process called somatic hypermutation (SHM), which may modify the associated binding affinities, and then are subject to natural selection by clonal expansion, which promotes the higher affinity antibodies.

Building
Room
C-301

Gradient Group Lasso Identifies Sparse Functional Basis for Molecular Manifolds

Start Time
Speaker
Samson Jonathan Koelle

We present a method for analyzing low-energy paths between molecular conformations by combining techniques in both manifold learning, which identifies such paths, and functional regression, which can parameterize them by explanatory non-linear functions. Unsupervised manifold learning approaches are useful for understanding molecular dynamics simulations since they disregard small-scale information such as peripheral hydrogen vibrations that can nevertheless drastically affect the observed energy.

Building
Room
C-301

Fast nonconvex changepoint detection

Start Time
Speaker
Sean William Jewell

In recent years, new technologies in neuroscience have made it possible to measure the activities of large numbers of neurons in behaving animals. For each neuron, a fluorescence trace is measured; this can be seen as a first-order approximation of the neuron's activity over time. Determining the exact time at which a neuron spikes on the basis of its fluorescence trace is an important open problem in the field of computational neuroscience. Recently, a convex optimization problem involving an L1 penalty was proposed for this task.

Building
Room
C-301

Statistical Methods for Manifold Recovery and $C^{1,1}$ Regression on Manifolds

Start Time

High-dimensional data sets often have lower-dimensional structure taking the form of a submanifold of a Euclidean space. It is challenging but necessary to develop statistical methods for these data sets that respect the manifold structure. We present research from two different areas: manifold learning (i.e., support estimation) and smooth regression on manifolds.

Building
Room
C-14A

Space-Time Contour Models for Sea Ice Forecasting

Start Time
Speaker
Hannah Director

The amount of sea ice (frozen ocean water) found in the Arctic is declining rapidly as a result of climate change. This has increased the need for accurate forecasts of where sea ice will be located. Of particular interest is predicting the sea ice edge contour, or the boundary of the region where at least 15% of the area is ice-covered. Current sea ice forecasts are issued from deterministic numerical prediction systems.

Building
Room
C-301

Nonparametric inference on monotone functions, with applications to observational studies

Start Time
Speaker
Theodore Westling

In this dissertation, we study general strategies for constructing nonparametric monotone function estimators in two broad statistical settings. In the first setting, a sensible initial estimator of the monotone function of interest is available, but may fail to be monotone. We study the correction of such an estimator obtained via projection onto the space of functions monotone over a finite grid in the domain.
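
One standard way to project an initial estimate onto nondecreasing sequences over a finite grid, in the least-squares sense, is the pool-adjacent-violators algorithm. The sketch below is a minimal, unweighted version under that assumption. It is not necessarily the procedure studied in the dissertation, and the function name `monotone_projection` is hypothetical.

```python
def monotone_projection(values):
    """Least-squares projection of a sequence onto nondecreasing sequences."""
    # Each block stores [sum of values, number of values]; its mean is sum / count.
    blocks = []
    for v in values:
        blocks.append([float(v), 1])
        # Pool the last two blocks while the earlier block's mean exceeds the later one's.
        # Means are compared by cross-multiplication to avoid division.
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each pooled block back to one fitted value per grid point.
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

For example, the non-monotone initial values `[1, 3, 2, 4]` are corrected to `[1.0, 2.5, 2.5, 4.0]`: the violating pair is replaced by its average and all other values are left unchanged.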

Building
Room
C-301

Bayesian Methods for Graphical Models with Limited Data

Start Time
Speaker
Zehang Li

Scientific studies in many fields involve understanding and characterizing dependence relationships among large numbers of variables. This can be challenging in settings where data is limited and noisy. Take survey data as an example: understanding the associations between questions may help researchers better explain themes among related questions and impute missing values. Yet such data typically contains a combination of binary, continuous, and categorical variables; a high proportion of missing values; and complex data structures.

Building
Room
C-14A

Preferential sampling and model checking in phylodynamic inference

Start Time
Speaker
Michael D. Karcher

Estimating population size fluctuations is one of the key tasks in ecology. However, traditional sampling-based approaches to this task have limitations when populations of interest are extinct or are hard to reach, as is the case for individuals infected for a short time period by a pathogen.

Building
Room
C-301

Analysis of Incomplete Network Data

Start Time
Speaker
Mengjie Pan

Collecting social network data is notoriously difficult, meaning that indirectly observed or missing observations are very common. In this talk, we address two such scenarios: inference on network measures without any direct network observations and inference of regression coefficients when important features are missing.

Building
Room
C-301

Parameter Identification and Assessment of Independence in Multivariate Statistical Modeling

Start Time
Speaker
Luca Weihs

In this talk we define a new class of multivariate nonparametric measures of dependence that we refer to as symmetric rank covariances. This new class generalizes many existing classical rank measures of dependence, such as Kendall's tau and Hoeffding's D, as well as the more recently discovered Bergsma-Dassios sign covariance. Symmetric rank covariances make explicit the implicit symmetries hidden in the standard definitions of the above measures and, in doing so, lead naturally to multivariate extensions of the Bergsma-Dassios sign covariance.

Building
Room
C-301

Latent Variable Models for Imprecisely or Indirectly Measured Networks

Start Time
Speaker
Wesley T Lee

In the social sciences, social networks are important structures which represent the relationships and interactions between actors in a population of study. In these fields, the most common method for measuring networks is to directly survey study participants about who their connections are. However, directly measuring the network of interest can be challenging. Participants do not always provide accurate accounts of their connections, which can result in mismeasurement of the network.

Building
Room
C-301

Causal Discovery with non-Gaussian Data

Start Time
Speaker
Yu-Hsuan S. Wang

In this talk, we consider causal discovery when the underlying structure corresponds to a linear structural equation model with error terms which are non-Gaussian. Previous work by Shimizu et al. (2006) has shown that under this framework, a unique directed acyclic graph, not simply an equivalence class, can be identified from infinite data. We extend that result in two directions. First, we show that a unique graph can still be consistently recovered in the high dimensional setting where p, the number of variables, exceeds n, the number of observed samples.

Building
Room
C-14A

Composite Likelihood Estimation for Binary Network Models

Start Time
Speaker
Yanjun He

We develop a scalable method to estimate the parameters in models of very large binary network datasets. Maximum likelihood estimates are generally impossible to obtain because the full likelihood involves an intractable high dimensional integral. Also, full-likelihood Bayesian estimation is impractical for very large datasets as the MCMC algorithm is very slow.

Building
Room
C-14A

Faculty Meeting - February 13, 2017

Start Time

Time: 12.30-1.30pm February 13, 2017 
Place: Padelford Hall, C-301 
Agenda:

  1. Updates (Thomas R.)
  2. 3-year Affiliate/Adjunct Renewals (Thomas R.)
  3. Affiliate/Adjunct Re-Appointments (Not up for periodic 3-year review associated with renewal) (Thomas R.)
  4. Case for Promotion to Affiliate Associate Professor (Thomas R.)
  5. Paul Sampson (Thomas R.)

Building
Room
C-301

Faculty Meeting - April 3, 2017

Start Time

Time: 12.30-1.30pm April 3, 2017 
Place: Padelford Hall, C-301 
Agenda:

  1. Upcoming talk by Nature Editor, 4/5, Physics/Astronomy Auditorium A118
  2. Computing Staff Updates (Thomas/Kris)
  3. Web-site overhaul (Thomas/Kris)
  4. Discuss and vote on Affiliate appointment for Sam Clark
  5. Update on Faculty Search for Full-Time Lecturer in Consulting (Elena)
  6. Search request for next year (Thomas)
  7. New learning spaces / scheduling policy (commencing Spring 2018): https://registrar.washington.edu/learning-spaces-faq/

Building
Room
C-301

A Bayesian Surveillance System for Detecting Clusters of Non-Infectious Diseases

Start Time
Speaker
Albert Y. Kim

Advisor: Jon Wakefield

We consider the problem of detecting clusters of non-infectious and rare diseases. Cluster detection is the routine surveillance over a large expanse of small administrative regions to identify individual 'hot-spots' of elevated residual spatial risk without any preconceptions about their locations. A class of cluster detection procedures known as moving-window methods superimpose a large number of circular regions onto the study area.

Building
Room
C-301

Probability and Inference for Random Fields

Start Time
Speaker
Debashis Mondal

In recent decades, there has been much progress and interest in spatial statistics, with applications in agriculture, epidemiology, geology and other areas of environmental science and in image analysis. Two contrasting approaches have emerged, one based on Markov random fields, the other on geostatistics. The development of Markov Chain Monte Carlo as a computational tool has been phenomenal and has made Bayesian inference for spatial models relatively easy to perform, whereas frequentist inference still presents difficult problems.

Building
Room
C-301

Gravimetric Anomaly Detection Using Compressed Sensing

Start Time
Speaker
Ryan D. Kappedal

Advisor: Marina Meila

We address the problem of identifying underground anomalies (e.g. holes) based on gravity measurements. This is a theoretically well-studied and difficult problem. In all except a few special cases, the inverse problem has multiple solutions, and additional constraints are needed to regularize it. Our approach makes general assumptions about the shape of the anomaly that can also be seen as sparsity assumptions. We can then adapt recently developed sparse reconstruction algorithms to bear on this problem.

Building
Room
C-301

Probabilistic Projections of Fertility Using a Bayesian Hierarchical

Start Time
Speaker
Leontine Alkema

The United Nations Population Division produces estimates and projections of the total fertility rate for all countries in the world every two years. For countries with fertility above replacement level, future levels are projected by choosing one out of three scenarios describing the pace of future fertility decline.

I will discuss a Bayesian hierarchical model for producing country-specific projections of the total fertility rate, and assessing the uncertainty in these predictions. Results for various countries will be presented.

Building
Room
C-301

Applications of Robust Statistical Methods in Quantitative Finance

Start Time
Speaker
Christopher G. Green

Advisor: Douglas Martin

Financial asset returns and fundamental factor exposure data often contain outliers, observations that are inconsistent with the majority of the data. Both academic finance researchers and quantitative finance professionals are well aware of the occurrence of outliers in financial data, and seek to limit the influence of such observations in data analyses. Commonly used outlier mitigation techniques assume that it is sufficient to deal with outliers in each variable separately.

Building
Room
C-301

Exploring Rates and Patterns of Variability in Gene Conversion and Crossover in the Human Genome

Start Time
Speaker
Garrett Richard Hellenthal

Meiotic recombination is a biological process that shuffles our genetic material before we pass it along to our offspring. There are two known outcomes of recombination: crossover and gene conversion. Recently, fine-scale human crossover rates have been inferred with some success using statistical methodology applied to population data (i.e. genetic data on random samples of individuals from a population). However, reliable estimation of gene conversion rates has proven more difficult to come by.

Building
Room
C-301

Estimating coancestry among multiple individuals in populations

Start Time
Speaker
Christopher G. Glazner

Segments of genome inherited from a common ancestor by multiple individuals are said to be identical by descent (IBD). Dense genotyping platforms permit the detection of IBD segments less than 5 centiMorgans long, which arise due to coancestry on the order of dozens of generations ago. Generalizations of classical pedigree-based linkage methods use this inferred IBD and can be applied in situations where pedigree data is incomplete. We present a method for inferring IBD in groups of individuals without pedigrees.

Building
Room
C-301

Seeing the Trees Through the Forest: A Competition Model for Growth and Mortality

Start Time
Speaker
Hilary Mason Lyons

Advisor: Peter Guttorp

Local competition between trees affects growth and mortality, from which emerge spatial patterns of surviving trees. Often, the patterns resulting from this unspecified process are treated as instances of spatial patterns and analyzed with point process methods. Alternatively, forest simulation models assume mechanistic processes and parameters to examine the effects of these assumptions on tree patterns over time, and assess sensitivity to changing conditions, such as climate.

Building
Room
C-301

Predictive Modeling of Cholera Outbreaks in Bangladesh

Start Time
Speaker
Amanda Allen

Advisors: Vladimir Minin and Ira Longini

Despite seasonal cholera outbreaks in Bangladesh, little is known about the relationship between environmental conditions and cholera cases. We seek to develop a predictive model for cholera outbreaks in Bangladesh based on environmental predictors. To do this, we estimate the contribution of environmental variables, such as water depth and water temperature, to cholera outbreaks in the context of two different disease transmission models.

Building
Room
C-301

Modeling Competition in Forest Development

Start Time
Speaker
Hilary Mason Lyons

Analysis of the patterns of entities and their attributes in space is a common and useful endeavor in ecology. Often, the end of a statistical analysis is a general characterization of the observed pattern or series of patterns. However, a good description of the outcome may be somewhat dissatisfying to the practicing scientist or resources manager in that the mechanisms and processes that led to the outcomes remain unknown.

Building
Room
C-301

Testing for Differences between Least Squares and Robust Regression Estimates

Start Time
Speaker
Tatiana A Maravina

At present there is no well-accepted test for comparing least squares and robust linear regression coefficient estimates. To fill this gap, we propose and demonstrate the efficacy of two Wald-like statistical tests for this purpose, using the class of MM-estimators for robust regression.

Building
Room
C-14

Classifying Immune Responses in Peptide Microarray Immunoassays

Start Time
Speaker
Gregory C Imholte

Advisor: Dr. Raphael Gottardo

Peptide microarrays tiling immunogenic regions of pathogens (e.g. envelope proteins of a virus) have become an important high throughput tool for querying and mapping antibody binding. Antibodies play a key role in the immune system by preventing and controlling infection. Antibody binding locations provide crucial information for understanding natural infection and for deriving effective vaccines. In the context of vaccine development, the peptide microarray can reveal patterns of antibody response stimulated via vaccine treatment.

Building
Room
C-301

Pairwise Clustering by Random Walks

Start Time
Speaker
Marina Meila

In a similarity based clustering task, one defines a "similarity function" between pairs of points and then formulates a criterion (e.g. maximum intracluster similarity) that the clustering must optimize. The optimality criterion quantifies the intuitive notion that points in the same clusters should be similar while points in different clusters should be dissimilar. Most sensible criteria are NP hard to optimize. An alternative view that has been successful in recent years is represented by spectral methods, where clustering is based on the first few eigenvectors of a matrix.

Building
Room
C-401

Up-and-Down and the Percentile-Finding Problem

Start Time
Speaker
Assaf P Oron

A problem encountered across many fields in science, engineering and medicine is finding a specific percentile of a binary-response threshold distribution (for example, finding the ED50 of a medication). Statisticians have designed two popular sequential solutions to this challenge: 'Up-and-Down' (U&D), a 1940s-vintage method; and Bayesian designs, most prominently the 'Continual Reassessment Method' (CRM, O'Quigley et al., 1990), a design tailored to Phase I clinical trials. U&D generates a random walk revolving around the target percentile.
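
To make the random-walk behavior concrete, here is a minimal simulation sketch of the classical up-and-down rule, which targets the median (the ED50): step one level down after a response and one level up after a non-response. The dose grid, the logistic response curve, and the names `up_and_down` and `logistic_curve` are assumptions made for illustration and are not taken from the talk.

```python
import math
import random


def up_and_down(prob_response, n_levels, start, n_trials, rng):
    """Simulate the classical up-and-down rule on levels 0..n_levels-1."""
    level = start
    path = [level]
    for _ in range(n_trials):
        responded = rng.random() < prob_response(level)
        # Step down after a response, up after a non-response.
        step = -1 if responded else 1
        # Stay on the grid: the walk holds at the lowest and highest levels.
        level = min(max(level + step, 0), n_levels - 1)
        path.append(level)
    return path


def logistic_curve(level, midpoint=4.0, scale=1.0):
    """Hypothetical response curve with a 50% response rate at `midpoint`."""
    return 1.0 / (1.0 + math.exp(-(level - midpoint) / scale))


# One simulated experiment of 200 trials on a 9-level grid, seeded for repeatability.
path = up_and_down(logistic_curve, n_levels=9, start=0, n_trials=200,
                   rng=random.Random(0))
```

Under these assumptions the walk is pushed upward wherever the response probability is below one half and downward wherever it is above, so the visited levels tend to concentrate near the level with a 50% response rate.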

Building
Room
C-301

Maximum-Likelihood Inference after Model Selection

Start Time
Speaker
Amit Nathan Meir

Standard statistical techniques often fail in the presence of data-driven model selection, yielding inefficient estimators and hypothesis tests that fail to achieve nominal type-I error rates. In particular, the observed data is constrained to lie in a subset of the original sample space that is determined by the selected model. This often makes the post-selection likelihood of the observed data intractable and inference difficult. Recently, novel methodologies have been proposed for performing valid inference in selected models.

Building
Room
C-301

Nonparametric Estimation for Current Status Data with Competing Risks

Start Time
Speaker
Marloes H. Maathuis

We study the nonparametric maximum likelihood estimator (MLE) for current status data with competing risks. These data arise naturally in cross-sectional survival studies with several failure causes, and generalizations arise in HIV vaccine clinical trials. Until now, the asymptotic properties of the MLE have been largely unknown. We resolve this issue by proving consistency, the rate of convergence, and the limiting distribution of the MLE.

Building
Room
C-301

Hierarchical modelling of spatial structure of epidermal nerve fibers

Start Time
Speaker
Aila Sarkka

Epidermal nerve fiber (ENF) density and morphology are used to diagnose small fiber involvement in diabetic and other small fiber neuropathies. ENF density and summed length of ENFs per epidermal surface area are reduced in diabetic subjects. Furthermore, based mainly on visual inspection, it has been reported that ENFs of subjects with diabetic neuropathy appear more clustered than ENFs of healthy subjects. Therefore, it is important to understand the spatial structure of ENFs in healthy and diseased subjects.

Building
Room
THO 125

Parametrizations of Discrete Graphical Models

Start Time
Speaker
Robin J. Evans

Advisor: Thomas Richardson

Graphical models provide an intuitive way of representing conditional independence relations over multivariate distributions. We work with a very general class of graphs we dub Mixed Euphonious Graphs (MEGs), which include DAGs, undirected graphs and ancestral graphs as special cases. Markov properties and parametrizations of discrete distributions obeying the global Markov property for MEGs were found by Richardson (2003, 2009). We discuss this parametrization, and a Maximum Likelihood fitting algorithm which uses it.

Building
Room
C-301

Postulating Monotonicity in Bayesian Nonparametric Regression

Start Time
Speaker
Elja Arjas

It is often reasonable, by using earlier empirical evidence or theoretical understanding of the considered applied context, to assume that the regression surface corresponding to a response variable, as a function of the model covariates, is either monotonically increasing or monotonically decreasing, but then otherwise leave the form of such a function unspecified. In this talk we consider the practical implications of making such a postulate when applying variable dimensional Bayesian modeling, MCMC, and model averaging.

Building
Room
C-14

Markov Equivalence Classes for Bayesian Belief Networks

Start Time
Speaker
Steven B. Gillispie

Acyclic digraphs are used to represent the underlying relationships of some Bayesian belief networks, which are in turn used in expert systems and other representations of statistically interdependent items. But the set of such digraphs turns out to be too big and, instead, a smaller number of equivalence classes truly represent the set of possible networks. Until now, little has been known about the combinatorial properties of these classes, such as their asymptotic growth with number of vertices or the average class size.

Building
Room
C-401

Explicit Limit Results for Markov Chains and Other Markov Processes

Start Time
Speaker
Valerie Stefanov

The statistical literature abounds with limit results (central limit theorems, laws of large numbers and laws of the iterated logarithm) for Markov chains, Markov renewal processes, and Markov additive processes. However, most of the general results are not applicable in practice because the limiting quantities are generally not available in an explicit form.

Building
Room
C-301

Maximum-Likelihood Inference after Model Selection

Start Time
Speaker
Amit Nathan Meir

Co-Advisors: Mathias Drton & Raphael Gottardo

Standard statistical techniques often fail in the presence of data-driven model selection, yielding inefficient estimators and hypothesis tests that fail to achieve nominal type-I error rates. In particular, the observed data is constrained to lie in a subset of the original sample space that is determined by the selected model. This often makes the post-selection likelihood of the observed data intractable and inference difficult. Recently, novel methodologies have been proposed for performing valid inference in selected models.

Building
Room
C-301

Bayesian Population Reconstruction: A Method for Estimating Age- and Sex-specific Vital Rates and Population Counts with Uncertainty from Fragmentary Data

Start Time
Speaker
Mark C Wheldon

Current methods for reconstructing human populations of the past by age and sex are deterministic or do not formally account for measurement error. I propose "Bayesian reconstruction", a method for simultaneously estimating age-specific population counts, fertility rates, mortality rates and net international migration flows from fragmentary data, that incorporates measurement error. Expert opinion is incorporated formally through informative priors. Inference is based on joint posterior probability distributions which yield fully probabilistic interval estimates.

Building
Room
C-301

Bayesian Nonparametric Inference of Population Trajectories with Gaussian Processes

Start Time
Speaker
Julia A. Palacios Roman

Advisor: Vladimir Minin

Changes in population size influence genetic diversity of the population and, as a result, leave imprints in genomes of individuals in the population. We are interested in an inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences.

Building
Room
C-301

Bayesian Space-Time Smoothing Models for Small Area Estimation

Start Time
Speaker
Laina D. Mercer

Advisor: Jon Wakefield

Area and time-specific estimates of disease rates, cause-specific mortality rates and other key health indicators are of great interest for health care and policy purposes. Such estimates provide the information needed to identify areas with increased risk, effectively allocate resources, and target interventions. A wide variety of data, such as vital statistics, complex surveys, demographic surveillance sites, and disease registries, are used for these purposes.

Building
Room
C-14A

Learning Transcriptional Networks from the Integration of ChIP-chip and Expression Data in a Nonparametric Model

Start Time
Speaker
Ahrim Youn

We have developed LeTICE, an algorithm for learning a transcriptional regulatory network from ChIP-chip location and expression data. The network is specified by a binary matrix of transcription factor – gene interactions which partitions the genes into a collection of modules (groups of genes regulated by the same TFs) and a background (a group of genes which do not belong to any module). We define a likelihood of a network given location and expression data and then search for the network optimizing the likelihood using numerical optimization.

Building
Room
C-301

Improving Serfling's Inequality for the Hypergeometric Distribution

Start Time
Speaker
Evan P. Greene

Advisor: Jon Wellner

We discuss a method for obtaining finite-sample Gaussian bounds for the tail of the hypergeometric distribution. The method is based on Tusnády's approach (1975) to bounding the tail of symmetric binomial random variables. In this talk, we review Tusnády's result and discuss how it can be adapted to and extended in the hypergeometric case.

Building
Room
C-301

Bayesian Hierarchical Curve Registration

Start Time
Speaker
Donatello Telesca

A number of scientific fields, ranging from biomedicine to economics to molecular biology, generate functional data. The statistical analysis of a sample of curves, known as Functional Data Analysis (FDA), has as one of its goals explaining variation in the functional outcome in terms of some predictors. However, these curves tend to be misaligned, exhibiting variation not only in amplitude, but also in phase. Teasing apart these sources of variation is a central issue in FDA.

Building
Room
C-301

Statistical Approaches to Analyze Mass Spectrometry Data

Start Time
Speaker
Soyoung Ryu

Advisors: Vladimir Minin & David Goodlett

Proteomics attempts to understand biological functions of an organism through the lens of expressed proteins, basic building blocks of all living cells. Mass spectrometry is used in the field of shotgun proteomics to generate mass spectra that are in turn used to identify and quantify proteins in a given sample.

Building
Room
C-301

A General Approach to Nonparametric Monotone Function Estimation

Start Time
Speaker
Theodore Westling

For several important monotone parameters, such as the distribution function, monotone density function, and monotone regression function, sensible nonparametric estimators can be obtained by minimizing the empirical risk based on an appropriate loss function. For more complex monotone parameters, such as a monotone covariate-adjusted dose-response curve, or in the context of more complex data structures, this approach may not be possible and alternative approaches are needed. We discuss general strategies for monotone function estimation in two important settings.

Building
Room
C-301

Estimating Social Contact Networks to Improve Influenza Simulation Models

Start Time
Speaker
Gail E. Potter

Advisor: Mark Handcock

Influenza pandemics pose a serious global health concern. The recent A(H1N1) influenza pandemic caused 18,500 lab-confirmed deaths, and mutation of the A(H5N1) "avian" influenza virus could also cause a pandemic with an estimated 60% case mortality rate in humans, requiring fast analysis of intervention and containment strategies. When a new influenza virus emerges with pandemic potential, stochastic simulation models are used to assess the effectiveness of different strategies.

Building
Room
C-301

Finite Sampling Exponential Bounds with Applications to Two-Sample Kolmogorov-Smirnov Statistics

Start Time
Speaker
Evan P. Greene

Advisor: Jon Wellner

In this talk, we discuss exponential tail inequalities for the sum in the context of sampling without replacement. Using an exponential inequality due to Serfling as the basis for investigation, we consider the special case of sampling from a finite population containing only 0s and 1s. This leads to considering exponential bounds for the hypergeometric distribution.
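As a rough illustration of the setting in this abstract, the sketch below compares the exact upper tail of a hypergeometric count with the classical Serfling (1974) exponential bound for sampling without replacement from a 0/1 population. It is not the speaker's improved bound. The function names and the numerical values (N, K, n) are assumptions chosen for illustration, and the bound is written in the commonly cited form exp(-2 n eps^2 / (1 - (n-1)/N)).

```python
from math import comb, exp


def hypergeom_upper_tail(N, K, n, k):
    """Exact P(X >= k) when X counts the 1s in a sample of size n drawn
    without replacement from a population of N items containing K ones."""
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, j) * comb(N - K, n - j)
               for j in range(max(k, 0), upper + 1)) / total


def serfling_bound(N, K, n, k):
    """Serfling-type bound on P(X >= k) for a 0/1 population.

    Uses eps = k/n - K/N (must be positive) and the sampling-fraction
    correction f* = (n - 1) / N.
    """
    eps = k / n - K / N
    if eps <= 0:
        raise ValueError("k/n must exceed the population proportion K/N")
    f_star = (n - 1) / N
    return exp(-2 * n * eps ** 2 / (1 - f_star))


if __name__ == "__main__":
    N, K, n = 100, 30, 20  # hypothetical population and sample sizes
    for k in range(7, 15):
        exact = hypergeom_upper_tail(N, K, n, k)
        bound = serfling_bound(N, K, n, k)
        print(f"k={k:2d}  exact tail={exact:.3e}  Serfling bound={bound:.3e}")
```

The gap between the two columns gives a sense of how much room there is for sharper finite-sample bounds of the kind the talk discusses.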

Building
Room
C-14A

Semiparametric Copula Models for Diverse Types of Dependent Data

Start Time
Speaker
Xiaoyue Niu

In multivariate analysis, we are often interested in studying the dependence structure among diverse types of data, including continuous, ordinal, and non-ordered categorical data. One approach to analyzing these data is to use copula models. In this talk, I will discuss a method extending copula models to mixed continuous and ordinal data and study its asymptotic properties. Then I will introduce a new model incorporating copula models and model-based clustering ideas to deal with mixed continuous, ordinal and categorical data.

Building
Room
C-301

Ergodic Limit Laws for Stochastic Optimization Problems

Start Time
Speaker
Chris Jennison

Propp and Wilson's coupling from the past (CFTP) algorithm provides exact samples and, thus, an elegant alternative to convergence diagnostics for standard MCMC samplers. I shall explain how this method works and discuss some practicalities regarding its use in MCMC sampling. Unfortunately the CFTP technique is only applicable when the distribution to be sampled possesses certain special properties. We propose a way to use the method's basic idea more generally and demonstrate that our algorithm works well in some quite challenging applications.

Building
Room
C-301

Likelihood-Based Inference for Partially Observed Multi-Type Markov Branching Processes

Start Time
Speaker
Jason Q. Xu

Advisor: Vladimir Minin

Markov branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous modeling applications. Multi-type processes are necessary to model phenomena such as competition, predation, or infection, but often feature large or uncountable state spaces, rendering general CTMC techniques impractical. We present new methodology motivated by processes arising in molecular epidemiology, cellular differentiation, and infectious disease dynamics.

Building
Room
C-301

Wavelet Variance Analysis for Time Series and Random Fields

Start Time
Speaker
Debashis Mondal

Wavelets give rise to the concept of wavelet variance that decomposes the variance of a time series on a scale by scale basis and that has considerable appeal when physical phenomena are analyzed in terms of variations operating over a range of different scales. The wavelet variance has been applied to a variety of time series and is useful as an exploratory tool to identify important scales, to assess the exponent parameter of a power law process, to detect inhomogeneity and to estimate a time varying spectral density function.

Building
Room
C-301

Hammersley's Process with Sources and Sinks

Start Time
Speaker
Petrus Groeneboom

Hammersley (1972) initiated a very interesting "hydrodynamical" approach to the study of the behavior of the lengths of longest increasing subsequences of random permutations. In the nineties, Aldous and Diaconis (1995) introduced a modified version of the interacting particle process studied in Hammersley (1972), and used this modification in a proof of the fact that the length of a longest increasing subsequence of a (uniform) random permutation of length $n$, divided by $\sqrt{n}$, converges in probability to 2.
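The limit quoted in this abstract can be checked empirically. The sketch below is only a simulation, not the interacting particle process of Hammersley or of Aldous and Diaconis: it draws uniform random permutations, computes the length of a longest increasing subsequence by patience sorting, and divides by $\sqrt{n}$. The function names and sample sizes are assumptions made for illustration. For moderate $n$ the ratio typically sits somewhat below 2, since the convergence is slow.

```python
import random
from bisect import bisect_left
from math import sqrt


def lis_length(seq):
    """Length of a longest strictly increasing subsequence (patience sorting).

    tails[i] holds the smallest possible last element of an increasing
    subsequence of length i + 1 seen so far.
    """
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)


def scaled_lis(n, rng):
    """LIS length of a uniform random permutation of size n, divided by sqrt(n)."""
    perm = list(range(n))
    rng.shuffle(perm)
    return lis_length(perm) / sqrt(n)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (100, 1_000, 10_000, 100_000):
        print(n, round(scaled_lis(n, rng), 3))
```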

Building
Room
C-301

Model-Based Penalized Inference

Start Time
Speaker
Maryclare C Griffin

It is well known that many penalized regression problems can be interpreted as estimating unknown regression coefficients having assumed a specific statistical model. This includes the lasso when tuning parameters are estimated from the marginal likelihood of the data, the Bayesian lasso, Gaussian random effects models, ridge regression, etc. In the first part, we consider estimating a mean matrix from a single noisy realization. We assume possibly sparse elementwise effects and use a lasso penalty.

Building
Room
C-301

Models and Inference for Network and Attribute Data

Start Time
Speaker
Bailey Kathryn Fosdick

Latent variable network models provide low-dimensional representations of relational patterns in terms of additive and multiplicative actor-specific effects. In this talk we discuss these models in two contexts. First, we extend this class of models to estimate and make inference on the dependencies between a set of network relations and actor-specific attributes. Approaches to this problem typically condition on either the relations or attributes and are unable to provide predictions simultaneously for missing attribute and network information.

Building
Room
C-301

Modeling Longitudinal Multivariate Data with Mixed Outcomes: Hierarchical Latent Trait and Individual-Level Mixture Models

Start Time
Speaker
Jonathan C. Gruhl

Advisor: Elena Erosheva

I develop Bayesian hierarchical latent variable models for the study of longitudinal multivariate data. The latent variable models seek to represent multivariate data with a reduced number of dimensions while the hierarchical formulation enables the description of the latent structure evolution over time as well as factors associated with this evolution. Research on cognitive assessments and scientific interest in relating cognitive decline to neuroimaging results and biomarker information motivate these models.

Building
Room
C-301

TBD

Start Time
Speaker
Susan Shortreed

There will be a riveting introduction to social networks and the latent space model used for modeling networks. I will discuss the difficulties in estimating the parameters of this model by traditional methods and explain the estimator we came up with to deal with these issues.

Building
Room
C-301

Modeling Preferential Sampling Reduces Bias and Improves Precision When Estimating Effective Population Size Trajectories

Start Time
Speaker
Michael D. Karcher

Advisor: Vladimir Minin

The field of phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from the population of interest. One way to accomplish this task is to formulate an observed sequence data likelihood by using a coalescent model for the sampled individuals’ genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from sequence data. These strategies also work when molecular sequences are sampled serially through time.

Building
Room
C-301

Estimation of Convex-Transformed Densities

Start Time
Speaker
Arseni V. Seregin

A convex-transformed density is a quasi-concave (or a quasi-convex) density which is a composition of monotone and convex functions. We consider a scale of such families of multivariate densities indexed by a parameter which is a monotone function. The exponential function corresponds to log-concave densities, while power functions correspond to heavier tailed densities or densities concentrated on the positive orthant.

Building
Room
C-301

Classification by Opinion-Changing Behavior: A Mixture Model Approach

Start Time
Speaker
Jennifer Hill

Popular theories in political science regarding opinion-changing behavior postulate the existence of one or both of two broad categories of people: those who hold their opinions over time, and those who hold no solid opinion and, when asked to make a choice, do so seemingly at random. This study explores evidence for a third category: durable changers. This group of people will change their opinion in a rational, informed manner, after being exposed to new information.

Building
Room
C-301

Postprocessing of Precipitation Forecasts with an SPDE Based Spatio-temporal Model for Large Data

Start Time
Speaker
Fabio Sigrist

We introduce a hierarchical Bayesian model (HBM) for precipitation monitoring data that incorporates numerical weather prediction (NWP) model output at high spatial and temporal resolution and a physics-based stochastic partial differential equation (SPDE). The SPDE explicitly models phenomena such as advection and diffusion that occur in many natural processes. We approximate the solution of the SPDE in the spectral space using the method of eigenfunctions to reduce the dimensionality of the problem.

Building
Room
C-301

Statistical inference using Kronecker structured covariance

Start Time
Speaker
Alexander Volfovsky

We consider the problem of testing and estimation of separable covariances for relational data sets in the context of the matrix-variate normal distribution. Relational data are often represented as a square matrix, the entries of which record the relationships between pairs of objects. Many statistical methods for the analysis of such data assume some degree of similarity or dependence between objects in terms of the way they relate to each other. However, formal tests for such dependence have not been developed.

Building
Room
C-301

TBD

Start Time
Speaker
Heiko Manfred Bailer

The Capital Asset Pricing Model (CAPM) is today's most important financial model for estimating cost of capital and asset allocation. Its centerpieces are variables, commonly called betas and alphas, estimated using ordinary least squares (OLS) regression. Since financial returns typically have an asymmetric and heavy-tailed distribution, OLS estimates can be severely biased. In this talk we will introduce robust regression estimates with zero bias in beta and low bias in alpha (even under asymmetric distributions) but 99% asymptotic efficiency at the Gaussian model.
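To make the OLS step in this abstract concrete, here is a minimal sketch of estimating alpha and beta by regressing asset excess returns on market excess returns, and of how a single outlying return can move the OLS beta. It does not implement the robust estimator introduced in the talk. The function name and the return series are hypothetical values invented for illustration.

```python
def capm_ols(asset_excess, market_excess):
    """OLS estimates (alpha, beta) in asset = alpha + beta * market + error."""
    n = len(asset_excess)
    if n != len(market_excess) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_a = sum(asset_excess) / n
    mean_m = sum(market_excess) / n
    # Sample cross-product and market sum of squares about the means.
    sxy = sum((m - mean_m) * (a - mean_a)
              for m, a in zip(market_excess, asset_excess))
    sxx = sum((m - mean_m) ** 2 for m in market_excess)
    beta = sxy / sxx
    alpha = mean_a - beta * mean_m
    return alpha, beta


if __name__ == "__main__":
    # Hypothetical excess returns lying exactly on a line with beta = 1.2.
    market = [-0.02, -0.01, 0.0, 0.01, 0.02]
    asset = [0.001 + 1.2 * m for m in market]
    print("clean data:   alpha, beta =", capm_ols(asset, market))

    # Replace one observation with a large negative return (an outlier).
    contaminated = asset[:-1] + [-0.10]
    print("with outlier: alpha, beta =", capm_ols(contaminated, market))
```

In this toy series a single contaminated observation is enough to flip the sign of the OLS beta, which is the kind of sensitivity that motivates robust alternatives.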

Building
Room
C-301

Bayesian Modeling of International Migration

Start Time
Speaker
Jonathan J. Azose

Advisor: Adrian Raftery

The future of international migration is a topic of great social and political importance, and yet international migration is hard to even estimate, let alone predict. The unreliability of point projections of migration indicates a need for better quantification of uncertainty in migration projections. We accomplish this quantification of uncertainty with a Bayesian hierarchical autoregressive model on net migration rates. In an initial model, we assume error terms are independent across countries.

Building
Room
C-301

Extensions of Latent Class Transition Models with Application to Chronic Disability Survey Data

Start Time
Speaker
Toby White

Latent class transition models (LCTMs) are used to study the movement of individuals among homogeneous subgroups through time. Traditional LCTMs assume a complete set of observations for each individual. However, many longitudinal surveys have a rolling enrollment design, with late entry and early exit. Thus, methodology is needed to account for all the possible times at which individuals can be observed.

Building
Room
C-301

Discovering Interactions In Multivariate Time Series

Start Time
Speaker
Alexander H. Tank

In large collections of multivariate time series it is of interest to determine interactions between each pair of time series. We study methods for inferring time series interactions in three domains: 1) conditional independencies between time series, 2) Granger and instantaneous causality estimation in subsampled and mixed frequency time series, and 3) Granger causality estimation in multivariate categorical data. First, we explore a Bayesian framework for inferring graphical models of time series.

Building
Room
C-301

Jump Estimation in Inverse Regression Models

Start Time
Speaker
Axel Munk

We provide an asymptotic theory for penalized least squares estimators of locally constant functions with finitely many jumps which are blurred by an operator and random noise. Differences from the direct case are highlighted; in particular, it turns out that a $\sqrt{n}$ rate of convergence for estimation of the jump locations is generic in the inverse case. Moreover, locations of jumps are jointly asymptotically normal, which allows the construction of confidence regions for the graph of a function with a finite number of jumps.

Building
Room
C-301

Nonstationary Modeling Through Dimension Expansion

Start Time
Speaker
Luke Bornn

If atmospheric, agricultural, and other environmental systems share one underlying theme, it is complex spatial structure, influenced by such features as topography and weather. Ideally we might model these effects directly; however, information on the underlying causes is often not routinely available. Hence, when modeling environmental systems there exists a need for a class of spatial models which does not rely on the assumption of stationarity. In this talk, we propose a novel approach to modeling nonstationary spatial fields.

Building
Room
C-301

Allele-Sharing Methods for Linkage Detection Using Extended Pedigrees

Start Time
Speaker
Saonli Basu

Allele-sharing methods provide a robust approach to linkage detection for complex traits using pedigree data. Affected related individuals have increased probability of sharing genes identical-by-descent (IBD) at trait loci and hence also at linked marker loci at which they therefore show increased similarity over that predicted under Mendelian segregation. Relatives of discordant phenotype have decreased probability of sharing genes IBD at trait loci and hence have decreased similarity at linked markers.

Building
Room
C-301

Estimating the Treatment Effect of Non-Randomized Educational Interventions: The Case of Special Education

Start Time
Speaker
Roderick M. Theobald

A central goal of the education literature is to demonstrate that specific educational interventions have a treatment effect on student test performance. Researchers often have access to student test scores for students in the treatment and control groups both prior to and after the intervention, but usually must estimate the treatment effect from observational data in which the intervention has not been randomly assigned to units. This talk begins with a discussion of the assumptions that underlie common approaches to estimating a treatment effect with observational data.

Building
Room
C-301

TBD

Start Time
Speaker
J. McLean Sloughter

MURI week continues this Friday. I'll be talking about probabilistic weather forecasting using Bayesian Model Averaging, an altogether different approach than the probabilistic forecasting method described by Tilmann in seminar earlier this week. I'll be discussing my work on forecasting of wind and rain, and looking at a modification of the EM algorithm for mixed continuous/discrete distributions.

Building
Room
C-301

Hamiltonian Monte Carlo in Bayesian Empirical Likelihood Computation

Start Time
Speaker
Sanjay Chaudhuri

We consider Bayesian empirical likelihood estimation and develop an efficient Hamiltonian Monte Carlo method for sampling from the posterior distribution of the parameters of interest. The proposed method uses hitherto unknown properties of the gradient of the underlying log-empirical likelihood function. It is seen that these properties hold under minimal assumptions on the parameter space, prior density and the functions used in the estimating equations determining the empirical likelihood.

Building
Room
C-301

Bayesian Modeling For Multivariate Mixed Outcomes With Applications To Cognitive Testing Data

Start Time
Speaker
Jonathan C. Gruhl

This talk describes new multivariate regression and model-based clustering methods for statistical inference with multivariate mixed outcomes. We use the term mixed outcomes to refer to binary, ordered categorical, count, continuous and other ordered outcomes in combination. Such data structures are common in social, behavioral, and medical sciences. We develop two regression approaches, the semiparametric Bayesian latent variable model and the semiparametric reduced rank multivariate regression model, for mixed outcome data.

Building
Room
C-301

Methods for Estimation and Inference for High-Dimensional Models

Start Time
Speaker
Lina Lin

Advisors: Mathias Drton & Ali Shojaie

Modern statistical problems are increasingly high-dimensional, with the number of covariables p potentially vastly exceeding sample size N. Fortunately, significant progress has been made in developing rigorous statistical tools for tackling such problems, but these methods have primarily targeted prediction, point estimation, and/or variable selection.

Building
Room
C-301

Restricted Covariance Priors with Applications in Spatial Statistics

Start Time
Speaker
Theresa R. Smith

We present a Bayesian model for area-level count data that uses Gaussian random effects with a novel type of G-Wishart prior on the inverse variance-covariance matrix. The usual G-Wishart prior restricts off-diagonal elements of the precision matrix to 0 according to the neighborhood structure of the study region. This preserves conditional independence of non-neighboring regions but is more flexible than the traditional intrinsic autoregression prior.

Building
Room
C-301

Bayesian Inference for Exponential-family Random Graph Models for Social Networks

Start Time
Speaker
Ranran Wang

Exponential-family random graph models (ERGMs) have been widely applied in social network analysis, genetics (e.g., protein interaction networks), information theory, etc. Because of the intractability of the likelihood function, Markov chain Monte Carlo (MCMC) approximation is typically applied to obtain maximum likelihood estimators (Geyer and Thompson 1992). However, ERGMs still suffer from inferential degeneracy and computational deficiency. In this talk, we present Bayesian inference for ERGMs.

Building
Room
C-301

A New Goodness of Fit Test: The Reversed Berk-Jones Statistic

Start Time
Speaker
Leah R. Jager

In classical testing problems, we often use statistics based on the empirical distribution function to test whether or not the underlying distribution of the data is what we think it might be. Berk and Jones introduced such a statistic in 1979. I'll talk about a statistic which is related to theirs (called the reversed Berk-Jones statistic), and some of its properties. Along the way we'll chat about what exactly the empirical distribution function is, and why I think it's so cool. That is all.

Building
Room
C-301

Likelihood-Based Inference for Partially Observed Multi-type Branching Processes

Start Time
Speaker
Jason Q. Xu

Advisor: Vladimir Minin

Branching processes are a class of continuous-time Markov chains (CTMCs) frequently used in stochastic modeling with ubiquitous applications. One-dimensional cases such as birth-death processes are well studied, but it is often necessary to model systems with more than one species; bivariate or other multi-type processes are commonly used to model phenomena such as competition, predation, or infection.

Building
Room
C-301

John's Walk

Start Time
Speaker
Adam M. Gustafson

We present an affine-invariant random walk for drawing uniform random samples from a convex body for which the maximum volume inscribed ellipsoid, known as John's ellipsoid, may be computed. We consider a polytope as a special case. Our algorithm makes steps using uniform sampling from the John's ellipsoid of the symmetrization of the body at the current point. We show that from a warm start, the random walk mixes in [...] steps. This sampling algorithm thus offers improvement over the affine-invariant walk known as the Dikin Walk (which mixes in [...] steps from a warm start) for applications in which [...].

Building
Room
C-14A

Whole-Genome Quantitative Trait Prediction and Heritability Mapping via an Infinite Allele Model

Start Time
Speaker
Serge Sverdlov

The paradox of missing heritability refers to the common finding that in complex genetic traits with high heritability as estimated by methods such as twin studies, only a small fraction of the population variance is explained by the few Single Nucleotide Polymorphism (SNP) markers which are found to be individually significantly associated with the trait. Human height, with heritability estimates as high as 80% largely unexplained by individual SNPs, is the canonical example of such a trait.

Building
Room
C-301

Peptide Sequencing Using Tandem Mass Spectrometry

Start Time
Speaker
Qunhua Li

Tandem mass spectrometry has become a leading technology for protein identification. Much research has been done to automate the task of matching spectra to peptides.

In this study, we propose a probabilistic sequencing algorithm. It includes a probabilistic network to model the chemistry in the generation of the theoretical spectrum, a pair hidden Markov model to match the theoretical and observed spectra, and a probabilistic score function to rank the candidate sequences.

Building
Room
C-301

Gravimetric Anomaly Detection Using Compressed Sensing

Start Time
Speaker
Ryan D. Kappedal

We address the problem of identifying underground anomalies (e.g., holes) based on gravity measurements. This is a theoretically well-studied and difficult problem. In all except a few special cases, the inverse problem has multiple solutions, and additional constraints are needed to regularize it. Our approach makes general assumptions about the shape of the anomaly that can be seen as sparsity assumptions. We then adapt recently developed sparse reconstruction algorithms to this problem.

Building
Room
C-301

The Career Leap from Academia to Data Science

Start Time
Speaker
Praveen Kundurthy

The amount of data we generate as a global civilization is growing exponentially. What's more important, however, is the fact that storing, accessing, and analyzing data is getting cheaper and faster. Organizations all over the world have realized that data is a prized commodity, and many in the industry are scrambling to extract value from their complex data sets. For this endeavor, they need individuals with the right skills and experience, and the quantitative disciplines in academia are a great source for such individuals. In this talk, I will briefly describe my journey from a Ph.D.

Building
Room
C-301

Geostatistical Model Averaging for Probabilistic Quantitative Precipitation Forecasting

Start Time
Speaker
William P. Kleiber

Advisor: Tilmann Gneiting

Accurate weather forecasts benefit society in crucial functions, including agriculture, transportation, recreation, and basic human and infrastructural safety. Over the past two decades, ensembles of numerical weather prediction models have been developed, in which multiple estimates of the current state of the atmosphere are used to generate probabilistic forecasts for future weather events. However, ensemble systems are uncalibrated and biased, and thus need to be statistically postprocessed. Bayesian model averaging (BMA) is a preferred way of doing this.

Building
Room
C-301

Introduction to Model-Based Clustering

Start Time
Speaker
Nema Dean

I will talk briefly about how I got involved in research in Model-Based Clustering in my final year of undergrad (and subsequently here) and give a brief outline of research I did then. The main part of the talk will be about different extensions to the model-based clustering methodology that I'm working on. I'll mainly be focusing on research on variable selection with model-based clustering but I'll also talk, if I have time, about ideas I'll be working on for the next year.

Building
Room
C-301

Adaptive Higher-order Spectral Estimators

Start Time
Speaker
David C. Gerard

Advisor: Peter Hoff

Many applications involve estimation of a signal matrix from a noisy data matrix. In such cases, it has been observed that estimators that shrink or truncate the singular values of the data matrix perform well when the signal matrix has approximately low rank. In this talk, we generalize this approach to the estimation of a tensor of parameters from noisy tensor data. We develop new classes of estimators that shrink or threshold the mode-specific singular values from the higher-order singular value decomposition.

Building
Room
C-14

Manifold Learning Using Kernel Density Estimation and Local PCA

Start Time
Speaker
Kitty Mohammed

High-dimensional datasets often have lower-dimensional structure, which frequently takes the form of a manifold. There are many algorithms (e.g., Isomap) that are used in practice to fit manifolds and thus reduce the dimensionality of a given dataset. In our work, we consider the problem of recovering a $d$-dimensional submanifold $M$ of $\mathbb{R}^n$ when provided with noiseless samples from $M$. Ideally, the estimate $\hat{M}$ of $M$ should be an actual manifold. Generally speaking, existing manifold learning algorithms do not meet these criteria.

Building
Room
C-14A

Statistical Methods in Medical Imaging: Application to Mammography

Start Time
Speaker
Larissa I. Stanberry

Medical professionals and researchers use a variety of imaging techniques in their clinical practice and scientific investigations. In this talk I will focus on mammography, which is used for breast examinations and routine breast cancer screening. While mammographic images have proved to be a useful non-invasive tool for clinical monitoring, the images often lack detail and clarity. For example, in addition to having limited spatial resolution, the skin-air boundary of the imaged breast is often obscured. Locating this boundary is, however, an important initial step in breast density estimation.

Building
Room
C-14

MS Thesis Presentation: A resampling approach to clustering with confidence

Start Time
Speaker
Yuan Chiam

We propose a method for estimating the number of groups in a data set. Our method is an extension of Generalized Single Linkage clustering (GSL) (Stuetzle and Nugent 2010), a nonparametric clustering method based on the premise that groups in the data correspond to modes of the underlying data density. GSL starts with a nonparametric density estimate. It recursively splits the data into high density regions separated by valleys. The leaves of the resulting cluster tree correspond to modes of the density estimate.

Building
Room
C-301

Recovery of Item Rankings Under Nonnormal Fitting Distributions in MML Parameter Estimation

Start Time
Speaker
David Edward Haldors Dailey

In a simulation study, data are generated under a variety of conditions with respect to underlying ability distribution, test length, and sample size. Item parameter estimates are obtained under two conditions: in one, the assumed ability distribution matches the underlying ability distribution; in the other, it does not. The item parameter estimates from the matching condition are compared to those from the nonmatching condition to determine the effect on the recovery of parameter estimates and item rankings.

Building
Room
C-14

Survival Analysis by Threshold Regression with Time-Dependent Covariates

Start Time

A natural approach to survival analysis in many settings is to model the subject’s “health” status as a latent stochastic process, where the terminal event is represented by the first time that the process crosses a threshold. “Threshold regression” models the covariate effects on the latent process. Much of the literature on threshold regression assumes that the process is one-dimensional Wiener, where crossing times have a tractable inverse Gaussian distribution but where the process characteristics are fixed at baseline.

Building
Room
C-14A

Factor Models with Non-Normality: Robust, Skewed Distribution MLE and Bayes Estimation

Start Time
Speaker
Tatiana A Maravina

Advisor: R. Douglas Martin

The literature on the use of robust estimates, skewed distribution MLEs and non-normal distribution hierarchical Bayes models for multi-factor models in finance is surprisingly thin, and limited for the most part to single factor models (SFMs). The ultimate goal of our research is the study of the relative merits of robust versus non-normal MLE estimation of multi-factor models and the use of hierarchical Bayes modeling of multi-factor models using skewed fat-tailed distributions.

Building
Room
C-301

Learning the "Epitome" of an Image

Start Time
Speaker
Brendan J. Frey

I will describe a new model of image data that we call the "epitome". The epitome of an image is its miniature, condensed version containing the essence of the textural and shape properties of the image. As opposed to previously used simple image models, such as templates or basis functions, the size of the epitome is considerably smaller than the size of the image or object it represents, but the epitome still contains most constitutive elements needed to reconstruct the image.

Building
Room
C-301

Combining Probability Forecasts

Start Time
Speaker
Roopesh Ranjan

We propose a method for combining probability forecasts from different sources. The commonly used method of linearly combining probability forecasts has limitations, in that a weighted combination of distinct calibrated forecasts is necessarily uncalibrated. In view of this, we propose a recalibration method. We illustrate our findings with simulation examples and a case study on operational probability of precipitation forecasts.

Building
Room
C-301

Algorithms and Software for the Automated Identification of Minerals Using Field Spectra or Hyperspectral Imagery

Start Time
Speaker
Mark Berman

Over the last few years, the speaker and collaborators Leanne Bischof and Jon Huntington have been developing fast and sophisticated algorithms and software for identifying pure minerals and mixtures of minerals from shortwave infrared spectra. The software, called The Spectral Assistant (TSA), has been designed to be used with a particular field-portable spectrometer, the PIMA-II, which is about the size of a shoe box and can be used by geologists collecting samples in the field.

Building
Room
C-301

Improved estimation of bilateral migration flows

Start Time
Speaker
Jonathan J. Azose

I propose a method for estimating migration flows between all pairs of countries, including breakdowns by place of birth. My estimator is a pseudo-Bayes estimator which smooths a set of state-of-the-art estimates of migration flows towards a simpler estimate which contains fewer structural zeroes. The smoothing process provides a natural way to bypass the state-of-the-art estimator's unrealistic assumption that the number of global migrants is as small as possible.

Building
Room
C-301

Statistical Methodology for Longitudinal Social Network Data

Start Time
Speaker
Anton H. Westveld

Social interaction data are generated from the interaction or relationship between two or more actors; thus the observational units are pairs, trios, etc. of actors. Data of this type are common in all fields of social science (e.g. political science, sociology, anthropology, and economics), because the interaction of actors is a key element in social science theory.

Building
Room
C-301

Modeling Heterogeneity Within and Between Arrays

Start Time
Speaker
Bailey Kathryn Fosdick

Data that can be represented in the form of an array are present in many of the social and biological sciences. In this talk we address two statistical problems concerning these data. The first problem is modeling the heterogeneity along the dimensions of an array. Previously developed models are either non-stochastic and difficult to interpret, or require a large number of parameters, which prohibits likelihood-based inference for some arrays.

Building
Room
C-301

Degeneracy, Duration, and Co-evolution: Extensions of Exponential Random Graph Model (ERGM) for Social Network

Start Time
Speaker
Ke Li

We will address three aspects of statistical methodology for Exponential family Random Graph Models (ERGMs) in the context of applications to social network analysis. We start by addressing the topic of degeneracy in ERGMs. This is a problem often misunderstood to characterize the entire family of ERGMs, but is properly understood as a more limited issue of model misspecification.

Building
Room
C-14A

Nonparametric Estimation of the Bivariate Survivor Function

Start Time
Speaker
Albert Y. Kim

Correlated failure time data arise in many application areas. For example, in genetic epidemiology studies, the disease occurrence times of pairs of family members are often correlated, and the degree of correlation may provide important leads with respect to disease etiology. Univariate failure time data methods are well established, including the Kaplan-Meier method, censored-data rank tests, and Cox regression. However, standard tools for the analysis of multivariate failure time data are not yet available.

Building
Room
C-301

The Likelihood Pivot: Performing Inference with Confidence

Start Time
Speaker
James Warren Harmon

Advisor: Peter Hoff

Maximum likelihood estimation is a popular method of statistical inference in part due to its efficiency. Unfortunately, much of the efficiency is lost when the model has been misspecified. To account for possible model misspecification, the sandwich estimate of variance can be used with MLE inference to generate asymptotically correct confidence intervals, but these intervals typically perform poorly at small sample sizes. In this talk, we present a pivot-based method that performs better than the sandwich and its adjustments at small sample sizes.

Building
Room
C-301

Statistical Methods for Analyzing Incomplete Financial Data with Heavy Tails

Start Time
Speaker
Yindeng Jiang

A common problem with financial historical data is that the series often have histories of unequal length. Examples include country market indices, currency rates, and hedge fund returns histories. Practitioners often deal with such issues by truncating all the series so that the remaining data have the same length, which is evidently not an ideal solution. We discuss existing statistical methods that utilize the full data set, such as maximum likelihood estimation and multiple imputation.

Building
Room
C-301

Parameter Priors for Directed Acyclic Graphical Models and the Characterization of Several Probability Distributions

Start Time
Speaker
Dan Geiger

Multivariate Analysis & Graphical Models of Association (MAGMA 4) Workshop

We develop simple methods for constructing parameter priors for model choice among Directed Acyclic Graphical (DAG) models. In particular, we introduce several assumptions that permit the construction of parameter priors for a large number of DAG models from a small set of assessments. We then present a method for directly computing the marginal likelihood of every DAG model given a random sample with no missing observations.

Building
Room
C-301

Likelihood-based haplotype frequency modeling using variable-order Markov chains

Start Time
Speaker
Aaron Baraff

The localized haplotype-cluster model uses variable-order Markov chains to create an empirical model for haplotype probabilities that adapts to the changing structure of linkage disequilibrium (LD) across the genome. By clustering haplotypes based on the Markov property, the model is able to take advantage of conditional independencies to improve estimates of haplotype frequencies while still respecting the dependencies induced by LD.

Building
Room
C-301

Bayesian Hierarchical Self-Modeling Warping Regression with Application to Network Inferences

Start Time
Speaker
Donatello Telesca

Functional data often exhibit a common shape but also variations in amplitude and phase across curves. The analysis often proceeds by synchronization of the data through curve registration. We propose a Bayesian hierarchical model for curve registration. Our model provides a formal account of amplitude and phase variability while borrowing strength from the data across curves in the estimation of the model parameters.

Building
Room
C-301

Robust Bayesian Analysis of Gene Expression Microarray Data

Start Time
Speaker
Raphael Gottardo

Microarrays are part of a new class of biotechnologies that can be used to measure expression levels (DNA or RNA abundance) for thousands of genes at a time. This new technology is being applied increasingly in biological and medical research to address a wide range of problems, such as the classification of tumors or the study of host responses to bacterial infections. DNA microarray experiments raise numerous statistical questions in fields as diverse as image analysis, experimental design, hypothesis testing, cluster analysis, etc.

Building
Room
C-301

Bayesian Spatial and Temporal Methods for Public Health Data

Start Time
Speaker
Theresa R. Smith

Advisors: Adrian Dobra and Jon Wakefield

Understanding the relationships between disease incidence and risk factors such as demographic characteristics, lifestyle factors, and environmental contaminants is a central goal in public health and epidemiology. Often outcomes and risk factors are measured at specific locations or at particular times. We present flexible Bayesian models for spatial and temporal data to address important public health questions in two examples. In the first example, we consider low birthweight and preterm birth along with three risk factors in North Carolina.

Building
Room
C-301

Nonparametric Estimation of the Bivariate Survivor Function

Start Time
Speaker
Lei Xu

Correlated failure time data arise in many application areas. For example, in genetic epidemiology studies, the disease occurrence times of pairs of family members are often correlated, and the degree of correlation may provide important leads with respect to disease etiology. Univariate failure time data methods are well established, including the Kaplan-Meier method, censored-data rank tests, and Cox regression. However, standard tools for the analysis of multivariate failure time data are not yet available.

Building
Room
C-301

Directed Markov Point Processes

Start Time
Speaker
Gopalan Nair

Spatial point processes are often modeled as Markov fields, and inference for such models is sometimes either inefficient or computationally intensive due to difficulties in evaluating the normalizing constant. Simulation studies for such processes are hard. We exploit the partial order in the plane to introduce a class of Markov point processes known as "Directed Markov Point Processes" and investigate their properties. This Markov structure makes it possible to study some of the well-known spatial processes in detail.

Building
Room
C-301

Scalable Methods for Inference of Multiple IBD

Start Time
Speaker
Fiona L. Grimson

Advisor: Elizabeth Thompson

A major topic in statistical genetics is discovering the locations of genes contributing to complex traits through linkage analysis. The likelihood of a genetic marker controlling the expression of the trait is calculated using estimated identity-by-descent (IBD) graphs, which indicate whether copies of the marker shared among individuals are inherited from a common ancestor. Methods for estimating IBD graphs use either pedigree or population relationships between the individuals, and do not scale to a large number of individuals.

Building
Room
C-301

Nonparametric Estimation of Multivariate Monotone Densities

Start Time
Speaker
Marios G. Pavlides

I will discuss the most important results obtained on nonparametric estimation of two multivariate families of densities that exhibit monotonicity constraints and that can otherwise be characterized as certain mixture models. The discussion will emphasize characterizations of the estimators and their strong consistency, and will then turn to rates of convergence of these estimators, in both the global and the local sense.

Building
Room
C-301

Inference of Identity by Descent for Linkage Analysis

Start Time
Speaker
Fiona L. Grimson

Advisor: Professor Elizabeth Thompson

Identity by descent (IBD) describes the pattern of shared inheritance of DNA among individuals. Two or more copies of DNA are identical by descent if they are inherited from the same common ancestor. IBD underlies the genetic similarity between individuals and thus similarity in observed genetic traits. In a family study of a genetic disease, estimated IBD among individuals in the family is used to identify potential locations of the gene that causes the disease.

Building
Room
C-301

Algorithms for Estimating the Cluster Tree of a Density

Start Time
Speaker
Rebecca Ann Nugent

The goal of clustering is to identify distinct groups in a data set and assign a group label to each observation. To cast clustering as a statistical problem, we regard the data as an iid sample from some unknown probability density p. We adopt the premise that groups correspond to modes of the density. Our goal then is to find the modes and assign each observation to the "domain of attraction" of a mode. We do this by estimating the cluster tree of the density, a representation of the hierarchical structure of its level sets.

Building
Room
C-301

Analyzing Time Series Data for Endemic Cholera in Bangladesh with Mechanistic Models of Infectious Disease Dynamics

Start Time
Speaker
Amanda Allen

Despite seasonal cholera outbreaks in Bangladesh, little is known about the relationship between environmental conditions and cholera cases. We seek to develop a predictive model for cholera outbreaks in Bangladesh based on environmental predictors. To do this, we must estimate the environmental parameters in the context of a disease transmission model. We develop a method to simultaneously estimate the transmission parameters and the environmental parameters in a Susceptible-Infectious-Recovered-Susceptible (SIRS) model.

Building
Room
C-301

Learning in Spectral Clustering

Start Time
Speaker
Susan Shortreed

Spectral segmentation is a technique used to group data based on pairwise similarities. A similarity matrix is used as input to a spectral clustering algorithm, which outputs a clustering of the data. The clustering criterion is such that similar points are put in the same cluster and dissimilar points are put in different clusters. Generally, this similarity matrix is assumed known, while in reality it is usually constructed by hand, a very time-consuming process.
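The step from a similarity matrix to a clustering can be illustrated with a small sketch. It assumes a Gaussian similarity on toy one-dimensional points and a two-way split by the sign of the second eigenvector of the symmetric normalized Laplacian, which is one common variant of spectral clustering and not necessarily the one used in the talk. The data, the bandwidth `sigma`, and the function name `spectral_bipartition` are illustrative assumptions.

```python
import numpy as np


def spectral_bipartition(similarity):
    """Split items into two clusters given a symmetric similarity matrix."""
    s = np.asarray(similarity, dtype=float)
    degree = s.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}
    laplacian = np.eye(len(s)) - d_inv_sqrt[:, None] * s * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)


# Toy data (assumed): two loose groups of points on a line
points = np.array([0.0, 0.3, 0.6, 4.0, 4.3, 4.6])
sigma = 1.0  # assumed similarity bandwidth
similarity = np.exp(-((points[:, None] - points[None, :]) ** 2) / (2 * sigma ** 2))

labels = spectral_bipartition(similarity)
print(labels)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. Here the similarity matrix is computed from a fixed formula; learning it from data instead of hand-constructing it is the problem the abstract raises.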

Building
Room
C-301

Factor Model Monte Carlo Methods for General Fund-of-Funds Portfolio Management

Start Time
Speaker
Yindeng Jiang

The general Fund-of-Funds (GFoF) class of investment organizations includes fund-of-hedge funds (FoHF), family offices, endowments, pension plans and asset management companies. GFoF portfolios are characterized by two important types of returns problems among others. The first is that the returns histories of the portfolio assets are unequal, sometimes quite short and often contain multiple frequencies, resulting in structured missing data problems. The second is that the returns have fat-tailed and skewed distributions to varying degrees.

Building
Room
C-301

Parameter Identification and Assessment of Independence in Multivariate Statistical Modeling

Start Time
Speaker
Luca Weihs

Linear (causal) relationships between random variables can be conveniently encoded using a mixed graph (a graph with both directed and bidirected edges) where a directed edge implies a direct linear effect and a bidirected edge captures the existence of unobserved confounding. Even when there is a known mixed graph that accurately reflects the data generating mechanism, that is, all causal relationships are known and linear, confounding can make it impossible to infer parameters of interest. More concretely, many mixed graphs have (generically) unidentifiable parameters.

Building
Room
C-301

Separable covariance testing and estimating for sociomatrices

Start Time
Speaker
Alexander Volfovsky

We consider the problem of testing and estimating separable covariances for relational data sets. We propose to model these data as matrix normal distributions with separate row and column covariance matrices. The existing literature on testing and estimation in the context of a matrix normal distribution requires multiple observations of the matrix, which rarely occurs for relational data sets.

Building
Room
C-301

Population Genetic Variation: A Computationally Tractable Model for Large Samples Typed at Many Loci

Start Time
Speaker
Paul Scheet

Haplotypes are specific combinations of alleles on the same chromosome, and various methods exist for the analysis of haplotype data from unrelated individuals. However, humans are diploid, and studies of genetic variation might consist of unphased genotype data, where an unordered pair of alleles is observed at each locus. There is a coming need for less computationally intensive models that may be directly applied to unphased genotype data from thousands of individuals at thousands of loci. In this talk, we present such a model for genetic variation.

Building
Room
C-301

Clustering with Confidence

Start Time
Speaker
Rebecca Ann Nugent

One of the fundamental goals of nonparametric cluster analysis is to estimate the cluster tree of a density. I will define and illustrate the cluster tree and describe a graph-based procedure for its estimation. The cluster tree will usually have spurious leaves due to variability in the density estimate. I will introduce a bootstrap-based method for eliminating spurious leaves and “clustering with confidence”.

Building
Room
C-301

Geostatistical Model Averaging

Start Time
Speaker
William P. Kleiber

Probabilistic weather forecasting is becoming an increasingly important and active area of research. Most current statistical post-processing techniques account for forecast bias and predictive variance without regard to forecast location. We will discuss a technique that adjusts bias and predictive variance locally, called geostatistical model averaging (GMA). In particular, GMA allows the parameters of the predictive distribution to vary over the model grid.

Building
Room
C-301

Bayesian Methods for Inferring Gene Regulatory Networks

Start Time
Speaker
William C. Young

Advisor: Adrian Raftery

Gene regulatory networks are an important piece in understanding the functioning of living cells. As more and more gene expression data is becoming available, researchers need fast, reliable techniques for inferring these networks. I have developed ScanBMA, a fast Bayesian model averaging algorithm, used to infer networks from time-series data. I have also developed Model-based Clustering with Data Correction (MCDC), a method for automatically detecting and correcting errors that systematically affect some but not all data.

Building
Room
C-14A

Probabilistic Wind Forecasting Using Bayesian Model Averaging

Start Time
Speaker
J. McLean Sloughter

Bayesian model averaging has been shown to be a useful method for developing probabilistic weather forecasts for quantities (such as temperature) that can be represented by univariate normal distributions. This talk will discuss how these methods can be extended to other distributions, using wind forecasting as an example.

Building
Room
C-301

Graphical Markov Models for Partially Observed Data Generating Mechanisms

Start Time
Speaker
Thomas S. Richardson

Multivariate Analysis & Graphical Models of Association (MAGMA 4) Workshop

Graphical Markov models represent statistical dependencies by combining two simple yet powerful mathematical concepts: graphs and conditional independence. A graphical Markov model is constructed by specifying local dependencies for each node of the graph in terms of its immediate neighbors, yet can represent a highly varied and complex system of multivariate dependencies by means of the global structure of the graph.

Building
Room
C-301

A Sharp Multiplier Inequality with Applications to Heavy-Tailed Regression Problems

Start Time
Speaker
Roy Han

Advisor: Professor Jon A. Wellner

We develop a sharp multiplier inequality used to study the size of the multiplier empirical process $(\sum_{i=1}^n \xi_i f(X_i))_{f \in \mathcal{F}}$, where the $\xi_i$'s and $\mathcal{F}$ are multipliers and an indexing function class, respectively. We show that in general the size of the supremum of the multiplier empirical process is determined jointly by the growth order of the corresponding empirical process and the worst size of the maxima of the multipliers.

Building
Room
C-301

Learning and Manifolds: Leveraging the Intrinsic Geometry

Start Time
Speaker
Dominique Perrault-Joncas

We explore and exploit the use of differential operators on manifolds - the Laplace-Beltrami operator in particular - in learning tasks. Specifically, we are interested in uncovering the geometric structure of data (unsupervised learning) and in exploiting information contained in unlabeled data for regression and classification tasks (semi-supervised learning). First, building on the Laplacian Eigenmap and Diffusion Maps framework, we propose a new paradigm that offers a guarantee, under reasonable assumptions, that any manifold learning algorithm will preserve the geometry of a data set.

Building
Room
C-301

Bayesian Modeling of Survey Data in Space and Time

Start Time
Speaker
Cici Xi Chen Bauer

Advisor: Jon Wakefield

Public health data are frequently obtained from surveys, which often have complex sampling designs. It is crucial that analyses account for the design to give appropriate inference. We describe two scenarios, both of which have important spatial components. The first example is motivated by Behavioral Risk Factor Surveillance System (BRFSS) data. Empirical Bayes and Bayes hierarchical models for small area estimation have been used extensively for surveys like BRFSS.

Building
Room
C-301