We revisit the problem of adaptive estimation of the center of symmetry of an unknown symmetric distribution under the additional shape constraint of log-concavity. This problem was investigated by early authors such as van Eeden (1970), Stone (1975), and Sacks (1975), who constructed adaptive estimators that depend on tuning parameters. The additional assumption of log-concavity lets us construct simpler estimators that can be computed efficiently without tuning parameters. To estimate the center of symmetry we consider truncated one-step estimators.
We will consider a new class of probabilistic upper bounds on the generalization error of complex classifiers that are "combinations" of simpler classifiers from a base class of functions. Such combinations can be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The resulting combined classifiers often have a large classification margin. In such cases, the bounds on the generalization error are expressed in terms of the empirical distribution of the margins of the combined classifier.
Graphical models have become an important tool for analyzing multivariate data. While the interpretation of graphical models was originally restricted to conditional independences between variables, there has recently been growing interest in graphical models as a general framework for causal modelling and inference in experimental and observational studies. In this talk we discuss two approaches for the identification of cause-effect relationships from multivariate time series data. Both approaches exploit, for the purpose of causal inference, the fact that an effect cannot precede its cause in time.
Multidimensional scaling is widely used to handle data consisting of dissimilarity measures between pairs of objects or people; it estimates an object configuration in Euclidean space such that the estimated distances are related to the dissimilarities. Problems of this kind are pervasive in psychology and social science, and have arisen recently in areas such as document clustering, classification of Web sites, gene expression data, and data mining.
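As a concrete illustration of the estimation problem, here is a minimal sketch of metric MDS by gradient descent on the raw stress criterion. The function name, toy dissimilarities, and all tuning values below are illustrative assumptions, not part of the abstract:

```python
import math
import random

def mds_stress(diss, n, dim=2, steps=2000, lr=0.01, seed=0):
    """Crude metric MDS: minimize the raw stress
    sum_{i<j} (d_ij(X) - delta_ij)^2 by gradient descent."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(X[i], X[j]) or 1e-12  # guard against d = 0
                c = 2.0 * (d - diss[i][j]) / d
                for k in range(dim):
                    g = c * (X[i][k] - X[j][k])
                    grad[i][k] += g
                    grad[j][k] -= g
        for i in range(n):
            for k in range(dim):
                X[i][k] -= lr * grad[i][k]
    return X

# Three objects whose dissimilarities are exactly realizable on a line.
diss = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
X = mds_stress(diss, n=3)
# Fitted distances approach the target dissimilarities.
print(math.dist(X[0], X[1]), math.dist(X[0], X[2]))
```

Real implementations use better optimizers (e.g. SMACOF majorization), but the objective being minimized is the same.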
Advisor: Adrian Raftery
Advisor: Elizabeth Thompson
This work deals with three areas of network modeling. First, in the area of latent space modeling of social networks, it develops and extends latent cluster social network models by adding random effects and providing efficient algorithms for fitting these models. Second, it explores properties of exponential-family random graph models (ERGMs) and ERGM-based models under changing network size, and proposes a way of addressing the problems that arise.
Advisor: Adrian Raftery
We develop a framework for the modeling of joint distributions of high-dimensional data that is robust to a variety of data types and modeling paradigms. Central to our considerations are the issues of structural learning and posterior parameter estimation. By building our framework from Gaussian Graphical Models (GGMs) we are able to separate the learning and estimation problems, thereby proposing a methodology possessing desirable computational features.
Advisor: Jon Wellner
Lattice Conditional Independence Models for Missing Observations in Categorical Data, in Continuous Data, and for Seemingly Unrelated Regressions
Advisor: Michael Perlman
Joint Computational Finance and Optimization Seminar

We consider a new approach to pricing options in incomplete markets. The algorithm replicates an option by a portfolio consisting of a stock and a bond, and simultaneously calculates prices of options with all strikes. We apply a linear regression framework with constraints that can accommodate various assumptions on the stochastic process of the underlying security. The model can be calibrated directly from historical prices of the underlying security using assumptions on the class of replication policies.
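The abstract does not spell out the regression step, but the flavor of replication-by-regression can be sketched in a one-period toy model: choose stock and bond holdings by least squares against the option payoff across scenarios. All names and values below are illustrative assumptions; in the two-scenario case the hedge is exact and recovers the standard one-period binomial price:

```python
def replicate_call(S0, strike, r, scenarios):
    """Least-squares replication of a one-period call by a stock
    position a and a bond position b: minimize
    sum_s (a * S_T(s) + b - payoff(s))^2 over scenarios s."""
    ST = scenarios
    pay = [max(s - strike, 0.0) for s in ST]
    n = len(ST)
    mS = sum(ST) / n
    mP = sum(pay) / n
    cov = sum((s - mS) * (p - mP) for s, p in zip(ST, pay))
    var = sum((s - mS) ** 2 for s in ST)
    a = cov / var            # shares of stock held
    b = mP - a * mS          # bond face value at maturity
    price = a * S0 + b / (1 + r)
    return a, b, price

# Binomial case: up to 120 or down to 80, zero interest rate.
a, b, price = replicate_call(S0=100, strike=100, r=0.0, scenarios=[120, 80])
print(a, b, price)  # → 0.5 -40.0 10.0
```

With more than two scenarios the replication is only approximate, which is exactly the incomplete-market situation the talk addresses; the constrained-regression machinery then matters.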
Advisors: Peter Guttorp & Don Percival
We formulate nonparametric density estimation as a constrained maximum likelihood problem whose constraints model any prior information available about the density. This technique will be used as a vehicle to illustrate the importance of including non-data information in the formulation of an estimation problem. For example, this non-data information may take the form of bounds on moments or specification of support, shape or smoothness.
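As a toy example of folding shape information into maximum likelihood, the sketch below computes the MLE of a density assumed nonincreasing on equal-width bins; under that constraint the problem reduces to pool-adjacent-violators on the raw histogram heights (a grouped-data analogue of the Grenander estimator). The function name and the data are illustrative assumptions:

```python
def decreasing_mle_histogram(counts, width):
    """MLE of a density over equal-width bins under the constraint
    that it is nonincreasing: pool-adjacent-violators applied to the
    raw histogram heights."""
    n = sum(counts)
    heights = [c / (n * width) for c in counts]
    blocks = []  # each block: [pooled mean height, number of bins]
    for h in heights:
        blocks.append([h, 1])
        # A violation of h_1 >= h_2 >= ... : pool the two blocks.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            h2, m2 = blocks.pop()
            h1, m1 = blocks[-1]
            blocks[-1] = [(h1 * m1 + h2 * m2) / (m1 + m2), m1 + m2]
    out = []
    for h, m in blocks:
        out.extend([h] * m)
    return out

counts = [5, 9, 4, 2]   # bin 2 violates monotonicity
fit = decreasing_mle_histogram(counts, width=1.0)
print(fit)              # first two bins pooled; still integrates to 1
```

The unconstrained MLE would simply report the raw heights; the shape constraint replaces the offending heights by their pooled average, which is the kind of non-data information the abstract argues for.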
Two of the most successful approaches in image restoration are mathematical morphology and Bayesian image restoration. They arise from different philosophies and are formulated very differently. Mathematical morphology involves a basic set of elementary operators that are usually combined to perform non-linear filtering of noisy images. The Bayesian approach involves finding the best interpretation of the images under an assumed probabilistic model. Our aim is to investigate the possible relationships between these two approaches. Such a study is of twofold interest.
A Bayesian approach to the classification problem is proposed in which random partitions play a central role. It is argued that the partitioning approach has the capacity to take advantage of a variety of large-scale spatial structures, if they are present in the unknown regression function $f_0$. An idealized one-dimensional problem is considered in detail. The proposed nonparametric prior is found to provide a consistent estimate of the regression function in the $L^p$ topology, for any $1 \leq p < \infty$, and for arbitrary measurable $f_0:[0,1] \rightarrow [0,1]$.
Advisor: Peter Guttorp

We consider a special case of the two-dimensional stochastic n-compartment or stepping-stone model. This class of models represents a special type of Markov population process in which the state of the process at a given time $t$ is represented by $(X_1(t), X_2(t)) = (X_{11}(t), \ldots, X_{1n}(t); X_{21}(t), \ldots, X_{2n}(t))$. In our example, all compartments except the $n$th are assumed completely unobservable, while in the $n$th we are able to obtain only a sample of each dimension at discrete time intervals. The first compartment is a hidden linear birth-emigration process.
Extended linear models form a very general framework for statistical modeling. Many practically important contexts fit into this framework, including regression, logistic or Poisson regression, density estimation, spectral density estimation, and conditional density estimation. Moreover, hazard regression, proportional hazard regression, marked counting process regression, and diffusion processes with or without jumps, all perhaps with time-dependent covariates, also fit into this framework.
Lucien Le Cam spent most of his career at the University of California, Berkeley. He created and developed large parts of the current large sample theory of statistics including contiguity theory, approximation of experiments, rates of Poisson approximation, tightness and basic theory for empirical processes, Poissonization inequalities, preservation of local asymptotic normality under information loss, and methods for establishing rates of convergence in nonparametric problems in terms of measures of metric dimension investigated by Kolmogorov.
When someone breaks a window or another type of glass, tiny fragments of glass are transferred onto their clothing. If the breaking of this window or other related events is a criminal offence, then these fragments become evidence. The physical properties of the recovered glass fragments can be matched with those of the putative source. However, other processes are at work which affect the quantity and strength of the evidence.
Many large-scale environmental databases have been produced in recent years for the purpose of knowledge discovery related to processes such as greenhouse gas cycling and large-scale hydrology. These databases typically extend over a period of 20 to 50 years and over large spatial domains, such as continents, hemispheres, or even the entire global terrestrial domain. The possible existence of (temporal) trends is one of the primary topics of interest to environmental scientists.
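One simple, robust way to quantify such a trend (not necessarily the method of this talk) is the Theil-Sen estimator: the median of all pairwise slopes. A minimal sketch with hypothetical data, where the function name and series are illustrative assumptions:

```python
from statistics import median

def theil_sen_slope(t, y):
    """Robust trend estimate: the median of all pairwise slopes
    (y_j - y_i) / (t_j - t_i) over i < j (Theil-Sen estimator)."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i in range(len(t)) for j in range(i + 1, len(t))]
    return median(slopes)

# Toy annual series: true linear trend 0.5 per year, plus one gross outlier.
t = list(range(10))
y = [0.5 * ti for ti in t]
y[4] += 30.0
print(theil_sen_slope(t, y))  # robust to the outlier: prints 0.5
```

Unlike the least-squares slope, which the single contaminated year would drag upward, the median of pairwise slopes still recovers the underlying trend exactly here.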
Patterns of inheritance of genes on pedigrees underlie similarities among relatives, and hence approaches to the analysis of genetic data observed on related individuals. With modern genetic technology, data are often available for large numbers of genetic loci, sometimes on large sets of interrelated individuals. The space of underlying inheritance patterns consistent with the data is then not only huge, but also tightly constrained by the laws of genetics.