Nonparametric & semi-parametric statistics

With flexible, distribution-free approaches to data modelling, our methods can be applied across a wide range of applications, including shape evolution, medical image analysis, survival analysis, and weather patterns.

Nonparametric and Semi-parametric Statistics - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Causal inference in noisy social networks (PhD)

Supervisors: Vanessa McNealis
Relevant research groups: Nonparametric and Semi-parametric Statistics; Biostatistics, Epidemiology and Health Applications; Social and Urban Studies

One core task of science is causal inference, yet distinguishing causality from spurious associations in observational data can be challenging. Statistical causal inference provides a framework to define causal effects, specify assumptions for identifying causal effects, and assess sensitivity of causal estimators to these assumptions.

Recent interest has focused on causal inference under interference (or spillover), where one individual’s treatment affects the outcomes of others. Social network data are particularly valuable for this purpose, as they offer information about connections between individuals, revealing potential pathways for interference. For instance, in the National Longitudinal Study of Adolescent Health (Add Health), peer influences among adolescents provide an ideal case for studying spillover, especially as they relate to behavioural and academic outcomes. However, Add Health has a very high level of missing edge-variable data and censoring, which is a problem because many methods for evaluating spillover effects assume fully observed networks.
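
To make the role of missing edges concrete, here is a minimal sketch in Python. It assumes one common summary of spillover exposure, the fraction of a person's network neighbours who are treated, and compares it on a fully observed toy network and on the same network with two edges unobserved. The function name, the four-node network and the treatment assignment are all hypothetical and chosen only for illustration; this is not the method the project will develop.

```python
def treated_neighbour_fraction(edges, treated, nodes):
    """Fraction of each node's neighbours that are treated.

    Returns None for a node with no observed neighbours, since the
    fraction is undefined in that case.
    """
    neighbours = {n: set() for n in nodes}
    for a, b in edges:  # undirected edges
        neighbours[a].add(b)
        neighbours[b].add(a)
    exposure = {}
    for n in nodes:
        nb = neighbours[n]
        if nb:
            exposure[n] = sum(1 for m in nb if m in treated) / len(nb)
        else:
            exposure[n] = None
    return exposure


# Hypothetical toy network: four individuals, two of them treated.
nodes = ["a", "b", "c", "d"]
treated = {"b", "c"}
full_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "d")]
# Same network, but the edges a-b and a-c were not recorded.
observed_edges = [("a", "d"), ("b", "d")]

full = treated_neighbour_fraction(full_edges, treated, nodes)
observed = treated_neighbour_fraction(observed_edges, treated, nodes)

for n in nodes:
    print(n, full[n], observed[n])
```

In this toy case the exposure of individual "a" is two thirds on the full network but zero on the observed one, and "c" appears to have no neighbours at all, so any estimator that treats the observed network as complete would work with a distorted exposure.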

This PhD will develop statistical methods for causal inference under network interference with noise, considering the following issues/approaches:

  • Bias characterization in the presence of missing or uncertain edge information
  • Semi-parametric inference
  • Propensity score methods
  • Multiple imputation for network data

A good knowledge of methods for survey sampling and regression is essential; familiarity with causal inference, statistical methods for coarse data, and semi-parametric inference would be an advantage.

Analysis of spatially correlated functional data objects (PhD)

Supervisors: Surajit Ray
Relevant research groups: Modelling in Space and Time; Computational Statistics; Nonparametric and Semi-parametric Statistics; Imaging, Image Processing and Image Analysis

Historically, functional data analysis (FDA) techniques have been widely used to analyze traditional time series data, albeit from a different perspective. Of late, FDA techniques are increasingly being used in domains such as environmental science, where the data are spatio-temporal in nature, and hence it is typical to consider such data as functional data where the functions are correlated in time or space. An example where modeling the dependencies is crucial is the analysis of remotely sensed data observed over a number of years across the surface of the earth, where each year forms a single functional data object. One might be interested in decomposing the overall variation across space and time and attributing it to covariates of interest. Another interesting class of data with dependence structure consists of weather data on several variables collected from balloons, where the domain of the functions is a vertical strip in the atmosphere and the data are spatially correlated. One of the challenges with this type of data is missingness; addressing it requires developing appropriate spatial smoothing techniques for spatially dependent functional data. There are also interesting design-of-experiment issues, as well as questions of data calibration to account for the variability in sensing instruments. In spite of the research initiatives in analyzing dependent functional data, there are several unresolved problems, which the student will work on:

  • robust statistical models for incorporating temporal and spatial dependencies in functional data
  • developing reliable prediction and interpolation techniques for dependent functional data
  • developing inferential framework for testing hypotheses related to simplified dependent structures
  • analysing sparsely observed functional data by borrowing information from neighbours
  • visualisation of data summaries associated with dependent functional data
  • clustering of functional data

Modality of mixtures of distributions (PhD)

Supervisors: Surajit Ray
Relevant research groups: Nonparametric and Semi-parametric Statistics; Applied Probability and Stochastic Processes; Statistical Modelling for Biology, Genetics and *omics; Biostatistics, Epidemiology and Health Applications

Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used to perform modeling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organizing mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results regarding the upper bound on the number of modes, but these too are limited to normal mixtures. In this project, we wish to explore the modality of mixtures of non-normal distributions and their application to real-life problems.
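
As a small illustration of what "modality" means here, the following Python sketch counts the modes of a univariate two-component normal mixture by evaluating its density on a grid and picking out local maxima. The grid search, the function names and the example parameters are assumptions made for illustration only; they are a crude numerical device, not the theoretical tools used in the research described above.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite normal mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def find_modes(weights, means, sds, grid_size=2001):
    """Approximate the modes as local maxima of the density on a grid."""
    lo = min(m - 4.0 * s for m, s in zip(means, sds))
    hi = max(m + 4.0 * s for m, s in zip(means, sds))
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    dens = [mixture_pdf(x, weights, means, sds) for x in xs]
    # A grid point is a mode candidate if it is higher than its left
    # neighbour and at least as high as its right neighbour.
    return [
        xs[i]
        for i in range(1, grid_size - 1)
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]
    ]


# Well-separated components: two modes.
separated = find_modes([0.5, 0.5], [-3.0, 3.0], [1.0, 1.0])
# Overlapping components: the two bumps merge into one mode.
overlapping = find_modes([0.5, 0.5], [-0.5, 0.5], [1.0, 1.0])

print(len(separated), len(overlapping))
```

The example shows that the number of modes need not equal the number of components: two equally weighted unit-variance normals a distance of one apart yield a single mode, while the same components six apart yield two.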

Seminars

Regular seminars relevant to the group are held as part of the Statistics seminar series. The seminars cover various aspects across the AI3 initiative and usually span multiple groups. You can find more information on the Statistics seminar series page, where you can also subscribe to the seminar series calendar.

Nonparametric statistics is an important statistical idea that allows one to simultaneously estimate and model the underlying structure of data without being restricted by distributional assumptions about the nature of the function. Semiparametric statistics aims to balance the flexibility of nonparametric models with the simplicity of statistical procedures under a parametric framework, by using components of each.
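
One simple example of estimating a function without assuming a parametric form is a kernel smoother. The sketch below, in Python, uses a Nadaraya-Watson estimator with a Gaussian kernel; the function name, the bandwidth value and the sine-curve data are illustrative assumptions and are not taken from the group's own work.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with weights centred at x0.

    Uses a Gaussian kernel. Assumes x0 is close enough to the data that
    the weights do not all underflow to zero.
    """
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Illustrative data: a sine curve sampled on an even grid (no noise).
xs = [i * 0.05 for i in range(126)]
ys = [math.sin(x) for x in xs]

# Estimate the curve at its peak without telling the method it is a sine.
estimate_at_peak = nadaraya_watson(math.pi / 2.0, xs, ys, bandwidth=0.2)
print(estimate_at_peak)
```

The bandwidth controls the trade-off between smoothness and fidelity: a larger value averages over more neighbours and flattens the peak, while a smaller value follows the data more closely.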

The group develops a range of methodology, including functional data analysis, Gaussian processes, semiparametric regression models, and multilevel modelling. These have been applied across a spectrum of different application areas, including the evolution of shapes, medical image analysis, survival analysis, and extremes in weather patterns.