Statistics thesis topics

We provide a dynamic and encouraging environment for postgraduate research in Statistics. Our focus on innovative and impactful research promotes the exploration of ideas that significantly enhance understanding across various disciplines, from theoretical foundations to real-world applications. Students are encouraged to engage with complex statistical challenges, fostering skills that are essential for addressing contemporary issues in science, industry, and beyond.

Research Groups and Collaborations

Our department hosts various research groups that foster collaboration and interdisciplinary work. Collaborations with industry partners and other academic institutions also enhance the relevance and application of our research.

Sample thesis topics

The following are sample topics available for prospective postgraduate research students in our department. These examples showcase the diversity of research areas you can explore, but they are not an exhaustive list. Most supervisors are eager to discuss potential projects that may not be featured here. Funded projects come with their own specific funding, while financial support for other research initiatives is typically awarded on a competitive basis.

Centres for Doctoral Training (CDTs)

ExaGEO

ExaGEO (Exascale computing for Earth, Environmental, and Sustainability Solutions) trains the next generation of Earth and environmental scientists to harness the power of exascale computing. The following projects are currently available via ExaGEO, and details on how to apply can be found on the website.

Detecting hotspots of water pollution in complex constrained domains and networks (PhD)

Supervisors: Mu Niu, Craig Wilkie, Cathy Yi-Hsuan Chen (Business School, UofG), Michael Tso (Lancaster University)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Environmental, Ecological Sciences & Sustainability

Technological developments with smart sensors are changing the way that the environment is monitored.  Many such smart systems are under development, with small, energy efficient, mobile sensors being trialled.  Such systems offer opportunities to change how we monitor the environment, but this requires additional statistical development in the optimisation of the location of the sensors.

The aim of this project is to develop a mathematical and computational inferential framework to identify optimal sensor deployment locations within complex, constrained domains and networks for improved water contamination detection. Methods for estimating covariance functions in such domains rely on computationally intensive diffusion process simulations, limiting their application to relatively simple domains and small-scale datasets. To address this challenge, the project will employ accelerated computing paradigms with highly parallelized GPUs to enhance simulation efficiency. The framework will also address regression, classification, and optimization problems on latent manifolds embedded in high-dimensional spaces, such as image clouds (e.g., remote sensing satellite images), which are crucial for sensor deployment and performance evaluation. As the project progresses, particularly in the image cloud case, the computational demands will intensify, requiring advanced GPU resources or exascale computing to ensure scalability, efficiency, and performance.
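
As a flavour of the statistical ingredient involved, the sketch below builds a diffusion-based covariance on a toy network from the heat kernel of its graph Laplacian; this is the kind of object that replaces a standard spatial covariance when the domain is a network. The network, parameter values and Python code are purely illustrative, not project code.

    # Minimal sketch (illustrative only): a diffusion-based covariance on a
    # network, built from the heat kernel exp(-t * L) of the graph Laplacian.
    import numpy as np

    # Adjacency matrix of a small "river network": a chain 0-1-2-3 with a branch 1-4.
    A = np.zeros((5, 5))
    for i, j in [(0, 1), (1, 2), (2, 3), (1, 4)]:
        A[i, j] = A[j, i] = 1.0

    L = np.diag(A.sum(axis=1)) - A        # combinatorial graph Laplacian

    # Heat-kernel covariance K = exp(-t L) via eigendecomposition; larger t
    # gives smoother, longer-range dependence along the network.
    t = 0.5
    w, V = np.linalg.eigh(L)
    K = V @ np.diag(np.exp(-t * w)) @ V.T

    # K can now act as a GP covariance over the nodes, respecting network
    # connectivity rather than straight-line distance.
    print(np.round(K, 3))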

Developing GPU-accelerated digital twins of ecological systems for population monitoring and scenario analyses (PhD)

Supervisors: Colin Torney, Juan Morales (BOHVM, UoG), Rachel McCrea (Lancaster University), Tiffany Vlaar, Dirk Husmeier
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Environmental, Ecological Sciences & Sustainability, Mathematical Biology

This PhD project focuses on advancing ecological research by using high-resolution datasets and GPU computing to develop digital twins of ecological systems. The study will concentrate on a population of free-roaming sheep in Patagonia, Argentina, examining the relationship between individual decision-making and population dynamics. Using data from state-of-the-art GPS collars, the research will investigate the impact of an individual’s condition on activity budgets and space use, and the dual influence of parasites on behaviour and energy balance. The digital twins will enhance the accuracy of population-level predictions and offer a versatile and transferable framework for ecosystem monitoring, providing critical insights for environmental policy, conservation strategies, and sustainable food systems.

Downscaling and Prediction of Rainfall Extremes from Climate Model Outputs (PhD)

Supervisors: Sebastian Gerhard Mutz (GES, UoG), Daniela Castro-Camilo
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Environmental, Ecological Sciences & Sustainability

Over the last decade, Scotland's annual rainfall has increased by 9% and its winter rainfall by 19%, with more of that water arriving in extreme events, posing risks to the environment, infrastructure, health, and industry. Urgent issues such as flooding, mass wasting, and water quality are closely tied to rainfall extremes. Reliable predictions of extremes are, therefore, critical for risk management. Prediction of extremes, one of the main focuses of extreme value theory, is still considered one of the grand challenges by the World Climate Research Programme. This project will address this challenge by developing novel, computationally efficient statistical models that are able to predict rainfall extremes from the output of GPU-optimised climate models.
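
For context, the basic extreme value theory step such models build on is the peaks-over-threshold fit sketched below: exceedances of a high threshold are modelled with a generalised Pareto distribution and turned into a return-level estimate. The data are synthetic and all values illustrative; this is not the project's intended methodology.

    # Minimal peaks-over-threshold sketch on synthetic daily rainfall (mm).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    rain = rng.gamma(shape=0.8, scale=6.0, size=20 * 365)   # ~20 years of data

    u = np.quantile(rain, 0.95)              # high threshold
    exceed = rain[rain > u] - u              # excesses over the threshold

    # Fit a generalised Pareto distribution to the excesses (location fixed at 0).
    xi, _, sigma = stats.genpareto.fit(exceed, floc=0.0)

    # Approximate 100-year return level from the fitted tail.
    p_u = np.mean(rain > u)                  # threshold exceedance rate
    m = 100 * 365                            # return period in days
    level = u + stats.genpareto.ppf(1 - 1 / (m * p_u), xi, scale=sigma)
    print(f"threshold {u:.1f} mm, shape {xi:.2f}, 100-year level ~ {level:.1f} mm")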

Exploring Hybrid Flood modelling leveraging GPU/Exascale computing (PhD)

Supervisors: Andrew Elliott, Lindsay Beevers (University of Edinburgh), Claire Miller, Michele Weiland (University of Edinburgh)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification

Flood modelling is crucial for understanding flood hazards, both now and in the future as the climate changes. Modelling provides inundation extents (or flood footprints) outlining areas at risk, which can help to manage our increasingly complex infrastructure network. Our ability to make fast, accurate predictions of fluvial inundation extents is important for disaster risk reduction, and capturing the uncertainty in forecasts or predictions is essential for efficient planning and design. Both aims require methods which are computationally efficient whilst maintaining accurate predictions. Current Navier–Stokes physics-based models are computationally intensive; this project will therefore explore hybrid flood models that fuse GPU-based machine learning with physics-based models, as well as investigating scaling the numerical models to large-scale HPC resources.

Scalable approaches to mathematical modelling and uncertainty quantification in heterogeneous peatlands (PhD)

Supervisors: Raimondo Penta, Vinny Davies, Jessica Davies (Lancaster University), Lawrence Bull, Matteo Icardi (University of Nottingham)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification, Continuum Mechanics

While covering only 3% of the Earth's surface, peatlands store >30% of terrestrial carbon and play a vital ecological role. Peatlands are, however, highly sensitive to climate change and human pressures, so understanding and restoring them is crucial for climate action. Multiscale mathematical models can represent the complex microstructures and interactions that control peatland dynamics but are limited by their computational demands. GPU and exascale computing advances offer a timely opportunity to unlock the potential benefits of mathematically-led peatland modelling approaches. By scaling these complex models to run on new architectures, or by directly incorporating mathematical constraints into GPU-based deep learning approaches, scalable computing will deliver transformative insights into peatland dynamics and their restoration, supporting global climate efforts.

Scalable Inference and Uncertainty Quantification for Ecosystem Modelling (PhD)

Supervisors: Vinny Davies, Richard Reeve (BOHVM, UoG), David Johnson (Lancaster University), Christina Cobbold, Neil Brummitt (Natural History Museum)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification

Understanding the stability of ecosystems and how they are impacted by climate and land use change can allow us to identify sites where biodiversity loss will occur and help to direct policymakers in mitigation efforts. Our current digital twin of plant biodiversity – https://github.com/EcoJulia/EcoSISTEM.jl – provides functionality for simulating species through processes of competition, reproduction, dispersal and death, as well as environmental changes in climate and habitat, but it would benefit from enhancement in several areas. The three areas this project would most likely target are the introduction of a soil layer (and improved modelling of soil water); improving the efficiency of the code to handle a more complex model and to allow stochastic and systematic Uncertainty Quantification (UQ); and developing techniques for scalable inference of missing parameters.

Smart-sensing for systems-level water quality monitoring (PhD)

Supervisors: Craig Wilkie, Lawrence Bull, Claire Miller, Stephen Thackeray (Lancaster University)
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Environmental, Ecological Sciences & Sustainability

Freshwater systems are vital for sustaining the environment, agriculture, and urban development, yet in the UK, only 33% of rivers and canals meet ‘good ecological status’ (JNCC, 2024). Water monitoring is essential to mitigate the damage caused by pollutants (from agriculture, urban settlements, or waste treatment) and, while sensors are increasingly affordable, coverage remains a significant issue. New techniques for edge processing and remote power offer one solution, providing alternative sources of telemetry data. However, methods which combine such information into systems-level sensing for water are not as mature as in other applications (e.g., the built environment). In response, this project will consider procedures for computation at the edge, decision-making, and data/model interoperability.

Statistical Emulation Development for Landscape Evolution Models (PhD)

Supervisors: Benn Macdonald, Mu Niu, Paul Eizenhöfer (GES, UoG), Eky Febrianto (Engineering, UoG)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification

Many real-world processes, including those governing landscape evolution, can be effectively described mathematically via differential equations. These equations describe how processes, e.g. the physiography of mountainous landscapes, change with respect to other variables, e.g. time and space. Conventional approaches to statistical inference involve repeated numerical solving of the equations: every time the parameters of the equations are changed in a statistical optimisation or sampling procedure, the equations need to be re-solved numerically. The associated large computational cost limits advancements when scaling to more complex systems, the application of statistical inference and machine learning approaches, as well as the implementation of more holistic approaches to Earth System science. This leads to the need for an accelerated computing paradigm involving highly parallelised GPUs for the evaluation of the forward problem.

Beyond advanced computing hardware, emulation is becoming a more popular way to tackle this issue. The idea is that the differential equations are first solved as many times as possible, and the output is then interpolated using statistical techniques. When inference is carried out, the emulator's predictions replace the differential equation solutions. Since prediction from an emulator is very fast, this avoids the computational bottleneck, and if the emulator is a good representation of the differential equation output, parameter inference can be accurate.
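
As an illustration of this workflow, on a deliberately cheap stand-in problem rather than a landscape evolution model, the sketch below fits a Gaussian process emulator to a handful of ODE solver runs and then predicts the solver output, with uncertainty, at new parameter values.

    # Emulation sketch: replace an "expensive" solver with a GP fitted to a few runs.
    import numpy as np
    from scipy.integrate import solve_ivp
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    def solver_output(k):
        """Stand-in for an expensive model: y(1) for y' = -k*y, y(0) = 1."""
        sol = solve_ivp(lambda t, y: -k * y, (0.0, 1.0), [1.0])
        return sol.y[0, -1]

    # Design runs of the simulator over the parameter of interest.
    k_train = np.linspace(0.1, 3.0, 12).reshape(-1, 1)
    y_train = np.array([solver_output(k[0]) for k in k_train])

    gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
    gp.fit(k_train, y_train)

    # Emulator predictions (with uncertainty) replace repeated solves.
    k_new = np.array([[0.77], [2.2]])
    mean, sd = gp.predict(k_new, return_std=True)
    print(mean, sd, np.exp(-k_new.ravel()))   # compare with the exact answer e^{-k}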

The student will begin by working on parallelising the numerical solver of the mathematical model via GPUs. This means that many more solutions can be generated on which to build the emulator, in a timeframe that is feasible. Then, they will develop efficient emulators for complex landscape evolution models, as the PhD project evolves.

Towards exa-scale simulations of slabs, core-mantle heterogeneities and the geodynamo (PhD)

Supervisors: Radostin Simitev, Antoniette Greta Grima (GES, UoG), Kevin Stratford (University of Edinburgh)
Relevant research groups: Geophysical & astrophysical fluid dynamics

Scientific computing is crucial for understanding geophysical fluid flows, such as the geodynamo that sustains Earth’s magnetic field. This project will adapt an existing pseudo-spectral geodynamo code for magnetohydrodynamic simulations in rotating spherical geometries to GPU architectures, improving efficiency on modern computing systems and enabling simulations of more realistic regimes. This will advance our understanding of Earth’s geomagnetic field and its broader interactions, such as those with mantle heterogeneities. Evidence from seismology and geodynamics shows that the core-mantle boundary (CMB) is highly heterogeneous, influencing heat transport and geodynamo dynamics. By combining compressible, thermochemical convection with geodynamo simulations, this project will further investigate how deep slab properties affect the CMB heat flux, mantle heterogeneity, and the geodynamo.

The Leverhulme Programme for Doctoral Training in Ecological Data Science

Our Leverhulme Programme for Doctoral Training in Ecological Data Science will train a new generation of data scientists, equipping students with the latest data science techniques and the skills to tackle the most pressing environmental challenges of our time. Application information can be found on their Apply page. Current opportunities based within the school are as follows:

Collective animal movement and resource selection in changing environments (PhD)

Supervisors: Mu Niu, Paul Blackwell (Sheffield), Juan Morales (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

Advances in technologies such as GPS have revolutionized the tracking of wildlife, providing detailed data on how animals move and interact. It is now feasible to track multiple animals simultaneously, at high frequency and for long periods. This project explores the movement of animals both individually and in groups, focusing on how environmental factors and resource availability shape their behaviours. Animals in groups often move interdependently, influenced by interactions between individuals. However, traditional movement models primarily address individual animals and ignore these group dynamics. On the other hand, collective movement models are often parameterized with short-term data. This research will develop innovative statistical models to better understand how animals move collectively. Such movement necessarily involves each individual responding to their physical environment, as well as other group members, and a key aspect of this project is understanding how animals use space and resources within the group setting. Incorporating both these aspects of short-term movement decisions and long-term space use in a coherent mathematical model will illuminate how animals collectively adapt to their surroundings. It will use cutting-edge statistical and machine learning methods, such as diffusion models. The findings and methodology developed will provide valuable insights into animal behaviour and ecology, supporting conservation efforts and helping manage human impacts on wildlife.
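
For intuition only, a deliberately minimal toy version of such a model is sketched below: each individual is pulled towards the group centroid (social attraction) and towards a resource location (resource selection), plus random noise. The weights, resource location and dynamics are invented for illustration and bear no relation to the models the project would develop.

    # Toy collective movement simulation (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    n, steps = 20, 200
    pos = rng.normal(0.0, 1.0, size=(n, 2))    # initial positions
    resource = np.array([5.0, 5.0])            # assumed attractive location
    w_group, w_env, noise = 0.05, 0.02, 0.1    # illustrative weights

    for _ in range(steps):
        centroid = pos.mean(axis=0)
        pos += (w_group * (centroid - pos)         # attraction to the group
                + w_env * (resource - pos)         # drift towards the resource
                + noise * rng.normal(size=pos.shape))

    print("mean distance to resource:",
          np.linalg.norm(pos - resource, axis=1).mean().round(2))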

Extreme value theory for predicting animal dispersal and movement in a changing climate (PhD)

Supervisors: Jafet Belmont Osuna, Daniela Castro-Camilo, Juan Morales (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

There is an imperative need to understand and predict how populations respond to multiple aspects of global change, such as habitat fragmentation and climate change. Extreme weather events, which are expected to increase in both frequency and intensity, can profoundly impact animal movement and spatial dynamics. Additionally, for many species, rare long-distance dispersal events play a crucial role in reaching suitable habitats for germination, establishment, and colonisation across fragmented or managed landscapes. Many plant species, for instance, rely on birds for dispersal: birds ingest fruits and later deposit seeds through defecation or regurgitation. Accurately predicting such processes requires models that capture both seed retention times within birds and bird movement patterns. This project aims to develop and apply cutting-edge statistical methods for analysing animal movement and dispersal data using Extreme-Value Theory (EVT) within a Bayesian framework. EVT, a well-established theoretical framework that has been widely used in environmental sciences for modelling extreme events, has seen limited application in ecology. We will leverage EVT to (1) understand how extreme weather events can affect animal movement, and (2) make better predictions of dispersal processes. This work offers substantial potential for novel insights and methodological advancements. By integrating experimental research and state-of-the-art tracking technologies, the project will inform the development of hierarchical Bayesian models to explore patterns and drivers of animal movement and dispersal, with a particular focus on extreme behaviours and their ecological implications.

Leveraging large language models to provide insights into global plant biodiversity (PhD)

Supervisors: Richard Reeve (MVLS, UoG), Jake Lever (CS, UoG), Vinny Davies, Neil Brummitt (NHM), Ana Claudia Araujo (NHM), Ben Scott (NHM)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

Plants are fundamental to the provision of ecosystem services, and we are wholly dependent on them for survival. Yet, globally, many plant species are under threat of extinction. We need a comprehensive plant trait dataset as input to the next generation of biodiversity-climate models. The lack of such a dataset means that existing approaches focus on limited "Plant Functional Types" and cannot estimate the impacts of climate and land use change on individual species or help inform decision making on mitigating biodiversity loss. The needed plant trait data, from niche preferences to growth rates, are locked in the text of the vast botanical literature of the Biodiversity Heritage Library and other texts available to the Natural History Museum. This studentship would use the recent advances in large language models (LLMs) and natural language processing (NLP) to extract this information. We have developed an ecosystem modelling tool (EcoSISTEM, Harris et al., 2023, https://github.com/EcoJulia/EcoSISTEM.jl) that captures survival, competition and reproduction among multiple plant species across a landscape. LLMs will enable extraction of traits data for integration into the EcoSISTEM infrastructure and enable the inclusion of multilingual records, expanding the system's geographic and historical range. By addressing these enormous data gaps, the student will then explore global spatial and temporal variability in functional and other trait-based diversity measures to produce a unique and comprehensive evaluation of whether predictors exist of diversity at a global scale. Ultimately, the project will boost EcoSISTEM's ability to simulate plant responses to climate change with greater accuracy.
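
The extraction pattern at the heart of such a pipeline might look like the sketch below: prompt a model for a fixed trait schema and validate the reply as JSON. Here query_llm is a hypothetical placeholder for whichever model or API the project adopts, and the prompt, schema and example are illustrative only.

    # Structured trait extraction sketch; `query_llm` is a hypothetical stand-in.
    import json

    PROMPT = """From the botanical text below, return JSON with keys
    "species", "growth_form", "max_height_m" (null if absent).
    Text: {passage}"""

    def query_llm(prompt: str) -> str:
        # Placeholder: in practice this would call the chosen LLM.
        return '{"species": "Quercus robur", "growth_form": "tree", "max_height_m": 40}'

    def extract_traits(passage: str) -> dict:
        reply = query_llm(PROMPT.format(passage=passage))
        try:
            return json.loads(reply)     # validate the model's structured output
        except json.JSONDecodeError:
            return {}                    # flag the record for manual review

    print(extract_traits("Quercus robur, a deciduous tree reaching 40 m ..."))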

The impact of deep learning optimization and design choices for marine biodiversity monitoring (PhD)

Supervisors: Tiffany Vlaar, Laurence De Clippele (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

This project aims to increase the efficiency, accuracy, and reliability of annotation and classification of large marine datasets using deep learning. Timely and accurate analysis of these long-term datasets will aid marine biodiversity monitoring efforts. Design of more efficient strategies further aims to reduce the carbon footprint of training and fine-tuning large machine learning models. The project is expected to lead to various novel insights for the machine learning community, such as optimal pre-training choices for robust downstream performance, the optimal order of learning samples with varying complexity levels, navigating instances with uncertain ground-truth labels, and re-evaluation of metric design. The PhD student will be supported in building international collaborations with researchers across different disciplines and in developing effective research communication skills.
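
One of the design choices mentioned, the order in which samples of varying complexity are presented, can be illustrated with a toy "easy-to-hard" curriculum on synthetic data. The difficulty score and incremental classifier below are illustrative stand-ins, not the project's methods.

    # Toy curriculum-ordering sketch (illustrative only).
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

    # Assumed difficulty proxy: distance to the (here, known) decision boundary.
    difficulty = -np.abs(X[:, 0])           # higher = closer to boundary = harder
    order = np.argsort(difficulty)          # easy samples first

    clf = SGDClassifier(loss="log_loss", random_state=0)
    for batch in np.array_split(order, 10):     # feed batches easy -> hard
        clf.partial_fit(X[batch], y[batch], classes=np.array([0, 1]))

    print("training accuracy:", clf.score(X, y).round(3))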

IAPETUS2

Named after the ancient ocean that closed to bring together England and Scotland, IAPETUS2 is a partnership that joins the leading research universities of Durham, Heriot-Watt, Glasgow, Newcastle, St Andrews and Stirling, together with the British Antarctic Survey, British Geological Survey and the Centre for Ecology & Hydrology, in a united approach to doctoral research and training the next generation of leaders in the science of the natural environment. Application information can be found on their Apply page.

DiveIn (EPSRC CDT in Diversity-Led, Mission-Driven Research)

The DiveIn CDT prioritises diversity, creating an inclusive space for varied talents to produce transformative interdisciplinary research in Net Zero, AI and Big Data, Technology Touching Life, Future Telecoms, Quantum Technologies and more. Application information can be found on their Apply page.

NETGAIN

NETGAIN (developing the science and practice of nature markets for a net positive future) is a collaborative CDT held between the Universities of St Andrews, Aberdeen, Durham and Glasgow. NETGAIN will train a new generation of multidisciplinary scientist-practitioners to transform the landscape of nature markets, ensuring effective, evidence-based solutions to the world’s most urgent environmental challenges. The application process and potential projects will be linked from here soon.

Innovation in Analysis and Inference

Applied Probability and Stochastic Processes - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Evaluating probabilistic forecasts in high-dimensional settings (PhD)

Supervisors: Jethro Browell
Relevant research groups: Modelling in Space and Time, Computational Statistics, Applied Probability and Stochastic Processes

Many decisions are informed by forecasts, and almost all forecasts are uncertain to some degree. Probabilistic forecasts quantify uncertainty to help improve decision-making and are playing an important role in fields including weather forecasting, economics, energy, and public policy. Evaluating the quality of past forecasts is essential to give forecasters and forecast users confidence in their current predictions, and to compare the performance of forecasting systems.

While the principles of probabilistic forecast evaluation have been established over the past 15 years, most notably that of “sharpness subject to calibration/reliability”, we lack a complete toolkit for applying these principles in many situations, especially those that arise in high-dimensional settings. Furthermore, forecast evaluation must be interpretable by forecast users as well as expert forecasters, and assigning value to marginal improvements in forecast quality remains a challenge in many sectors.
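
Two of the standard building blocks are sketched below on synthetic data: the probability integral transform (PIT) for checking calibration, and the pinball loss for scoring quantile forecasts. The deliberately overdispersed forecast is illustrative only.

    # Forecast evaluation sketch: PIT histogram and pinball loss.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    obs = rng.normal(0.0, 1.0, size=5000)            # observations

    # Forecast distribution N(0, 1.3^2) is too wide: PIT values pile up mid-range.
    pit = stats.norm.cdf(obs, loc=0.0, scale=1.3)
    hist, _ = np.histogram(pit, bins=10, range=(0, 1))
    print("PIT histogram (central hump => overdispersed):", hist)

    def pinball(y, q_hat, tau):
        """Pinball (quantile) loss for forecasts q_hat of the tau-quantile."""
        u = y - q_hat
        return np.mean(np.maximum(tau * u, (tau - 1) * u))

    q90 = stats.norm.ppf(0.9, loc=0.0, scale=1.3)    # forecast 90% quantile
    print("pinball loss at tau = 0.9:", pinball(obs, q90, 0.9).round(4))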

This PhD will develop new statistical methods for probabilistic forecast evaluation considering some of the following issues:

  • Verifying probabilistic calibration conditional on relevant covariates
  • Skill scores for multivariate probabilistic forecasts where “ideal” performance is unknowable
  • Assigning value to marginal forecast improvement through the convolution of utility functions and Murphy diagrams
  • Development of the concepts of “anticipated verification” and “predicting the uncertainty of future forecasts”
  • Decomposing forecast misspecification (e.g. into spatial and temporal components)
  • Evaluation of Conformal Predictions

Good knowledge of multivariate statistics is essential; prior knowledge of probabilistic forecasting and forecast evaluation would be an advantage.

Adaptive probabilistic forecasting (PhD)

Supervisors: Jethro Browell
Relevant research groups: Modelling in Space and Time, Computational Statistics, Applied Probability and Stochastic Processes

Data-driven predictive models depend on the representativeness of the data used in model selection and estimation. However, many processes change over time, meaning that recent data are more representative than old data. In this situation, predictive models should track these changes, which is the aim of “online” or “adaptive” algorithms. Furthermore, many users of forecasts require probabilistic forecasts, which quantify uncertainty, to inform their decision-making. Existing adaptive methods such as Recursive Least Squares and the Kalman Filter have been very successful for adaptive point forecasting, but adaptive probabilistic forecasting has received little attention. This PhD will develop methods for adaptive probabilistic forecasting from a theoretical perspective, with a view to applying these methods to problems in at least one application area to be determined.
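
As a minimal illustration of the adaptive idea, and not a proposed method, the sketch below tracks the 90% quantile of a slowly drifting process by stochastic subgradient steps on the pinball loss, so the estimate follows the change; the data and step size are invented.

    # Online quantile tracking via the pinball-loss subgradient (illustrative).
    import numpy as np

    rng = np.random.default_rng(3)
    tau, lr = 0.9, 0.05
    q = 0.0                                        # running quantile estimate

    for t in range(2000):
        y = rng.normal(loc=0.002 * t, scale=1.0)   # slowly drifting process
        q += lr * (tau - (y < q))                  # subgradient step: tau - 1{y < q}

    print("final estimate:", round(q, 2),
          "| true final 90% quantile ~", round(0.002 * 1999 + 1.2816, 2))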

In the context of adaptive probabilistic forecasting, this PhD may consider:

  • Online estimation of Generalised Additive Models for Location Scale and Shape
  • Online/adaptive (multivariate) time series prediction
  • Online aggregation (of experts, or hierarchies)

A good knowledge of methods for time series analysis and regression is essential; familiarity with flexible regression (GAMs) and distributional regression (GAMLSS/quantile regression) would be an advantage.

Modality of mixtures of distributions (PhD)

Supervisors: Surajit Ray
Relevant research groups: Nonparametric and Semi-parametric Statistics, Applied Probability and Stochastic Processes, Statistical Modelling for Biology, Genetics and *omics, Biostatistics, Epidemiology and Health Applications

Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used to perform modelling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organising mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results regarding the upper bound on the number of modes, but these too are limited to normal mixtures. In this project, we wish to explore the modality of non-normal mixtures and their application to real-life problems.
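
As a small numerical illustration of the phenomenon, the sketch below counts the modes of an equal-weight two-component normal mixture on a grid and shows them merging as the components approach; for unit variances, such a mixture is unimodal once the means are within two standard deviations of each other.

    # Counting modes of a two-component normal mixture (illustrative).
    import numpy as np
    from scipy import stats

    def n_modes(delta, x=np.linspace(-8, 8, 4001)):
        """Modes of 0.5*N(-delta, 1) + 0.5*N(delta, 1), via sign changes."""
        dens = 0.5 * stats.norm.pdf(x, -delta) + 0.5 * stats.norm.pdf(x, delta)
        d = np.diff(dens)
        return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))   # local maxima

    for delta in [0.5, 1.2, 2.0]:
        print(f"half-separation {delta}: {n_modes(delta)} mode(s)")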

Bayesian Modelling and Inference - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Modelling genetic variation (MSc/PhD)

Supervisors: Vincent Macaulay
Relevant research groups: Bayesian Modelling and Inference, Statistical Modelling for Biology, Genetics and *omics

Variation in the distribution of different DNA sequences across individuals has been shaped by many processes which can be modelled probabilistically, processes such as demographic factors like prehistoric population movements, or natural selection. This project involves developing new techniques for teasing out information on those processes from the wealth of raw data that is now being generated by high-throughput genetic assays, and is likely to involve computationally-intensive sampling techniques to approximate the posterior distribution of parameters of interest. The characterization of the amount of population structure on different geographical scales will influence the design of experiments to identify the genetic variants that increase risk of complex diseases, such as diabetes or heart disease.

The evolution of shape (PhD)

Supervisors: Vincent Macaulay
Relevant research groups: Bayesian Modelling and Inference, Modelling in Space and Time, Statistical Modelling for Biology, Genetics and *omics

Shapes of objects change in time. Organisms evolve and in the process change form: humans and chimpanzees derive from some common ancestor presumably different from either in shape. Designed objects are no different: an Art Deco tea pot from the 1920s might share some features with one from Ikea in 2010, but they are different. Mathematical models of evolution for certain data types, like the strings of As, Gs, Cs and Ts in our evolving DNA, are quite mature and allow us to learn about the relationships of the objects (their phylogeny or family tree), about the changes that happen to them in time (the evolutionary process) and about the ways objects were configured in the past (the ancestral states), by statistical techniques like phylogenetic analysis. Such techniques for shape data are still in their infancy. This project will develop novel statistical inference approaches (in a Bayesian context) for complex data objects, like functions, surfaces and shapes, using Gaussian-process models, with potential application in fields as diverse as language evolution, morphometrics and industrial design.

New methods for analysis of migratory navigation (PhD)

Supervisors: Janine Illian, Urška Demšar (University of St Andrews)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

Migratory birds travel annually across vast expanses of oceans and continents to reach their destination with incredible accuracy. How they are able to do this using only locally available cues is still not fully understood. Migratory navigation consists of two processes: birds either identify the direction in which to fly (compass orientation) or the location where they are at a specific moment in time (geographic positioning). One possible way they do this is to use information from the Earth’s magnetic field in so-called geomagnetic navigation (Mouritsen, 2018). While there is substantial evidence (both physiological and behavioural) that they do sense the magnetic field (Deutschlander and Beason, 2014), we still do not know exactly which components of the field they use for orientation or positioning. We also do not understand how rapid changes in the field affect movement behaviour.

There is a possibility that birds can sense these rapid large changes and that this may affect their navigational process. To study this, we need to link accurate data on Earth’s magnetic field with animal tracking data. This has only become possible very recently through new spatial data science advances:  we developed the MagGeo tool, which links contemporaneous geomagnetic data from Swarm satellites of the European Space Agency with animal tracking data (Benitez Paez et al., 2021).

Linking geomagnetic data to animal tracking data, however, creates a high-dimensional data set, which is difficult to explore. Typical analyses of contextual environmental information in ecology represent contextual variables as covariates in relatively simple statistical models (Brum Bastos et al., 2021), but this is not sufficient for studying detailed navigational behaviour. This project will analyse complex spatio-temporal data using computationally efficient statistical model fitting approaches in a Bayesian context.

This project is fully based on open data to support reproducibility and open science. We will test our new methods by annotating publicly available bird tracking data (e.g. from repositories such as Movebank.org), using the open MagGeo tool and implementing our new methods as Free and Open Source Software (R/Python).

Integrated spatio-temporal modelling for environmental data (PhD)

Supervisors: Janine Illian, Peter Henrys (UKCEH)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

The last decade has seen a proliferation of environmental data with vast quantities of information available from various sources. This has been due to a number of different factors including: the advent of sensor technologies; the provision of remotely sensed data from both drones and satellites; and the explosion in citizen science initiatives. These data represent a step change in the resolution of available data across space and time - sensors can be streaming data at a resolution of seconds whereas citizen science observations can be in the hundreds of thousands.  

Over the same period, the resources available for traditional field surveys have decreased dramatically whilst logistical issues (such as access to sites) have increased. This has severely impacted the ability of field survey campaigns to collect data at high spatial and temporal resolutions. It is exactly this sort of information that is required to fit models that can quantify and predict the spread of invasive species, for example.

Whilst we have seen an explosion of data across various sources, there is no single source that provides both the spatial and temporal intensity that may be required when fitting complex spatio-temporal models (cf. the invasive species example) - each has its own advantages and benefits in terms of information content. There is therefore potentially huge benefit in bringing together data from these different sources within a consistent framework, to exploit the benefits each offers and to understand processes at resolutions and scales that would otherwise be impossible to monitor.

Current approaches to combining data in this way are typically very bespoke and involve complex model structures that are not reusable outside of the particular application area. What is needed is an overarching generic methodological framework and associated software solutions to implement such analyses. Not only would such a framework provide the methodological basis to enable researchers to benefit from this big data revolution, but also the capability to change such analyses from being stand alone research projects in their own right, to more operational, standard analytical routines. 

Finally, such dynamic, integrated analyses could feed back into data collection initiatives to ensure optimal allocation of effort for traditional surveys or optimal power management for sensor networks. The major step change is that this optimal allocation of effort is conditional on the other data that are available. So, for example, given the coverage and intensity of the citizen science data, where should we optimally send our paid surveyors? The idea is that information is collected at times and locations that provide the greatest benefit in understanding the underpinning stochastic processes. These two major issues - integrated analyses and adaptive sampling - ensure that environmental monitoring is fit for purpose and that scientists, policy and industry can benefit from the big data revolution.

This project will develop an integrated statistical modelling strategy that provides a single modelling framework for quantifying ecosystem goods and services while accounting for the fundamental differences between data streams. Data collected at different spatial resolutions can be used within the same model by projecting them into continuous space and back onto the landscape level of interest. As a result, decisions can be made at the relevant spatial scale, with uncertainty propagated throughout, facilitating appropriate decision making.

Statistical methodology for assessing the impacts of offshore renewable developments on marine wildlife (PhD)

Supervisors: Janine Illian, Esther Jones (BIOSS), Adam Butler (BIOSS)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

Assessing the impacts of offshore renewable developments on marine wildlife is a critical component of the consenting process. A NERC-funded project, ECOWINGS, will provide a step-change in analysing predator-prey dynamics in the marine environment, collecting data across trophic levels against a backdrop of developing wind farms and climate change. Aerial survey and GPS data from multiple species of seabirds will be collected contemporaneously alongside prey data available over the whole water column from an automated surface vehicle and underwater drone.

These methods of data collection will generate 3D space and time profiles of predators and prey, creating a rich source of information and enormous potential for modelling and interrogation. The data present a unique opportunity for experimental design across a dynamic and changing marine ecosystem, which is heavily influenced by local and global anthropogenic activities. However, these data have complex intrinsic spatio-temporal properties, which are challenging to analyse. Significant statistical methods development could be achieved using this system as a case study, contributing to the scientific knowledge base not only in offshore renewables but more generally in the many circumstances where patchy ecological spatio-temporal data are available. 

This PhD project will develop spatio-temporal modelling methodology that will allow users to analyse these exciting - and complex - data sets and help inform our knowledge of the impact of offshore renewables on wildlife.

Bayesian variable selection for genetic and genomic studies (PhD)

Supervisors: Mayetri Gupta
Relevant research groups: Bayesian Modelling and Inference, Computational Statistics, Statistical Modelling for Biology, Genetics and *omics

An important issue in high-dimensional regression problems is the accurate and efficient estimation of models when, compared to the number of data points, a substantially larger number of potential predictors are present. Further complications arise with correlated predictors, leading to the breakdown of standard statistical models for inference, and with an uncertain definition of the outcome variable, which is often a varying composition of several different observable traits. Examples of such problems arise in many scenarios in genomics: in determining expression patterns of genes that may be responsible for a type of cancer, and in determining which genetic mutations lead to higher risks of occurrence of a disease. This project involves developing broad and improved Bayesian methodologies for efficient inference in high-dimensional regression-type problems with complex multivariate outcomes, with a focus on genetic data applications.
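
To make the setting concrete, the sketch below runs a toy version of Bayesian variable selection: all subsets of a few predictors are enumerated, each model is weighted by the BIC approximation to its marginal likelihood under a uniform model prior, and the results are summarised as posterior inclusion probabilities. Both the data and the approximation are illustrative, not the methodology to be developed.

    # Toy Bayesian variable selection by model enumeration (illustrative).
    import itertools
    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 200, 6
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)   # truth: vars 0 and 3

    def bic(subset):
        Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ beta) ** 2)
        return n * np.log(rss / n) + Xs.shape[1] * np.log(n)

    models = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
    logw = np.array([-0.5 * bic(s) for s in models])      # exp(-BIC/2) weights
    w = np.exp(logw - logw.max())
    w /= w.sum()                                          # uniform prior over models

    incl = [sum(w[i] for i, s in enumerate(models) if j in s) for j in range(p)]
    print("posterior inclusion probabilities:", np.round(incl, 3))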

The successful candidate should have a strong background in methodological and applied Statistics, expert skills in relevant statistical software or programming languages (such as R, C/C++/Python), and a deep interest in developing knowledge in cross-disciplinary topics in genomics. The candidate will be expected to consolidate and master an extensive range of topics in modern Statistical theory and applications during their PhD, including advanced Bayesian modelling and computation, latent variable models, machine learning, and methods for Big Data. The successful candidate will be considered for funding to cover domestic tuition fees, as well as a stipend at the Research Council rate for four years.

Bayesian statistical data integration of single-cell and bulk “OMICS” datasets with clinical parameters for accurate prediction of treatment outcomes in Rheumatoid Arthritis (PhD)

Supervisors: Mayetri Gupta
Relevant research groups: Bayesian Modelling and Inference, Computational Statistics, Statistical Modelling for Biology, Genetics and *omics, Biostatistics, Epidemiology and Health Applications

In recent years, many different computational methods to analyse biological data have been established, covering DNA (genomics), RNA (transcriptomics), proteins (proteomics) and metabolomics, which captures more dynamic events. These methods have been refined by the advent of single-cell technology, where it is now possible to capture the transcriptomic profile of single cells, and of spatial arrangements of cells from flow methods or imaging methods such as functional magnetic resonance imaging. At the same time, these OMICS data can be complemented with clinical data – measurements of patients, such as age, smoking status, phenotype of disease or drug treatment. It is an interesting and important open statistical question how to combine data from different “modalities” (such as transcriptome with clinical data or imaging data) in a statistically valid way, to compare different datasets and make justifiable statistical inferences. In this PhD project, jointly supervised with Dr Thomas Otto and Prof Stefan Siebert from the Institute of Infection, Immunity & Inflammation, you will explore how to combine different datasets using Bayesian latent variable modelling, focusing on clinical datasets from Rheumatoid Arthritis.

Bayesian Mixture Models for Spatio-Temporal Data (PhD)

Supervisors: Craig Anderson
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Biostatistics, Epidemiology and Health Applications

The prevalence of disease is typically not constant across space – instead, the risk tends to vary from one region to another. Some of this variability may be down to environmental conditions, but much of it is driven by socio-economic differences between regions, with poorer regions tending to have worse health than wealthier regions. For example, within the Greater Glasgow and Clyde region, the World Health Organisation noted that life expectancy ranges from 54 in Calton to 82 in Lenzie, despite these areas being less than 10 miles apart. There is substantial value to health professionals and policymakers in identifying the causes behind these localised health inequalities.

Disease mapping is a field of statistical epidemiology which focuses on estimating the patterns of disease risk across a geographical region. The main goal of such mapping is typically to identify regions of high disease risk so that relevant public health interventions can be made. This project involves the development of statistical models which will enhance our understanding of regional differences in the risk of suffering from major diseases, by focusing on these localised health inequalities.

Standard Bayesian hierarchical models with a conditional autoregressive prior are frequently used for risk estimation in this context, but these models assume a smooth risk surface which is often not appropriate in practice. In reality, it will often be the case that different regions have vastly different risk profiles and require different data generating functions as a result.

In this work we propose a mixture-model-based approach which allows different sub-populations to be represented by different underlying statistical distributions within a single modelling framework. By integrating CAR models into mixture models, researchers can simultaneously account for spatial dependencies and identify distinct disease patterns within subpopulations.
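
As a concrete building block, the sketch below constructs the precision matrix of a proper CAR prior, Q = tau * (D - rho * W), on a toy map of four areas and draws one realisation of the spatial effects; the neighbourhood structure and parameter values are illustrative only.

    # Proper CAR prior sketch: precision matrix and one random draw (illustrative).
    import numpy as np

    # Toy neighbourhood structure: four areas on a line, adjacent pairs are neighbours.
    W = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    D = np.diag(W.sum(axis=1))
    rho, tau = 0.9, 1.0                   # spatial dependence and precision scale
    Q = tau * (D - rho * W)               # positive definite for |rho| < 1

    # Sample phi ~ N(0, Q^{-1}) using the Cholesky factor of the precision.
    rng = np.random.default_rng(5)
    L = np.linalg.cholesky(Q)
    phi = np.linalg.solve(L.T, rng.normal(size=4))
    print("one CAR draw of area effects:", np.round(phi, 3))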

Detecting hotspots of water pollution in complex constrained domains and networks (PhD)

Supervisors: Mu Niu, Craig Wilkie, Cathy Yi-Hsuan Chen (Business School, UofG), Michael Tso (Lancaster University)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Environmental, Ecological Sciences & Sustainability
Funding: This project is competitively funded through the ExaGEO DLA.

Technological developments with smart sensors are changing the way that the environment is monitored.  Many such smart systems are under development, with small, energy efficient, mobile sensors being trialled.  Such systems offer opportunities to change how we monitor the environment, but this requires additional statistical development in the optimisation of the location of the sensors.

The aim of this project is to develop a mathematical and computational inferential framework to identify optimal sensor deployment locations within complex, constrained domains and networks for improved water contamination detection. Methods for estimating covariance functions in such domains rely on computationally intensive diffusion process simulations, limiting their application to relatively simple domains and small-scale datasets. To address this challenge, the project will employ accelerated computing paradigms with highly parallelized GPUs to enhance simulation efficiency. The framework will also address regression, classification, and optimization problems on latent manifolds embedded in high-dimensional spaces, such as image clouds (e.g., remote sensing satellite images), which are crucial for sensor deployment and performance evaluation. As the project progresses, particularly in the image cloud case, the computational demands will intensify, requiring advanced GPU resources or exascale computing to ensure scalability, efficiency, and performance.

Downscaling and Prediction of Rainfall Extremes from Climate Model Outputs (PhD)

Supervisors: Sebastian Gerhard Mutz (GES, UoG), Daniela Castro-Camilo
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Environmental, Ecological Sciences & Sustainability
Funding: This project is competitively funded through the ExaGEO DLA.

Over the last decade, Scotland's annual rainfall has increased by 9% and its winter rainfall by 19%, with more of that water arriving in extreme events, posing risks to the environment, infrastructure, health, and industry. Urgent issues such as flooding, mass wasting, and water quality are closely tied to rainfall extremes. Reliable predictions of extremes are, therefore, critical for risk management. Prediction of extremes, one of the main focuses of extreme value theory, is still considered one of the grand challenges by the World Climate Research Programme. This project will address this challenge by developing novel, computationally efficient statistical models that are able to predict rainfall extremes from the output of GPU-optimised climate models.


Computational Statistics - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Evaluating probabilistic forecasts in high-dimensional settings (PhD)

Supervisors: Jethro Browell
Relevant research groups: Modelling in Space and Time, Computational Statistics, Applied Probability and Stochastic Processes

Many decisions are informed by forecasts, and almost all forecasts are uncertain to some degree. Probabilistic forecasts quantify uncertainty to help improve decision-making and are playing an important role in fields including weather forecasting, economics, energy, and public policy. Evaluating the quality of past forecasts is essential to give forecasters and forecast users confidence in their current predictions, and to compare the performance of forecasting systems.

While the principles of probabilistic forecast evaluation have been established over the past 15 years, most notably that of “sharpness subject to calibration/reliability”, we lack a complete toolkit for applying these principles in many situations, especially those that arise in high-dimensional settings. Furthermore, forecast evaluation must be interpretable by forecast users as well as expert forecasters, and assigning value to marginal improvements in forecast quality remains a challenge in many sectors.

This PhD will develop new statistical methods for probabilistic forecast evaluation considering some of the following issues:

  • Verifying probabilistic calibration conditional on relevant covariates
  • Skill scores for multivariate probabilistic forecasts where “ideal” performance is unknowable
  • Assigning value to marginal forecast improvement through the convolution of utility functions and Murphy diagrams
  • Development of the concepts of “anticipated verification” and “predicting the uncertainty of future forecasts”
  • Decomposing forecast misspecification (e.g. into spatial and temporal components)
  • Evaluation of Conformal Predictions

Good knowledge of multivariate statistics is essential; prior knowledge of probabilistic forecasting and forecast evaluation would be an advantage.

Adaptive probabilistic forecasting (PhD)

Supervisors: Jethro Browell
Relevant research groups: Modelling in Space and Time, Computational Statistics, Applied Probability and Stochastic Processes

Data-driven predictive models depend on the representativeness of the data used in model selection and estimation. However, many processes change over time, meaning that recent data are more representative than old data. In this situation, predictive models should track these changes, which is the aim of “online” or “adaptive” algorithms. Furthermore, many users of forecasts require probabilistic forecasts, which quantify uncertainty, to inform their decision-making. Existing adaptive methods such as Recursive Least Squares and the Kalman Filter have been very successful for adaptive point forecasting, but adaptive probabilistic forecasting has received little attention. This PhD will develop methods for adaptive probabilistic forecasting from a theoretical perspective, with a view to applying these methods to problems in at least one application area to be determined.

In the context of adaptive probabilistic forecasting, this PhD may consider:

  • Online estimation of Generalised Additive Models for Location Scale and Shape
  • Online/adaptive (multivariate) time series prediction
  • Online aggregation (of experts, or hierarchies)

A good knowledge of methods for time series analysis and regression is essential; familiarity with flexible regression (GAMs) and distributional regression (GAMLSS/quantile regression) would be an advantage.

New methods for analysis of migratory navigation (PhD)

Supervisors: Janine Illian, Urška Demšar (University of St Andrews)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

Migratory birds travel annually across vast expanses of oceans and continents to reach their destination with incredible accuracy. How they are able to do this using only locally available cues is still not fully understood. Migratory navigation consists of two processes: birds either identify the direction in which to fly (compass orientation) or the location where they are at a specific moment in time (geographic positioning). One possible way they do this is to use information from the Earth’s magnetic field in so-called geomagnetic navigation (Mouritsen, 2018). While there is substantial evidence (both physiological and behavioural) that they do sense the magnetic field (Deutschlander and Beason, 2014), we still do not know exactly which components of the field they use for orientation or positioning. We also do not understand how rapid changes in the field affect movement behaviour.

There is a possibility that birds can sense these rapid large changes and that this may affect their navigational process. To study this, we need to link accurate data on Earth’s magnetic field with animal tracking data. This has only become possible very recently through new spatial data science advances:  we developed the MagGeo tool, which links contemporaneous geomagnetic data from Swarm satellites of the European Space Agency with animal tracking data (Benitez Paez et al., 2021).

Linking geomagnetic data to animal tracking data, however, creates a high-dimensional data set, which is difficult to explore. Typical analyses of contextual environmental information in ecology represent contextual variables as covariates in relatively simple statistical models (Brum Bastos et al., 2021), but this is not sufficient for studying detailed navigational behaviour. This project will analyse complex spatio-temporal data using computationally efficient statistical model fitting approaches in a Bayesian context.

This project is fully based on open data to support reproducibility and open science. We will test our new methods by annotating publicly available bird tracking data (e.g. from repositories such as Movebank.org), using the open MagGeo tool and implementing our new methods as Free and Open Source Software (R/Python).

Integrated spatio-temporal modelling for environmental data (PhD)

Supervisors: Janine Illian, Peter Henrys (UKCEH)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

The last decade has seen a proliferation of environmental data with vast quantities of information available from various sources. This has been due to a number of different factors including: the advent of sensor technologies; the provision of remotely sensed data from both drones and satellites; and the explosion in citizen science initiatives. These data represent a step change in the resolution of available data across space and time - sensors can be streaming data at a resolution of seconds whereas citizen science observations can be in the hundreds of thousands.  

Over the same period, the resources available for traditional field surveys have decreased dramatically whilst logistical issues (such as access to sites) have increased. This has severely impacted the ability of field survey campaigns to collect data at high spatial and temporal resolutions. It is exactly this sort of information that is required to fit models that can quantify and predict the spread of invasive species, for example.

Whilst we have seen an explosion of data across various sources, there is no single source that provides both the spatial and temporal intensity that may be required when fitting complex spatio-temporal models (cf. the invasive species example) - each has its own advantages and benefits in terms of information content. There is therefore potentially huge benefit in bringing together data from these different sources within a consistent framework, to exploit the benefits each offers and to understand processes at resolutions and scales that would otherwise be impossible to monitor.

Current approaches to combining data in this way are typically very bespoke and involve complex model structures that are not reusable outside of the particular application area. What is needed is an overarching generic methodological framework and associated software solutions to implement such analyses. Not only would such a framework provide the methodological basis to enable researchers to benefit from this big data revolution, but also the capability to change such analyses from being stand alone research projects in their own right, to more operational, standard analytical routines. 

Finally, such dynamic, integrated analyses could feed back into data collection initiatives to ensure optimal allocation of effort for traditional surveys or optimal power management for sensor networks. The major step change is that this optimal allocation of effort is conditional on the other data that are available. So, for example, given the coverage and intensity of the citizen science data, where should we optimally send our paid surveyors? The idea is that information is collected at the times and locations that provide the greatest benefit in understanding the underpinning stochastic processes. These two major issues - integrated analyses and adaptive sampling - ensure that environmental monitoring is fit for purpose and that scientists, policy makers and industry can benefit from the big data revolution.

This project will develop an integrated statistical modelling strategy that provides a single modelling framework for quantifying ecosystem goods and services while accounting for the fundamental differences between data streams. Data collected at different spatial resolutions can be used within the same model by projecting them into continuous space and back to the landscape level of interest. As a result, decisions can be made at the relevant spatial scale, and uncertainty is propagated throughout, facilitating appropriate decision making.

Statistical methodology for assessing the impacts of offshore renewable developments on marine wildlife (PhD)

Supervisors: Janine Illian, Esther Jones (BIOSS), Adam Butler (BIOSS)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

Assessing the impacts of offshore renewable developments on marine wildlife is a critical component of the consenting process. A NERC-funded project, ECOWINGS, will provide a step-change in analysing predator-prey dynamics in the marine environment, collecting data across trophic levels against a backdrop of developing wind farms and climate change. Aerial survey and GPS data from multiple species of seabirds will be collected contemporaneously alongside prey data available over the whole water column from an automated surface vehicle and underwater drone.

These methods of data collection will generate 3D space and time profiles of predators and prey, creating a rich source of information and enormous potential for modelling and interrogation. The data present a unique opportunity for experimental design across a dynamic and changing marine ecosystem, which is heavily influenced by local and global anthropogenic activities. However, these data have complex intrinsic spatio-temporal properties, which are challenging to analyse. Significant statistical methods development could be achieved using this system as a case study, contributing to the scientific knowledge base not only in offshore renewables but more generally in the many circumstances where patchy ecological spatio-temporal data are available. 

This PhD project will develop spatio-temporal modelling methodology that will allow users to analyse these exciting - and complex - data sets, helping to inform our knowledge of the impact of offshore renewables on wildlife.

Bayesian variable selection for genetic and genomic studies (PhD)

Supervisors: Mayetri Gupta
Relevant research groups: Bayesian Modelling and Inference, Computational Statistics, Statistical Modelling for Biology, Genetics and *omics

An important issue in high-dimensional regression problems is the accurate and efficient estimation of models when the number of potential predictors substantially exceeds the number of data points. Further complications arise from correlated predictors, which lead to the breakdown of standard statistical models for inference, and from the uncertain definition of the outcome variable, which is often a varying composition of several different observable traits. Examples of such problems arise in many scenarios in genomics: in determining expression patterns of genes that may be responsible for a type of cancer, and in determining which genetic mutations lead to higher risks of occurrence of a disease. This project involves developing broad and improved Bayesian methodologies for efficient inference in high-dimensional regression-type problems with complex multivariate outcomes, with a focus on genetic data applications.
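
As a flavour of the kind of method involved, the following is a minimal sketch of Bayesian variable selection in a toy linear model, using a continuous spike-and-slab (stochastic search variable selection) prior with the noise variance held fixed for simplicity. All names and settings are illustrative assumptions, not the project's actual methodology.

  import numpy as np

  rng = np.random.default_rng(1)

  # Toy data: n = 100 observations, p = 30 predictors, only 3 truly active
  n, p = 100, 30
  X = rng.standard_normal((n, p))
  beta_true = np.zeros(p); beta_true[:3] = [2.0, -1.5, 1.0]
  y = X @ beta_true + rng.standard_normal(n)

  sigma2 = 1.0             # noise variance (fixed here for simplicity)
  tau0, tau1 = 0.01, 10.0  # "spike" and "slab" prior variances
  pi = 0.1                 # prior inclusion probability
  gamma = np.ones(p, dtype=int)
  incl = np.zeros(p)       # running sum of inclusion indicators

  for it in range(2000):
      # beta | gamma: conjugate Gaussian update, prior variances set by gamma
      D_inv = np.diag(1.0 / np.where(gamma == 1, tau1, tau0))
      Sigma = np.linalg.inv(X.T @ X / sigma2 + D_inv)
      mean = Sigma @ X.T @ y / sigma2
      beta = rng.multivariate_normal(mean, Sigma)
      # gamma_j | beta_j: Bernoulli with odds from spike vs slab log-densities
      # (the shared Gaussian constant cancels in the ratio)
      log_slab = -0.5 * beta**2 / tau1 - 0.5 * np.log(tau1)
      log_spike = -0.5 * beta**2 / tau0 - 0.5 * np.log(tau0)
      w = 1.0 / (1.0 + (1 - pi) / pi * np.exp(log_spike - log_slab))
      gamma = (rng.random(p) < w).astype(int)
      if it >= 1000:       # discard burn-in
          incl += gamma

  print("posterior inclusion probabilities:", np.round(incl / 1000, 2)[:6])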

The successful candidate should have a strong background in methodological and applied Statistics, expert skills in relevant statistical software or programming languages (such as R, C/C++ or Python), and a deep interest in developing knowledge of cross-disciplinary topics in genomics. The candidate will be expected to consolidate and master an extensive range of topics in modern statistical theory and applications during their PhD, including advanced Bayesian modelling and computation, latent variable models, machine learning, and methods for Big Data. The successful candidate will be considered for funding to cover domestic tuition fees, as well as a stipend at the Research Council rate for four years.

Bayesian statistical data integration of single-cell and bulk “OMICS” datasets with clinical parameters for accurate prediction of treatment outcomes in Rheumatoid Arthritis (PhD)

Supervisors: Mayetri Gupta
Relevant research groups: Bayesian Modelling and Inference, Computational Statistics, Statistical Modelling for Biology, Genetics and *omics, Biostatistics, Epidemiology and Health Applications

In recent years, many different computational methods have been established for analysing biological data, covering DNA (genomics), RNA (transcriptomics), proteins (proteomics) and metabolites (metabolomics), the last of which captures more dynamic events. These methods have been refined by the advent of single-cell technology, where it is now possible to capture the transcriptomic profile of single cells, as well as spatial arrangements of cells from flow methods or imaging methods like functional magnetic resonance imaging. At the same time, these OMICS data can be complemented with clinical data - measurements on patients, like age, smoking status, disease phenotype or drug treatment. It is an interesting and important open statistical question how to combine data from different “modalities” (like transcriptome with clinical data or imaging data) in a statistically valid way, in order to compare different datasets and make justifiable statistical inferences. In this PhD project, jointly supervised with Dr Thomas Otto and Prof Stefan Siebert from the Institute of Infection, Immunity & Inflammation, you will explore how to combine different datasets using Bayesian latent variable modelling, focusing on clinical datasets from Rheumatoid Arthritis.

Funding Notes

The successful candidate will be considered for funding to cover domestic tuition fees, as well as a stipend at the Research Council rate for four years.

Analysis of spatially correlated functional data objects (PhD)

Supervisors: Surajit Ray
Relevant research groups: Modelling in Space and Time, Computational Statistics, Nonparametric and Semi-parametric Statistics, Imaging, Image Processing and Image Analysis

Historically, functional data analysis (FDA) techniques have been widely used to analyse traditional time series data, albeit from a different perspective. Of late, FDA techniques are increasingly being used in domains such as environmental science, where the data are spatio-temporal in nature, and hence it is typical to consider such data as functional data in which the functions are correlated in time or space. An example where modelling the dependencies is crucial arises in analysing remotely sensed data observed over a number of years across the surface of the earth, where each year forms a single functional data object. One might be interested in decomposing the overall variation across space and time and attributing it to covariates of interest. Another interesting class of data with dependence structure consists of weather data on several variables collected from balloons, where the domain of the functions is a vertical strip in the atmosphere and the data are spatially correlated. One of the challenges with such data is missingness, to address which one needs to develop appropriate spatial smoothing techniques for spatially dependent functional data. There are also interesting design-of-experiment issues, as well as questions of data calibration to account for the variability in sensing instruments. In spite of the research initiative in analysing dependent functional data, several problems remain unresolved, on which the student will work (a minimal illustration of one basic building block, functional principal component analysis, is sketched after the list):

  • robust statistical models for incorporating temporal and spatial dependencies in functional data
  • developing reliable prediction and interpolation techniques for dependent functional data
  • developing inferential framework for testing hypotheses related to simplified dependent structures
  • analysing sparsely observed functional data by borrowing information from neighbours
  • visualisation of data summaries associated with dependent functional data
  • clustering of functional data
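
The following is a minimal sketch of functional principal component analysis on curves discretised to a common grid, computed via the singular value decomposition; the synthetic curves stand in for, say, annual remote-sensing profiles, and all settings are illustrative.

  import numpy as np

  rng = np.random.default_rng(0)
  t = np.linspace(0, 1, 101)                 # common evaluation grid
  # 50 synthetic curves: random amplitude/phase around a seasonal mean
  curves = np.array([np.sin(2*np.pi*(t + rng.normal(0, .05))) * rng.normal(1, .2)
                     + .1 * rng.standard_normal(t.size) for _ in range(50)])

  mean_curve = curves.mean(axis=0)
  centred = curves - mean_curve
  # Functional PCA via SVD of the centred data matrix:
  # rows of Vt are the principal component functions on the grid
  U, s, Vt = np.linalg.svd(centred, full_matrices=False)
  scores = U * s                             # PC scores for each curve
  var_explained = s**2 / np.sum(s**2)
  print("variance explained by first 3 components:", np.round(var_explained[:3], 3))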

Emulation and Uncertainty Quantification - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Exploring Hybrid Flood modelling leveraging GPU/Exascale computing (PhD)

Supervisors: Andrew Elliott, Lindsay Beevers (University of Edinburgh), Claire Miller, Michele Weiland (University of Edinburgh)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Flood modelling is crucial for understanding flood hazards, both now and in the future as a result of climate change. Modelling provides inundation extents (or flood footprints) outlining areas at risk, which can help to manage our increasingly complex infrastructure network as our climate changes. Our ability to make fast, accurate predictions of fluvial inundation extents is important for disaster risk reduction. Simultaneously capturing uncertainty in forecasts or predictions is essential for efficient planning and design. Both aims require methods which are computationally efficient whilst maintaining accurate predictions. Current Navier-Stokes physics-based models are computationally intensive; this project would therefore explore approaches to hybrid flood models which fuse GPU-compute and ML with physics-based models, as well as investigating scaling the numerical models to large-scale HPC resources.

Scalable approaches to mathematical modelling and uncertainty quantification in heterogeneous peatlands (PhD)

Supervisors: Raimondo Penta, Vinny Davies, Jessica Davies (Lancaster University), Lawrence Bull, Matteo Icardi (University of Nottingham)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification, Continuum Mechanics
Funding: This project is competitively funded through the ExaGEO DLA.

While covering only 3% of the Earth’s surface, peatlands store >30% of terrestrial carbon and play a vital ecological role. Peatlands are, however, highly sensitive to climate change and human pressures, and therefore understanding and restoring them is crucial for climate action. Multiscale mathematical models can represent the complex microstructures and interactions that control peatland dynamics but are limited by their computational demands. GPU and Exascale computing advances offer a timely opportunity to unlock the potential benefits of mathematically-led peatland modelling approaches. By scaling these complex models to run on new architectures, or by directly incorporating mathematical constraints into GPU-based deep learning approaches, scalable computing will deliver transformative insights into peatland dynamics and their restoration, supporting global climate efforts.

Scalable Inference and Uncertainty Quantification for Ecosystem Modelling (PhD)

Supervisors: Vinny Davies, Richard Reeve (BOHVM, UoG), David Johnson (Lancaster University), Christina Cobbold, Neil Brummitt (Natural History Museum)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Understanding the stability of ecosystems, and how they are impacted by climate and land use change, can allow us to identify sites where biodiversity loss will occur and help to direct policymakers in mitigation efforts. Our current digital twin of plant biodiversity – https://github.com/EcoJulia/EcoSISTEM.jl – provides functionality for simulating species through processes of competition, reproduction, dispersal and death, as well as environmental changes in climate and habitat, but it would benefit from enhancement in several areas. The three areas this project would most likely target are: the introduction of a soil layer (and improved modelling of soil water); improving the efficiency of the code, to handle a more complex model and to allow stochastic and systematic Uncertainty Quantification (UQ); and developing techniques for scalable inference of missing parameters.

Smart-sensing for systems-level water quality monitoring (PhD)

Supervisors: Craig Wilkie, Lawrence Bull, Claire Miller, Stephen Thackeray (Lancaster University)
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Environmental, Ecological Sciences & Sustainability
Funding: This project is competitively funded through the ExaGEO DLA.

Freshwater systems are vital for sustaining the environment, agriculture, and urban development, yet in the UK only 33% of rivers and canals meet ‘good ecological status’ (JNCC, 2024). Water monitoring is essential to mitigate the damage caused by pollutants (from agriculture, urban settlements, or waste treatment), and while sensors are increasingly affordable, coverage remains a significant issue. New techniques for edge processing and remote power offer one solution, providing alternative sources of telemetry data. However, methods which combine such information into systems-level sensing are not as mature for water as for other application areas (e.g., the built environment). In response, this project will consider procedures for computation at the edge, decision-making, and data/model interoperability.

Statistical Emulation Development for Landscape Evolution Models (PhD)

Supervisors: Benn Macdonald, Mu Niu, Paul Eizenhöfer (GES, UoG), Eky Febrianto (Engineering, UoG)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Many real-world processes, including those governing landscape evolution, can be effectively described mathematically via differential equations. These equations describe how processes, e.g. the physiography of mountainous landscapes, change with respect to other variables, e.g. time and space. Conventional approaches to statistical inference involve repeatedly solving the equations numerically: every time the parameters of the equations are changed in a statistical optimisation or sampling procedure, the equations need to be re-solved numerically. The associated large computational cost limits advancements when scaling to more complex systems, the application of statistical inference and machine learning approaches, and the implementation of more holistic approaches to Earth System science. This leads to the need for an accelerated computing paradigm involving highly parallelised GPUs for the evaluation of the forward problem.

Beyond advanced computing hardware, emulation is becoming a popular way to tackle this issue. The idea is that the differential equations are first solved as many times as is feasible, and the output is then interpolated using statistical techniques. When inference is carried out, the emulator predictions replace the differential equation solutions. Since prediction from an emulator is very fast, this avoids the computational bottleneck; if the emulator is a good representation of the differential equation output, parameter inference can remain accurate.
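
The following is a minimal sketch of that workflow, with a toy one-parameter ODE standing in for an expensive landscape evolution solver and a Gaussian process emulator interpolating its output; the simulator, design points and kernel settings are illustrative assumptions, not the project's actual models.

  import numpy as np
  from scipy.integrate import solve_ivp
  from sklearn.gaussian_process import GaussianProcessRegressor
  from sklearn.gaussian_process.kernels import RBF

  # Toy "simulator": exponential decay dh/dt = -k*h, reporting h at t = 5;
  # a stand-in for an expensive solver with a single parameter k.
  def simulator(k):
      sol = solve_ivp(lambda t, h: -k * h, (0.0, 5.0), [1.0], rtol=1e-8)
      return sol.y[0, -1]

  # Step 1: run the solver at a set of design points (cheap here, costly in practice)
  k_design = np.linspace(0.1, 2.0, 15).reshape(-1, 1)
  y_design = np.array([simulator(k[0]) for k in k_design])

  # Step 2: fit a Gaussian process emulator to the solver output
  gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
  gp.fit(k_design, y_design)

  # Step 3: during inference, emulator predictions replace solver calls
  k_new = np.array([[0.73]])
  pred, sd = gp.predict(k_new, return_std=True)
  print(f"emulator: {pred[0]:.4f} +/- {sd[0]:.4f}, solver: {simulator(0.73):.4f}")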

The student will begin by working on parallelising the numerical solver of the mathematical model via GPUs, so that many more solutions can be generated on which to build the emulator within a feasible timeframe. They will then develop efficient emulators for complex landscape evolution models as the PhD project evolves.

Structured ML for physical systems (PhD or MSc)

Supervisors: Lawrence Bull
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification

When using Machine Learning (ML) for science and engineering, an alternative mindset is required to build sensible representations from data. Unlike other applications (e.g. large language models), the datasets are relatively small and curated - i.e. they are collected via experiments rather than scraped from the internet. The limited variance of training data typically renders learning by ‘brute force’ infeasible. Instead, we must encode domain-specific knowledge within ML algorithms to enforce structure and constrain the space of possible models.

This project covers ML for physical systems (Karniadakis, 2021) and looks to integrate machine learning with applied mathematics - fusing scientific knowledge with insights from data. Methods will investigate various levels of constraint on ML predictions, including smoothness, invariances, etc.
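
As a toy illustration of encoding domain knowledge as a constraint (offered only as an example, not a method prescribed by the project): if the physics implies a known symmetry, any learned model can be made exactly invariant by averaging its predictions over the symmetry group. The model and data below are placeholders.

  import numpy as np
  from sklearn.ensemble import RandomForestRegressor

  # If the physics says f(x) = f(-x) (a known symmetry), we can hard-code it
  # by averaging any learned model over the group {x, -x}.
  def symmetrise(model):
      return lambda X: 0.5 * (model.predict(X) + model.predict(-X))

  rng = np.random.default_rng(0)
  X = rng.uniform(-3, 3, size=(200, 1))
  y = X[:, 0]**2 + 0.1 * rng.standard_normal(200)   # even target function

  rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
  f_sym = symmetrise(rf)
  X_test = np.array([[1.5], [-1.5]])
  print(rf.predict(X_test))   # unconstrained: predictions may differ
  print(f_sym(X_test))        # constrained: exactly equal by construction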

Generating deep fake left ventricles: a step towards personalised heart treatments (PhD)

Supervisors: Andrew Elliott, Vinny Davies, Hao Gao
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Biostatistics, Epidemiology and Health Applications, Imaging, Image Processing and Image Analysis

Personalised medicine is an exciting avenue in the field of cardiac healthcare, where an understanding of patient-specific mechanisms can lead to improved treatments (Gao et al., 2017). The use of mathematical models to link the underlying properties of the heart with cardiac imaging offers the possibility of obtaining important parameters of heart function non-invasively (Gao et al., 2015). Unfortunately, current estimation methods rely on complex mathematical forward simulations, resulting in a solution taking hours, a time frame not suitable for real-time treatment decisions. To increase the applicability of these methods, statistical emulation methods have been proposed as an efficient way of estimating the parameters (Davies et al., 2019; Noè et al., 2019). In this approach, simulations of the mathematical model are run in advance and then machine learning based methods are used to estimate the relationship between the cardiac imaging and the parameters of interest. These methods are, however, limited by our ability to understand how cardiac geometry varies across patients, which is in turn limited by the amount of data available (Romaszko et al., 2019). In this project we will look at AI-based methods for generating fake cardiac geometries which can be used to increase the amount of data (Qiao et al., 2023). We will explore different types of AI generation, including Generative Adversarial Networks and Variational Autoencoders, to understand how we can generate better 3D and 4D models of fake left ventricles and create an improved emulation strategy that can make use of them.
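
To fix ideas, here is a minimal variational autoencoder sketch in PyTorch, trained on random vectors standing in for flattened coordinates of fixed-topology ventricle meshes; the dimensions, architecture and data are illustrative assumptions only.

  import torch
  import torch.nn as nn

  # 'Geometries' here are flattened coordinate vectors of a fixed-topology
  # mesh (synthetic stand-ins, not real cardiac data).
  D, Z = 300, 16                      # input dimension, latent dimension

  class VAE(nn.Module):
      def __init__(self):
          super().__init__()
          self.enc = nn.Sequential(nn.Linear(D, 128), nn.ReLU())
          self.mu, self.logvar = nn.Linear(128, Z), nn.Linear(128, Z)
          self.dec = nn.Sequential(nn.Linear(Z, 128), nn.ReLU(), nn.Linear(128, D))

      def forward(self, x):
          h = self.enc(x)
          mu, logvar = self.mu(h), self.logvar(h)
          z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterisation
          return self.dec(z), mu, logvar

  vae = VAE()
  opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
  x = torch.randn(64, D)              # placeholder training batch
  for step in range(200):
      recon, mu, logvar = vae(x)
      kl = -0.5 * torch.sum(1 + logvar - mu**2 - logvar.exp(), dim=1).mean()
      loss = ((recon - x)**2).sum(dim=1).mean() + kl
      opt.zero_grad(); loss.backward(); opt.step()

  # New synthetic geometries: decode draws from the latent prior
  with torch.no_grad():
      fake = vae.dec(torch.randn(5, Z))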

Machine Learning and AI - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Collective animal movement and resource selection in changing environments (PhD)

Supervisors: Mu Niu, Paul Blackwell (Sheffield), Juan Morales (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

Advances in technologies such as GPS have revolutionized the tracking of wildlife, providing detailed data on how animals move and interact. It is now feasible to track multiple animals simultaneously, at high frequency and for long periods. This project explores the movement of animals both individually and in groups, focusing on how environmental factors and resource availability shape their behaviours. Animals in groups often move interdependently, influenced by interactions between individuals. However, traditional movement models primarily address individual animals and ignore these group dynamics. On the other hand, collective movement models are often parameterized with short-term data. This research will develop innovative statistical models to better understand how animals move collectively. Such movement necessarily involves each individual responding to their physical environment, as well as other group members, and a key aspect of this project is understanding how animals use space and resources within the group setting. Incorporating both these aspects of short-term movement decisions and long-term space use in a coherent mathematical model will illuminate how animals collectively adapt to their surroundings. It will use cutting-edge statistical and machine learning methods, such as diffusion models. The findings and methodology developed will provide valuable insights into animal behaviour and ecology, supporting conservation efforts and helping manage human impacts on wildlife.
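
As a toy illustration of the interdependence at the heart of collective movement models (purely illustrative, not the project's methodology), the following simulates individuals whose steps combine social attraction to the group centroid, attraction to a resource, and random diffusion.

  import numpy as np

  rng = np.random.default_rng(2)
  n_ind, n_steps = 10, 200
  pos = rng.uniform(0, 10, size=(n_ind, 2))          # initial positions
  resource_peak = np.array([8.0, 8.0])               # attractive habitat feature

  track = np.zeros((n_steps, n_ind, 2))
  for t in range(n_steps):
      centroid = pos.mean(axis=0)
      # each step: social attraction + resource selection + random diffusion
      social = 0.05 * (centroid - pos)
      resource = 0.02 * (resource_peak - pos)
      pos = pos + social + resource + 0.3 * rng.standard_normal((n_ind, 2))
      track[t] = pos

  spread = np.linalg.norm(pos - pos.mean(axis=0), axis=1).mean()
  print("final mean spread around centroid:", round(float(spread), 2))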

Leveraging large language models to provide insights into global plant biodiversity (PhD)

Supervisors: Richard Reeve (MVLS, UoG), Jake Lever (CS, UoG), Vinny Davies, Neil Brummitt (NHM), Ana Claudia Araujo (NHM), Ben Scott (NHM)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

Plants are fundamental to the provision of ecosystem services, and we are wholly dependent on them for survival. Yet, globally, many plant species are under threat of extinction. We need a comprehensive plant trait dataset as input to the next generation of biodiversity-climate models. The lack of such a dataset means that existing approaches focus on limited "Plant Functional Types" and cannot estimate the impacts of climate and land use change on individual species or help inform decision making on mitigating biodiversity loss. The needed plant trait data, from niche preferences to growth rates, are locked in the text of the vast botanical literature of the Biodiversity Heritage Library and other texts available to the Natural History Museum. This studentship would use the recent advances in large language models (LLMs) and natural language processing (NLP) to extract this information. We have developed an ecosystem modelling tool (EcoSISTEM, Harris et al., 2023, https://github.com/EcoJulia/EcoSISTEM.jl) that captures survival, competition and reproduction among multiple plant species across a landscape. LLMs will enable extraction of traits data for integration into the EcoSISTEM infrastructure and enable the inclusion of multilingual records, expanding the system's geographic and historical range. By addressing these enormous data gaps, the student will then explore global spatial and temporal variability in functional and other trait-based diversity measures to produce a unique and comprehensive evaluation of whether predictors exist of diversity at a global scale. Ultimately, the project will boost EcoSISTEM's ability to simulate plant responses to climate change with greater accuracy.

The impact of deep learning optimization and design choices for marine biodiversity monitoring (PhD)

Supervisors: Tiffany Vlaar, Laurence De Clippele (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

This project aims to increase the efficiency, accuracy, and reliability of annotation and classification of large marine datasets using deep learning. Timely and accurate analysis of these long-term datasets will aid marine biodiversity monitoring efforts. Design of more efficient strategies further aims to reduce the carbon footprint of training and fine-tuning large machine learning models. The project is expected to lead to various novel insights for the machine learning community such as on optimal pre-training choices for downstream robust performance, the optimal order of learning samples with varying complexity levels, navigating instances with label ground truth uncertainty, and re-evaluation of metric design. The PhD student will be supported in building international collaborations with researchers across different disciplines and in developing effective research communication skills.

Medical image segmentation and uncertainty quantification (PhD)

Supervisors: Surajit Ray
Relevant research groups: Machine Learning and AI, Imaging, Image Processing and Image Analysis

This project focuses on the application of medical imaging and uncertainty quantification for the detection of tumours. The project aims to provide clinicians with accurate, non-invasive methods for detecting and classifying the presence of malignant and benign tumours. It seeks to combine advanced medical imaging technologies such as ultrasound, computed tomography (CT) and magnetic resonance imaging (MRI) with the latest artificial intelligence algorithms. These methods will automate the detection process and may be used for determining malignancy with a high degree of accuracy. Uncertainty quantification (UQ) techniques will help generate a more precise prediction for tumour malignancy by providing a characterisation of the degree of uncertainty associated with the diagnosis. The combination of medical imaging and UQ will significantly decrease the requirement for performing invasive medical procedures such as biopsies. This will improve the accuracy of the tumour detection process and reduce the duration of diagnosis. The project will also benefit from the development of novel image processing algorithms (e.g. deep learning) and machine learning models. These algorithms and models will help improve the accuracy of the tumour detection process and assist clinicians in making the best treatment decisions.
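
One common way to quantify segmentation uncertainty (shown here only as an illustrative example, not the project's chosen method) is Monte Carlo dropout: keep dropout active at test time, run several stochastic forward passes, and measure per-pixel disagreement. A minimal PyTorch sketch with a toy network and a placeholder image:

  import torch
  import torch.nn as nn

  # Toy segmentation network with dropout; MC dropout gives per-pixel uncertainty.
  net = nn.Sequential(
      nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Dropout2d(0.2),
      nn.Conv2d(16, 2, 3, padding=1),            # 2 classes: background / tumour
  )

  image = torch.randn(1, 1, 64, 64)              # placeholder scan
  net.train()                                    # keep dropout active at test time
  with torch.no_grad():
      probs = torch.stack([torch.softmax(net(image), dim=1) for _ in range(30)])

  mean_prob = probs.mean(dim=0)                  # averaged segmentation probabilities
  # predictive entropy: high where the stochastic passes disagree
  entropy = -(mean_prob * torch.log(mean_prob + 1e-9)).sum(dim=1)
  print(mean_prob.shape, entropy.shape)          # (1, 2, 64, 64), (1, 64, 64)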

Modelling in Space and Time - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Extreme value theory for predicting animal dispersal and movement in a changing climate (PhD)

Supervisors: Jafet Belmont Osuna, Daniela Castro-Camilo, Juan Morales (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

There is an imperative need to understand and predict how populations respond to multiple aspects of global change, such as habitat fragmentation and climate change. Extreme weather events, which are expected to increase in both frequency and intensity, can profoundly impact animal movement and spatial dynamics. Additionally, for many species, rare long-distance dispersal events play a crucial role in reaching suitable habitats for germination, establishment, and colonisation across fragmented or managed landscapes. Many plant species, for instance, rely on birds for dispersal: birds ingest fruits and later deposit seeds through defecation or regurgitation. Accurately predicting such processes requires models that capture both seed retention times within birds and bird movement patterns. This project aims to develop and apply cutting-edge statistical methods for analysing animal movement and dispersal data using Extreme-Value Theory (EVT) within a Bayesian framework. EVT, a well-established theoretical framework that has been widely used in environmental sciences for modelling extreme events, has seen limited application in ecology. We will leverage EVT (1) to understand how extreme weather events can affect animal movement, and (2) to make better predictions of dispersal processes. This work offers substantial potential for novel insights and methodological advancements. By integrating experimental research and state-of-the-art tracking technologies, the project will inform the development of hierarchical Bayesian models to explore patterns and drivers of animal movement and dispersal, with a particular focus on extreme behaviours and their ecological implications.
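
The following is a minimal sketch of the peaks-over-threshold idea from EVT, fitting a generalised Pareto distribution to exceedances of synthetic daily movement distances; the data, threshold and target distance are illustrative assumptions.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(3)
  # Synthetic daily movement distances (km); heavy tail mimics rare long-distance moves
  distances = stats.lognorm.rvs(s=1.0, scale=2.0, size=5000, random_state=rng)

  # Peaks-over-threshold: model exceedances over a high threshold with a
  # generalised Pareto distribution (GPD)
  u = np.quantile(distances, 0.95)
  exceedances = distances[distances > u] - u
  shape, loc, scale = stats.genpareto.fit(exceedances, floc=0.0)

  # Estimated probability of a movement beyond 30 km, given exceedance of u
  p_exceed_u = (distances > u).mean()
  p30 = p_exceed_u * stats.genpareto.sf(30.0 - u, shape, loc=0.0, scale=scale)
  print(f"threshold u = {u:.1f} km, P(distance > 30 km) approx {p30:.2e}")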

Sampling strategies for environmental monitoring networks (PhD)

Supervisors: Claire Miller, Craig Alexander, Craig Wilkie
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences and Sustainability
Funding: This project has specific funding available. More information can be found at the FindAPhD Advert.

In recent years, there has been a lot of work done on investigating how to monitor environmental variables in the most efficient way.  Environmental variables, such as pollutants in water, can be monitored through, for example, in-situ sampling, automatic in-situ sensors or remote sensing.  However, each sampling approach has different levels of accuracy and is available at different spatial and temporal resolutions. 

Environmental regulators and industry all have a responsibility and commitment to monitoring environmental standards and mitigating the potential for increases in levels of pollutants. At a time of world-wide budgetary pressures, the most efficient monitoring schemes are required. However, the mechanisms of monitoring can themselves be detrimental to the environment, e.g. through more frequent site visits or through lab/computer processing with a higher environmental footprint.

The aim of the PhD is to extend work already carried out on the optimal design of monitoring networks for spatiotemporal models: specifically, to identify spatiotemporal sampling designs that balance budgetary requirements and environmental impact, with a view to developing and enhancing online tools (e.g. GWSDAT) that provide automatic guidance to practitioners. Practitioners can then integrate this guidance into their assessment and development of the most efficient monitoring network. This will require statistical methodological development, work on computationally efficient implementations, and software development.

The PhD will be jointly supervised by partners from industry, and hence the successful candidate will additionally engage in knowledge exchange/transfer, training and networking in this sector.
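
One standard design criterion that work of this kind might build on (shown purely as an illustration, not the project's prescribed approach) is greedy sensor placement under a Gaussian process model: repeatedly add the candidate site with the largest predictive variance given the sites already chosen. All kernels and settings below are illustrative.

  import numpy as np

  # Candidate monitoring sites on a grid; greedy design: repeatedly add the site
  # with the largest Gaussian-process predictive variance given sites chosen so far.
  def rbf(A, B, ell=2.0):
      d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
      return np.exp(-0.5 * d2 / ell**2)

  xg, yg = np.meshgrid(np.linspace(0, 10, 15), np.linspace(0, 10, 15))
  cand = np.column_stack([xg.ravel(), yg.ravel()])
  noise = 1e-4

  chosen = [0]                                    # start from an arbitrary site
  for _ in range(9):                              # pick 9 more sites
      S = cand[chosen]
      K_ss = rbf(S, S) + noise * np.eye(len(chosen))
      K_cs = rbf(cand, S)
      # posterior variance at every candidate given the chosen sites
      var = 1.0 - np.sum((K_cs @ np.linalg.inv(K_ss)) * K_cs, axis=1)
      var[chosen] = -np.inf                       # exclude already-selected sites
      chosen.append(int(np.argmax(var)))

  print("selected site coordinates:\n", cand[chosen])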

Evaluating probabilistic forecasts in high-dimensional settings (PhD)

Supervisors: Jethro Browell
Relevant research groups: Modelling in Space and Time, Computational Statistics, Applied Probability and Stochastic Processes

Many decisions are informed by forecasts, and almost all forecasts are uncertain to some degree. Probabilistic forecasts quantify uncertainty to help improve decision-making and are playing an important role in fields including weather forecasting, economics, energy, and public policy. Evaluating the quality of past forecasts is essential to give forecasters and forecast users confidence in their current predictions, and to compare the performance of forecasting systems.

While the principles of probabilistic forecast evaluation have been established over the past 15 years, most notably that of “sharpness subject to calibration/reliability”, we lack a complete toolkit for applying these principles in many situations, especially those that arise in high-dimensional settings. Furthermore, forecast evaluation must be interpretable by forecast users as well as expert forecasters, and assigning value to marginal improvements in forecast quality remains a challenge in many sectors.

This PhD will develop new statistical methods for probabilistic forecast evaluation considering some of the following issues:

  • Verifying probabilistic calibration conditional on relevant covariates
  • Skill scores for multivariate probabilistic forecasts where “ideal” performance is unknowable
  • Assigning value to marginal forecast improvement through the convolution of utility functions and Murphy diagrams
  • Development of the concept of “anticipated verification” and “predicting the uncertainty of future forecasts”
  • Decomposing forecast misspecification (e.g. into spatial and temporal components)
  • Evaluation of conformal predictions

Good knowledge of multivariate statistics is essential; prior knowledge of probabilistic forecasting and forecast evaluation would be an advantage.
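
As a small illustration of “sharpness subject to calibration/reliability” in practice (illustrative only, with synthetic data): the pinball (quantile) loss is a proper score rewarding both properties, while probability integral transform (PIT) values diagnose calibration.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(4)
  y = rng.normal(10, 2, size=1000)                 # observations

  # A probabilistic forecast: a (deliberately overdispersed) Gaussian N(10, 3^2)
  mu, sd = 10.0, 3.0

  # Pinball (quantile) loss, averaged over a grid of quantile levels
  taus = np.linspace(0.05, 0.95, 19)
  q = stats.norm.ppf(taus, mu, sd)                 # forecast quantiles
  u = y[:, None] - q[None, :]
  pinball = np.mean(np.where(u >= 0, taus * u, (taus - 1) * u))

  # PIT values should look Uniform(0,1) if the forecast is calibrated;
  # overdispersion piles mass in the centre of the histogram
  pit = stats.norm.cdf(y, mu, sd)
  hist, _ = np.histogram(pit, bins=10, range=(0, 1))
  print(f"mean pinball loss: {pinball:.3f}")
  print("PIT histogram counts:", hist)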

Adaptive probabilistic forecasting (PhD)

Supervisors: Jethro Browell
Relevant research groups: Modelling in Space and Time, Computational Statistics, Applied Probability and Stochastic Processes

Data-driven predictive models depend on the representativeness of the data used in model selection and estimation. However, many processes change over time, meaning that recent data are more representative than old data. In this situation, predictive models should track these changes, which is the aim of “online” or “adaptive” algorithms. Furthermore, many users of forecasts require probabilistic forecasts, which quantify uncertainty, to inform their decision-making. Existing adaptive methods such as Recursive Least Squares and the Kalman Filter have been very successful for adaptive point forecasting, but adaptive probabilistic forecasting has received little attention. This PhD will develop methods for adaptive probabilistic forecasting from a theoretical perspective, with a view to applying these methods to problems in at least one application area to be determined.
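
For concreteness, here is a minimal sketch of the adaptive point-forecasting baseline mentioned above: recursive least squares with a forgetting factor, tracking a drifting coefficient. Extending recursions of this kind from point forecasts to full predictive distributions is the sort of gap the PhD would address; all settings are illustrative.

  import numpy as np

  rng = np.random.default_rng(5)

  # Recursive least squares with forgetting factor lam: recent data dominate,
  # so the fit tracks a slowly drifting coefficient.
  lam = 0.98
  theta = np.zeros(2)                  # [intercept, slope]
  P = 1e3 * np.eye(2)                  # large initial parameter covariance

  true_slope = 1.0
  for t in range(1000):
      true_slope += 0.002              # the process drifts over time
      x = np.array([1.0, rng.uniform(-1, 1)])
      y = true_slope * x[1] + 0.1 * rng.standard_normal()
      # standard RLS update with exponential forgetting
      k = P @ x / (lam + x @ P @ x)
      theta = theta + k * (y - x @ theta)
      P = (P - np.outer(k, x @ P)) / lam

  print(f"final slope estimate {theta[1]:.2f} vs true {true_slope:.2f}")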

In the context of adaptive probabilistic forecasting, this PhD may consider:

  • Online estimation of Generalised Additive Models for Location, Scale and Shape
  • Online/adaptive (multivariate) time series prediction
  • Online aggregation (of experts, or hierarchies)

A good knowledge of methods for time series analysis and regression is essential; familiarity with flexible regression (GAMs) and distributional regression (GAMLSS/quantile regression) would be an advantage.

The evolution of shape (PhD)

Supervisors: Vincent Macaulay
Relevant research groups: Bayesian Modelling and Inference, Modelling in Space and Time, Statistical Modelling for Biology, Genetics and *omics

Shapes of objects change in time. Organisms evolve and in the process change form: humans and chimpanzees derive from some common ancestor presumably different from either in shape. Designed objects are no different: an Art Deco teapot from the 1920s might share some features with one from Ikea in 2010, but they are different. Mathematical models of evolution for certain data types, like the strings of As, Gs, Cs and Ts in our evolving DNA, are quite mature and allow us to learn about the relationships of the objects (their phylogeny or family tree), about the changes that happen to them in time (the evolutionary process) and about the ways objects were configured in the past (the ancestral states), by statistical techniques like phylogenetic analysis. Such techniques for shape data are still in their infancy. This project will develop novel statistical inference approaches (in a Bayesian context) for complex data objects, like functions, surfaces and shapes, using Gaussian-process models, with potential application in fields as diverse as language evolution, morphometrics and industrial design.
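
As a toy illustration of the mature sequence/trait setting that the project would generalise to shapes (illustrative only): simulating Brownian motion evolution of a scalar trait along a small phylogeny, where a child's value is its parent's plus Gaussian noise scaled by branch length. The tree and rate below are invented for the example.

  import numpy as np

  rng = np.random.default_rng(6)

  # A small phylogeny as {child: (parent, branch_length)}; node 0 is the root.
  # Under Brownian motion, a child's trait is the parent's plus Gaussian noise
  # with variance proportional to the branch length.
  tree = {1: (0, 2.0), 2: (0, 2.0), 3: (1, 1.0), 4: (1, 1.0), 5: (2, 3.0)}
  sigma2 = 0.5
  trait = {0: 0.0}                      # ancestral (root) state

  for node in sorted(tree):             # parents always precede children here
      parent, bl = tree[node]
      trait[node] = trait[parent] + rng.normal(0.0, np.sqrt(sigma2 * bl))

  print({n: round(v, 2) for n, v in trait.items()})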

New methods for analysis of migratory navigation (PhD)

Supervisors: Janine IllianUrška Demšar (University of St Andrews)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

Migratory birds travel annually across vast expanses of oceans and continents to reach their destination with incredible accuracy. How they are able to do this using only locally available cues is still not fully understood. Migratory navigation consists of two processes: birds either identify the direction in which to fly (compass orientation) or the location where they are at a specific moment in time (geographic positioning). One of the possible ways they do this is to use information from the Earth’s magnetic field in the so-called geomagnetic navigation (Mouritsen, 2018). While there is substantial evidence (both physiological and behavioural) that they do sense magnetic field (Deutschlander and Beason, 2014), we however still do not know exactly which of the components of the field they use for orientation or positioning. We also do not understand how rapid changes in the field affect movement behaviour.

There is a possibility that birds can sense these rapid large changes and that this may affect their navigational process. To study this, we need to link accurate data on Earth’s magnetic field with animal tracking data. This has only become possible very recently through new spatial data science advances:  we developed the MagGeo tool, which links contemporaneous geomagnetic data from Swarm satellites of the European Space Agency with animal tracking data (Benitez Paez et al., 2021).

Linking geomagnetic data to animal tracking data however creates a highly-dimensional data set, which is difficult to explore. Typical analyses of contextual environmental information in ecology include representing contextual variables as co-variates in relatively simple statistical models (Brum Bastos et al., 2021), but this is not sufficient for studying detailed navigational behaviour. This project will analyse complex spatio-temporal data using computationally efficient statistical model fitting approches in a Bayesian context.

This project is fully based on open data to support reproducibility and open science. We will test our new methods by annotating publicly available bird tracking data (e.g. from repositories such as Movebank.org), using the open MagGeo tool and implementing our new methods as Free and Open Source Software (R/Python).

Integrated spatio-temporal modelling for environmental data (PhD)

Supervisors: Janine IllianPeter Henrys (UKCEH)
Relevant research groups: Modelling in Space and TimeBayesian Modelling and InferenceComputational StatisticsEnvironmental, Ecological Sciences and Sustainability

The last decade has seen a proliferation of environmental data with vast quantities of information available from various sources. This has been due to a number of different factors including: the advent of sensor technologies; the provision of remotely sensed data from both drones and satellites; and the explosion in citizen science initiatives. These data represent a step change in the resolution of available data across space and time - sensors can be streaming data at a resolution of seconds whereas citizen science observations can be in the hundreds of thousands.  

Over the same period, the resources available for traditional field surveys have decreased dramatically whilst logistical issues (such as access to sites, ) have increased. This has severely impacted the ability for field survey campaigns to collect data at high spatial and temporal resolutions. It is exactly this sort of information that is required to fit models that can quantify and predict the spread of invasive species, for example. 

Whilst we have seen an explosion of data across various sources, there is no single source that provides both the spatial and temporal intensity that may be required when fitting complex spatio-temporal models (cf invasive species example) - each has its own advantages and benefits in terms of information content. There is therefore potentially huge benefit in beginning together data from these different sources within a consistent framework to exploit the benefits each offers and to understand processes at unprecedented resolutions/scales that would be impossible to monitor. 

Current approaches to combining data in this way are typically very bespoke and involve complex model structures that are not reusable outside of the particular application area. What is needed is an overarching generic methodological framework and associated software solutions to implement such analyses. Not only would such a framework provide the methodological basis to enable researchers to benefit from this big data revolution, but it would also provide the capability to change such analyses from being stand-alone research projects in their own right to more operational, standard analytical routines.

Finally, such dynamic, integrated analyses could feed back into data collection initiatives to ensure optimal allocation of effort for traditional surveys or optimal power management for sensor networks. The major step change is that this optimal allocation of effort is conditional on the other data that are available. So, for example, given the coverage and intensity of the citizen science data, where should we optimally send our paid surveyors? The idea is that information is collected at times and locations that provide the greatest benefit in understanding the underpinning stochastic processes. These two major issues - integrated analyses and adaptive sampling - ensure that environmental monitoring is fit for purpose and that scientists, policymakers and industry can benefit from the big data revolution.

This project will develop an integrated statistical modelling strategy that provides a single modelling framework for enabling quantification of ecosystem goods and services while accounting for the fundamental differences between data streams. Data collected at different spatial resolutions can be used within the same model by projecting them into continuous space and back to the landscape level of interest. As a result, decisions can be made at the relevant spatial scale, and uncertainty is propagated through, facilitating appropriate decision making.
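A minimal sketch of the core idea, under strong simplifying assumptions (a one-dimensional latent Gaussian field with known covariance): point observations and areal averages are expressed as linear functionals of the same latent field, so standard Gaussian conditioning combines both streams coherently. All names and numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent environmental field on a fine 1-D grid, with exponential covariance.
m = 100
x = np.linspace(0, 10, m)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 1.5)
f = np.linalg.cholesky(K + 1e-8 * np.eye(m)) @ rng.normal(size=m)

# Data stream 1: precise point observations (e.g. in-situ sensors).
pt_idx = rng.choice(m, 8, replace=False)
A_pt = np.eye(m)[pt_idx]

# Data stream 2: noisier areal averages over blocks (e.g. remote-sensing pixels).
A_ar = np.zeros((5, m))
for b in range(5):
    A_ar[b, b * 20:(b + 1) * 20] = 1 / 20

# Stack both observation operators; each stream keeps its own noise level.
A = np.vstack([A_pt, A_ar])
noise = np.concatenate([np.full(8, 0.05**2), np.full(5, 0.2**2)])
y = A @ f + rng.normal(size=13) * np.sqrt(noise)

# Gaussian conditioning: posterior mean of the field given both streams.
S = A @ K @ A.T + np.diag(noise)
f_post = K @ A.T @ np.linalg.solve(S, y)
print("RMSE of fused reconstruction:", np.sqrt(np.mean((f_post - f)**2)).round(3))
```

The project's framework would generalise this toy conditioning step to realistic spatio-temporal models with estimated covariance structures.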

Statistical methodology for assessing the impacts of offshore renewable developments on marine wildlife (PhD)

Supervisors: Janine Illian, Esther Jones (BIOSS), Adam Butler (BIOSS)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

Assessing the impacts of offshore renewable developments on marine wildlife is a critical component of the consenting process. A NERC-funded project, ECOWINGS, will provide a step-change in analysing predator-prey dynamics in the marine environment, collecting data across trophic levels against a backdrop of developing wind farms and climate change. Aerial survey and GPS data from multiple species of seabirds will be collected contemporaneously alongside prey data available over the whole water column from an automated surface vehicle and underwater drone.

These methods of data collection will generate 3D space and time profiles of predators and prey, creating a rich source of information and enormous potential for modelling and interrogation. The data present a unique opportunity for experimental design across a dynamic and changing marine ecosystem, which is heavily influenced by local and global anthropogenic activities. However, these data have complex intrinsic spatio-temporal properties, which are challenging to analyse. Significant statistical methods development could be achieved using this system as a case study, contributing to the scientific knowledge base not only in offshore renewables but more generally in the many circumstances where patchy ecological spatio-temporal data are available. 

This PhD project will develop spatio-temporal modelling methodology that will allow users to analyse these exciting - and complex - data sets and help inform our knowledge of the impact of offshore renewables on wildlife.

Analysis of spatially correlated functional data objects (PhD)

Supervisors: Surajit Ray
Relevant research groups: Modelling in Space and Time, Computational Statistics, Nonparametric and Semi-parametric Statistics, Imaging, Image Processing and Image Analysis

Historically, functional data analysis (FDA) techniques have widely been used to analyze traditional time series data, albeit from a different perspective. Of late, FDA techniques are increasingly being used in domains such as environmental science, where the data are spatio-temporal in nature and hence it is typical to treat them as functional data whose functions are correlated in time or space. An example where modeling the dependencies is crucial is the analysis of remotely sensed data observed over a number of years across the surface of the earth, where each year forms a single functional data object. One might be interested in decomposing the overall variation across space and time and attributing it to covariates of interest. Another interesting class of data with dependence structure consists of weather data on several variables collected from balloons, where the domain of the functions is a vertical strip in the atmosphere and the data are spatially correlated. One of the challenges in such data is missingness, addressing which requires developing appropriate spatial smoothing techniques for spatially dependent functional data. There are also interesting design-of-experiment issues, as well as questions of data calibration to account for variability in sensing instruments. In spite of the research activity in analyzing dependent functional data, there are several unresolved problems, which the student will work on (a small simulated illustration follows the list):

  • robust statistical models for incorporating temporal and spatial dependencies in functional data
  • developing reliable prediction and interpolation techniques for dependent functional data
  • developing inferential framework for testing hypotheses related to simplified dependent structures
  • analysing sparsely observed functional data by borrowing information from neighbours
  • visualisation of data summaries associated with dependent functional data
  • clustering of functional data
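The following Python sketch simulates the kind of data described above: curves observed at spatial sites whose basis-function scores are spatially correlated, with the dominant modes of variation recovered by a functional PCA computed via the SVD. All settings are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)

# Functional data at spatial sites: each site's curve mixes two smooth basis
# functions whose scores are spatially correlated across sites.
t = np.linspace(0, 1, 50)                        # common evaluation grid
sites = rng.uniform(0, 10, (30, 2))              # hypothetical site coordinates
dist = np.linalg.norm(sites[:, None] - sites[None, :], axis=2)
Sigma = np.exp(-dist / 3.0)                      # spatial correlation of scores
L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(30))
scores = L @ rng.normal(size=(30, 2))            # two spatially smooth score fields
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
curves = scores @ basis + 0.1 * rng.normal(size=(30, 50))

# Functional PCA via SVD of the centred data matrix; the estimated scores
# inherit the spatial correlation, which dependence-aware methods exploit.
Xc = curves - curves.mean(axis=0)
U, sval, Vt = np.linalg.svd(Xc, full_matrices=False)
fpc_scores = U[:, :2] * sval[:2]
print("variance explained by 2 FPCs:", ((sval[:2]**2).sum() / (sval**2).sum()).round(3))
```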

Estimating the effects of air pollution on human health (PhD)

Supervisors: Duncan Lee
Relevant research groups: Modelling in Space and Time, Biostatistics, Epidemiology and Health Applications

Exposure to air pollution is thought to reduce average life expectancy by six months, with an estimated equivalent health cost of £19 billion each year (from DEFRA). These effects have been estimated using statistical models, which quantify the impact on human health of exposure in both the short and the long term. However, the estimation of such effects is challenging, because individual-level measures of health and pollution exposure are not available. Therefore, the majority of studies are conducted at the population level, and the resulting inference can only be made about the effects of pollution on overall population health. However, the data used in such studies are spatially misaligned, as the health data relate to extended areas such as cities or electoral wards, while the pollution concentrations are measured at individual locations. Furthermore, pollution monitors are typically located where concentrations are thought to be highest, known as preferential sampling, which is likely to result in overly high measurements being recorded. This project aims to develop statistical methodology to address these problems, and thus provide a less biased estimate of the effects of pollution on health than is currently produced.
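The preferential sampling problem is easy to demonstrate by simulation: if monitors are placed preferentially at high-concentration locations, the naive monitor average overstates the population-average exposure. A minimal, purely illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(8)

# A smooth pollution surface on a grid (arbitrary units), with one hotspot.
g = np.linspace(0, 1, 60)
xx, yy = np.meshgrid(g, g)
surface = 30 + 10 * np.exp(-((xx - 0.7)**2 + (yy - 0.3)**2) / 0.05)

# Preferential design: site-selection probability increases with concentration.
p = np.exp(0.3 * surface.ravel())
sites = rng.choice(surface.size, size=40, replace=False, p=p / p.sum())

print("true mean concentration:  ", surface.mean().round(2))
print("preferential-monitor mean:", surface.ravel()[sites].mean().round(2))
```

The second number exceeds the first, which is precisely the bias that the methodology developed in this project would correct.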

Mapping disease risk in space and time (PhD)

Supervisors: Duncan Lee
Relevant research groups: Modelling in Space and Time, Biostatistics, Epidemiology and Health Applications

Disease risk varies over space and time, due to similar variation in environmental exposures such as air pollution and risk-inducing behaviours such as smoking. Modelling the spatio-temporal pattern in disease risk is known as disease mapping, and the aims are to: quantify the spatial pattern in disease risk to determine the extent of health inequalities; determine whether there has been any increase or reduction in the risk over time; identify the locations of clusters of areas at elevated risk; and quantify the impact of exposures, such as air pollution, on disease risk. I am working on all these related problems at present, and I have PhD projects in all these areas.

Bayesian Mixture Models for Spatio-Temporal Data (PhD)

Supervisors: Craig Anderson
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Biostatistics, Epidemiology and Health Applications

The prevalence of disease is typically not constant across space – instead the risk tends to vary from one region to another. Some of this variability may be down to environmental conditions, but much of it is driven by socio-economic differences between regions, with poorer regions tending to have worse health than wealthier regions. For example, within the Greater Glasgow and Clyde region, the World Health Organisation noted that life expectancy ranges from 54 in Calton to 82 in Lenzie, despite these areas being less than 10 miles apart. There is substantial value to health professionals and policymakers in identifying some of the causes behind these localised health inequalities.

Disease mapping is a field of statistical epidemiology which focuses on estimating the patterns of disease risk across a geographical region. The main goal of such mapping is typically to identify regions of high disease risk so that relevant public health interventions can be made. This project involves the development of statistical models which will enhance our understanding of regional differences in the risk of suffering from major diseases by focusing on these localised health inequalities.

Standard Bayesian hierarchical models with a conditional autoregressive (CAR) prior are frequently used for risk estimation in this context, but these models assume a smooth risk surface which is often not appropriate in practice. In reality, it will often be the case that different regions have vastly different risk profiles and require different data generating functions as a result.

In this work we propose a mixture model based approach which allows different sub-populations to be represented by different underlying statistical distributions within a single modelling framework. By integrating CAR models into mixture models, researchers can simultaneously account for spatial dependencies and identify distinct disease patterns within subpopulations.
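A generative sketch of this model class, with all numbers invented: area-level counts follow a Poisson model whose log-risk combines a spatially structured CAR random effect with a latent two-component mixture for the baseline risk. Fitting would invert this process (e.g. by MCMC over the labels and random effects); here we only simulate from it.

```python
import numpy as np

rng = np.random.default_rng(7)

# Rook adjacency on an r x c lattice of areal units (a hypothetical map).
r, c = 10, 10
n = r * c
W = np.zeros((n, n))
for i in range(r):
    for j in range(c):
        k = i * c + j
        if i + 1 < r:
            W[k, k + c] = W[k + c, k] = 1
        if j + 1 < c:
            W[k, k + 1] = W[k + 1, k] = 1

# Proper CAR precision Q = tau * (D - alpha * W); alpha < 1 keeps Q positive definite.
tau, alpha = 2.0, 0.9
Q = tau * (np.diag(W.sum(axis=1)) - alpha * W)

# Spatially smooth random effects phi ~ N(0, Q^{-1}) via the Cholesky factor of Q.
L = np.linalg.cholesky(Q)
phi = np.linalg.solve(L.T, rng.normal(size=n))

# Two-component mixture on the baseline log-risk: most areas "low", some "high".
z = rng.random(n) < 0.2                  # latent component labels
mu = np.where(z, 0.7, -0.1)              # component-specific log relative risks
E = rng.uniform(50, 150, n)              # expected counts per area
y = rng.poisson(E * np.exp(mu + phi))    # observed disease counts

print("range of empirical SIRs:", (y / E).min().round(2), "to", (y / E).max().round(2))
```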

Detecting hotspots of water pollution in complex constrained domains and networks (PhD)

Supervisors: Mu Niu, Craig Wilkie, Cathy Yi-Hsuan Chen (Business School, UofG), Michael Tso (Lancaster University)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Environmental, Ecological Sciences & Sustainability
Funding: This project is competitively funded through the ExaGEO DLA.

Technological developments with smart sensors are changing the way that the environment is monitored.  Many such smart systems are under development, with small, energy efficient, mobile sensors being trialled.  Such systems offer opportunities to change how we monitor the environment, but this requires additional statistical development in the optimisation of the location of the sensors.

The aim of this project is to develop a mathematical and computational inferential framework to identify optimal sensor deployment locations within complex, constrained domains and networks for improved water contamination detection. Methods for estimating covariance functions in such domains rely on computationally intensive diffusion process simulations, limiting their application to relatively simple domains and small-scale datasets. To address this challenge, the project will employ accelerated computing paradigms with highly parallelized GPUs to enhance simulation efficiency. The framework will also address regression, classification, and optimization problems on latent manifolds embedded in high-dimensional spaces, such as image clouds (e.g., remote sensing satellite images), which are crucial for sensor deployment and performance evaluation. As the project progresses, particularly in the image cloud case, the computational demands will intensify, requiring advanced GPU resources or exascale computing to ensure scalability, efficiency, and performance.

Downscaling and Prediction of Rainfall Extremes from Climate Model Outputs (PhD)

Supervisors: Sebastian Gerhard Mutz (GES, UoG), Daniela Castro-Camilo
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Environmental, Ecological Sciences & Sustainability
Funding: This project is competitively funded through the ExaGEO DLA.

Over the last decade, Scotland’s annual rainfall increased by 9%, and its winter rainfall by 19%, with more water coming from extreme events, posing risks to the environment, infrastructure, health, and industry. Urgent issues such as flooding, mass wasting, and water quality are closely tied to rainfall extremes. Reliable predictions of extremes are, therefore, critical for risk management. Prediction of extremes, one of the main focuses of extreme value theory, is still considered one of the grand challenges by the World Climate Research Programme. This project will address this challenge by developing novel, computationally efficient statistical models that are able to predict rainfall extremes from the output of GPU-optimised climate models.
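As a pointer to the extreme-value machinery involved, the sketch below fits a generalized Pareto distribution (GPD) to threshold exceedances of simulated daily rainfall and computes a return level from the standard GPD formula. The data, threshold choice, and parameter values are synthetic and purely illustrative.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)

# Stand-in for 20 years of daily rainfall (mm); real work would use
# climate-model output rather than this synthetic series.
rain = rng.gamma(shape=0.4, scale=6.0, size=20 * 365)

# Peaks-over-threshold: model exceedances above a high threshold with a GPD.
u = np.quantile(rain, 0.98)
exc = rain[rain > u] - u
xi, _, sigma = genpareto.fit(exc, floc=0)    # shape xi and scale sigma

# m-observation return level: the level exceeded once every m observations
# on average, x_m = u + (sigma/xi) * ((m * zeta_u)^xi - 1).
zeta = exc.size / rain.size                  # empirical P(X > u)
m = 100 * 365                                # "100-year" horizon in days
ret = u + (sigma / xi) * ((m * zeta) ** xi - 1)
print(f"threshold u = {u:.1f} mm, xi = {xi:.2f}, 100-yr level ~ {ret:.1f} mm")
```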

Exploring Hybrid Flood modelling leveraging GPU/Exascale computing (PhD)

Supervisors: Andrew Elliott, Lindsay Beevers (University of Edinburgh), Claire Miller, Michele Weiland (University of Edinburgh)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Flood modelling is crucial for understanding flood hazards, both now and in the future as a result of climate change. Modelling provides inundation extents (or flood footprints), outlines of the areas at risk, which can help to manage our increasingly complex infrastructure network as our climate changes. Our ability to make fast, accurate predictions of fluvial inundation extents is important for disaster risk reduction, while simultaneously capturing uncertainty in forecasts or predictions is essential for efficient planning and design. Both aims require methods which are computationally efficient whilst maintaining accurate predictions. Current Navier–Stokes physics-based models are computationally intensive; this project would therefore explore approaches to hybrid flood models which utilise GPU compute and machine learning fused with physics-based models, as well as investigating scaling the numerical models to large-scale HPC resources.

Scalable approaches to mathematical modelling and uncertainty quantification in heterogeneous peatlands (PhD)

Supervisors: Raimondo Penta, Vinny Davies, Jessica Davies (Lancaster University), Lawrence Bull, Matteo Icardi (University of Nottingham)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification, Continuum Mechanics
Funding: This project is competitively funded through the ExaGEO DLA.

While only covering 3% of the Earth’s surface, peatlands store >30% of terrestrial carbon and play a vital ecological role. Peatlands are, however, highly sensitive to climate change and human pressures, and therefore understanding and restoring them is crucial for climate action. Multiscale mathematical models can represent the complex microstructures and interactions that control peatland dynamics but are limited by their computational demands. GPU and exascale computing advances offer a timely opportunity to unlock the potential benefits of mathematically-led peatland modelling approaches. By scaling these complex models to run on new architectures, or by directly incorporating mathematical constraints into GPU-based deep learning approaches, scalable computing will deliver transformative insights into peatland dynamics and their restoration, supporting global climate efforts.

Scalable Inference and Uncertainty Quantification for Ecosystem Modelling (PhD)

Supervisors: Vinny Davies, Richard Reeve (BOHVM, UoG), David Johnson (Lancaster University), Christina Cobbold, Neil Brummitt (Natural History Museum)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Understanding the stability of ecosystems and how they are impacted by climate and land use change can allow us to identify sites where biodiversity loss will occur and help to direct policymakers in mitigation efforts. Our current digital twin of plant biodiversity – https://github.com/EcoJulia/EcoSISTEM.jl – provides functionality for simulating species through processes of competition, reproduction, dispersal and death, as well as environmental changes in climate and habitat, but it would benefit from enhancement in several areas. The three areas this project would most likely target are the introduction of a soil layer (and improved modelling of soil water); improving the efficiency of the code to handle a more complex model and to allow stochastic and systematic Uncertainty Quantification (UQ); and developing techniques for scalable inference of missing parameters.

Statistical Emulation Development for Landscape Evolution Models (PhD)

Supervisors: Benn Macdonald, Mu Niu, Paul Eizenhöfer (GES, UoG), Eky Febrianto (Engineering, UoG)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Many real-world processes, including those governing landscape evolution, can be effectively described mathematically via differential equations. These equations describe how processes, e.g. the physiography of mountainous landscapes, change with respect to other variables, e.g. time and space. Conventional approaches to statistical inference involve repeatedly solving the equations numerically: every time the parameters of the equations are changed in a statistical optimisation or sampling procedure, the equations need to be re-solved. The associated large computational cost limits advancements when scaling to more complex systems, the application of statistical inference and machine learning approaches, and the implementation of more holistic approaches to Earth System science. This leads to the need for an accelerated computing paradigm involving highly parallelised GPUs for the evaluation of the forward problem.

Beyond advanced computing hardware, emulation is becoming a more popular way to tackle this issue. The idea is that first the differential equations are solved as many times as possible and then the output is interpolated using statistical techniques. Then, when inference is carried out, the emulator predictions replace the differential equation solutions. Since prediction from an emulator is very fast, this avoids the computational bottleneck. If the emulator is a good representation of the differential equation output, then parameter inference can be accurate.

The student will begin by working on parallelising the numerical solver of the mathematical model via GPUs. This means that many more solutions can be generated on which to build the emulator, in a timeframe that is feasible. Then, they will develop efficient emulators for complex landscape evolution models, as the PhD project evolves.
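A compact illustration of this emulation workflow, with a deliberately trivial "simulator" (a one-parameter logistic ODE standing in for a landscape evolution model): solve the equations over a design of parameter values once, fit a Gaussian process emulator, then use its near-instant predictions in place of the solver. The package choices here (SciPy, scikit-learn) are ours for illustration, not the project's.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy "simulator": logistic growth dh/dt = r h (1 - h), standing in for a far
# costlier landscape-evolution solver. In the project, the design runs would
# be generated in parallel (e.g. on GPUs) rather than in this serial loop.
def simulator(r, t_end=5.0):
    sol = solve_ivp(lambda t, h: r * h * (1 - h), (0.0, t_end), [0.05],
                    t_eval=[t_end])
    return sol.y[0, -1]

r_design = np.linspace(0.2, 3.0, 15)[:, None]     # design over the parameter r
y_design = np.array([simulator(rv[0]) for rv in r_design])

# Fit the Gaussian process emulator once to the design runs.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(r_design, y_design)

# Emulator predictions are near-instant and come with uncertainty estimates,
# so they can replace the solver inside optimisation or MCMC loops.
r_new = np.array([[1.37]])
mean, sd = gp.predict(r_new, return_std=True)
print(f"emulator: {mean[0]:.4f} +/- {sd[0]:.4f}; solver: {simulator(1.37):.4f}")
```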

Nonparametric and Semi-parametric Statistics - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Causal inference in noisy social networks (PhD)

Supervisors: Vanessa McNealis
Relevant research groups: Nonparametric and Semi-parametric Statistics, Biostatistics, Epidemiology and Health Applications, Social and Urban Studies

One core task of science is causal inference, yet distinguishing causality from spurious associations in observational data can be challenging. Statistical causal inference provides a framework to define causal effects, specify assumptions for identifying causal effects, and assess sensitivity of causal estimators to these assumptions.

Recent interest has focused on causal inference under interference (or spillover), where one individual’s treatment affects the outcomes of others. Social network data are particularly valuable for this purpose, as they offer information about connections between individuals, revealing potential pathways for interference. For instance, in the National Longitudinal Study of Adolescent Health (Add Health), peer influences among adolescents provide an ideal case for studying spillover, especially as they relate to behavioural and academic outcomes. However, Add Health features very high levels of missing edge data and censoring, which poses challenges since many methods for evaluating spillover effects assume fully observed networks.

This PhD will develop statistical methods for causal inference under network interference with noise, considering the following issues/approaches:

  • Bias characterization in the presence of missing or uncertain edge information
  • Semi-parametric inference
  • Propensity score methods
  • Multiple imputation for network data

A good knowledge of methods for survey sampling and regression is essential; familiarity with causal inference, statistical methods for coarse data, and semi-parametric inference would be an advantage.
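To see why missing edges matter, the following toy simulation (all settings hypothetical) generates outcomes with spillover through the fraction of treated neighbours and compares the estimated spillover coefficient on the full network against the estimate when a share of edges is dropped at random; the mismeasured exposure typically attenuates the estimate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Random network, randomised treatment, and an outcome with spillover
# through the fraction of treated neighbours.
n = 2000
U = rng.random((n, n))
A = np.triu(U < 0.01, 1)
A = (A | A.T).astype(float)
t = (rng.random(n) < 0.5).astype(float)
deg = A.sum(axis=1)
deg[deg == 0] = 1.0
exposure = (A @ t) / deg                 # true interference exposure

y = 1.0 + 0.5 * t + 1.5 * exposure + rng.normal(0, 1, n)

def spillover_coef(Aobs):
    """OLS of y on (1, treatment, observed exposure); return exposure coefficient."""
    d = Aobs.sum(axis=1)
    d[d == 0] = 1.0
    expo = (Aobs @ t) / d
    X = np.column_stack([np.ones(n), t, expo])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]

# Mimic missing edge data by dropping 40% of edges at random (cf. Add Health).
keep = np.triu(rng.random((n, n)) < 0.6, 1)
A_miss = (A.astype(bool) & (keep | keep.T)).astype(float)

print("spillover estimate, full network :", round(spillover_coef(A), 2))      # ~1.5
print("spillover estimate, edges missing:", round(spillover_coef(A_miss), 2)) # attenuated
```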

Analysis of spatially correlated functional data objects (PhD)

Supervisors: Surajit Ray
Relevant research groups: Modelling in Space and Time, Computational Statistics, Nonparametric and Semi-parametric Statistics, Imaging, Image Processing and Image Analysis

Historically, functional data analysis (FDA) techniques have widely been used to analyze traditional time series data, albeit from a different perspective. Of late, FDA techniques are increasingly being used in domains such as environmental science, where the data are spatio-temporal in nature and hence it is typical to treat them as functional data whose functions are correlated in time or space. An example where modeling the dependencies is crucial is the analysis of remotely sensed data observed over a number of years across the surface of the earth, where each year forms a single functional data object. One might be interested in decomposing the overall variation across space and time and attributing it to covariates of interest. Another interesting class of data with dependence structure consists of weather data on several variables collected from balloons, where the domain of the functions is a vertical strip in the atmosphere and the data are spatially correlated. One of the challenges in such data is missingness, addressing which requires developing appropriate spatial smoothing techniques for spatially dependent functional data. There are also interesting design-of-experiment issues, as well as questions of data calibration to account for variability in sensing instruments. In spite of the research activity in analyzing dependent functional data, there are several unresolved problems, which the student will work on:

  • robust statistical models for incorporating temporal and spatial dependencies in functional data
  • developing reliable prediction and interpolation techniques for dependent functional data
  • developing inferential framework for testing hypotheses related to simplified dependent structures
  • analysing sparsely observed functional data by borrowing information from neighbours
  • visualisation of data summaries associated with dependent functional data
  • clustering of functional data

Modality of mixtures of distributions (PhD)

Supervisors: Surajit Ray
Relevant research groups: Nonparametric and Semi-parametric Statistics, Applied Probability and Stochastic Processes, Statistical Modelling for Biology, Genetics and *omics, Biostatistics, Epidemiology and Health Applications

Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used to perform modeling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organizing mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results regarding the upper bound on the number of modes, but these too are limited to normal mixtures. In this project, we wish to explore the modality of non-normal mixtures and their application to real-life problems.
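A two-component normal mixture already illustrates the phenomenon: with equal weights and unit variances, the density is bimodal only when the means are more than two standard deviations apart. A small numerical check:

```python
import numpy as np
from scipy.stats import norm

# Count the modes of a two-component normal mixture by locating the local
# maxima of its density on a fine grid.
def n_modes(mu1, mu2, w=0.5, sd=1.0):
    x = np.linspace(min(mu1, mu2) - 4 * sd, max(mu1, mu2) + 4 * sd, 4000)
    dens = w * norm.pdf(x, mu1, sd) + (1 - w) * norm.pdf(x, mu2, sd)
    interior = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return int(interior.sum())

print(n_modes(0.0, 1.5))   # 1 mode: separation below 2*sd, components merge
print(n_modes(0.0, 3.0))   # 2 modes: separation above 2*sd
```

For non-normal components, even this simple boundary is not fully characterised, which is the gap the project addresses.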

Biostatistics, Epidemiology and Health Applications - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Causal inference in noisy social networks (PhD)

Supervisors: Vanessa McNealis
Relevant research groups: Nonparametric and Semi-parametric Statistics, Biostatistics, Epidemiology and Health Applications, Social and Urban Studies

One core task of science is causal inference, yet distinguishing causality from spurious associations in observational data can be challenging. Statistical causal inference provides a framework to define causal effects, specify assumptions for identifying causal effects, and assess sensitivity of causal estimators to these assumptions.

Recent interest has focused on causal inference under interference (or spillover), where one individual’s treatment affects the outcomes of others. Social network data are particularly valuable for this purpose, as they offer information about connections between individuals, revealing potential pathways for interference. For instance, in the National Longitudinal Study of Adolescent Health (Add Health), peer influences among adolescents provide an ideal case for studying spillover, especially as they relate to behavioural and academic outcomes. However, Add Health features very high levels of missing edge data and censoring, which poses challenges since many methods for evaluating spillover effects assume fully observed networks.

This PhD will develop statistical methods for causal inference under network interference with noise, considering the following issues/approaches:

  • Bias characterization in the presence of missing or uncertain edge information
  • Semi-parametric inference
  • Propensity score methods
  • Multiple imputation for network data

A good knowledge of methods for survey sampling and regression is essential; familiarity with causal inference, statistical methods for coarse data, and semi-parametric inference would be an advantage.

Bayesian statistical data integration of single-cell and bulk “OMICS” datasets with clinical parameters for accurate prediction of treatment outcomes in Rheumatoid Arthritis (PhD)

Supervisors: Mayetri Gupta
Relevant research groups: Bayesian Modelling and Inference, Computational Statistics, Biostatistics, Epidemiology and Health Applications

In recent years, many different computational methods to analyse biological data have been established, covering DNA (genomics), RNA (transcriptomics), proteins (proteomics) and metabolites (metabolomics), the last capturing more dynamic events. These methods have been refined by the advent of single-cell technology, where it is now possible to capture the transcriptomic profile of single cells, as well as spatial arrangements of cells from flow methods or imaging methods such as functional magnetic resonance imaging. At the same time, these OMICS data can be complemented with clinical data – measurements on patients, such as age, smoking status, disease phenotype or drug treatment. How to combine data from different “modalities” (such as transcriptome with clinical or imaging data) in a statistically valid way, so that different datasets can be compared and justifiable statistical inferences made, is an interesting and important open statistical question. This PhD project will be jointly supervised with Dr Thomas Otto and Prof. Stefan Siebert from the Institute of Infection, Immunity & Inflammation; you will explore how to combine different datasets using Bayesian latent variable modelling, focusing on clinical datasets from Rheumatoid Arthritis.
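One simple way to picture the data-integration task is a shared latent factor model: both omics features and clinical measurements are noisy views of the same patient-level latent state. The sketch below uses classical factor analysis as a non-Bayesian stand-in (the project itself would place priors on the loadings and propagate uncertainty); all dimensions and names are invented.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)

# Two toy "modalities" on the same patients, driven by shared latent factors:
# e.g. gene expression (50 features) and clinical measurements (5 features).
n, z_dim = 200, 3
z = rng.normal(size=(n, z_dim))                       # shared patient-level state
expr = z @ rng.normal(size=(z_dim, 50)) + 0.5 * rng.normal(size=(n, 50))
clin = z @ rng.normal(size=(z_dim, 5)) + 0.5 * rng.normal(size=(n, 5))

# Stack the standardised modalities and fit one factor model across both.
X = np.hstack([(m - m.mean(0)) / m.std(0) for m in (expr, clin)])
scores = FactorAnalysis(n_components=z_dim).fit_transform(X)

# The recovered factors are a rotation of the truth; check the best alignment.
cc = np.corrcoef(scores.T, z.T)[:z_dim, z_dim:]
print("max |corr| between recovered and true factors:", np.abs(cc).max().round(2))
```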

Modality of mixtures of distributions (PhD)

Supervisors: Surajit Ray
Relevant research groups: Nonparametric and Semi-parametric Statistics, Applied Probability and Stochastic Processes, Statistical Modelling for Biology, Genetics and *omics, Biostatistics, Epidemiology and Health Applications

Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used to perform modeling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organizing mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results regarding the upper bound on the number of modes, but these too are limited to normal mixtures. In this project, we wish to explore the modality of non-normal mixtures and their application to real-life problems.

Estimating the effects of air pollution on human health (PhD)

Supervisors: Duncan Lee
Relevant research groups: Modelling in Space and Time, Biostatistics, Epidemiology and Health Applications

Exposure to air pollution is thought to reduce average life expectancy by six months, with an estimated equivalent health cost of £19 billion each year (from DEFRA). These effects have been estimated using statistical models, which quantify the impact on human health of exposure in both the short and the long term. However, the estimation of such effects is challenging, because individual-level measures of health and pollution exposure are not available. Therefore, the majority of studies are conducted at the population level, and the resulting inference can only be made about the effects of pollution on overall population health. However, the data used in such studies are spatially misaligned, as the health data relate to extended areas such as cities or electoral wards, while the pollution concentrations are measured at individual locations. Furthermore, pollution monitors are typically located where concentrations are thought to be highest, known as preferential sampling, which is likely to result in overly high measurements being recorded. This project aims to develop statistical methodology to address these problems, and thus provide a less biased estimate of the effects of pollution on health than is currently produced.

Mapping disease risk in space and time (PhD)

Supervisors: Duncan Lee
Relevant research groups: Modelling in Space and Time, Biostatistics, Epidemiology and Health Applications

Disease risk varies over space and time, due to similar variation in environmental exposures such as air pollution and risk-inducing behaviours such as smoking. Modelling the spatio-temporal pattern in disease risk is known as disease mapping, and the aims are to: quantify the spatial pattern in disease risk to determine the extent of health inequalities; determine whether there has been any increase or reduction in the risk over time; identify the locations of clusters of areas at elevated risk; and quantify the impact of exposures, such as air pollution, on disease risk. I am working on all these related problems at present, and I have PhD projects in all these areas.

Generating deep fake left ventricles: a step towards personalised heart treatments (PhD)

Supervisors: Andrew Elliott, Vinny Davies, Hao Gao
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Biostatistics, Epidemiology and Health Applications, Statistical Modelling for Biology, Genetics and *omics

Personalised medicine is an exciting avenue in the field of cardiac healthcare, where an understanding of patient-specific mechanisms can lead to improved treatments (Gao et al., 2017). The use of mathematical models to link the underlying properties of the heart with cardiac imaging offers the possibility of obtaining important parameters of heart function non-invasively (Gao et al., 2015). Unfortunately, current estimation methods rely on complex mathematical forward simulations, with a solution taking hours, a time frame not suitable for real-time treatment decisions. To increase the applicability of these methods, statistical emulation methods have been proposed as an efficient way of estimating the parameters (Davies et al., 2019; Noè et al., 2019). In this approach, simulations of the mathematical model are run in advance and machine learning based methods are then used to estimate the relationship between the cardiac imaging and the parameters of interest. These methods are, however, limited by our ability to understand how cardiac geometry varies across patients, which is in turn limited by the amount of data available (Romaszko et al., 2019). In this project we will look at AI-based methods for generating fake cardiac geometries which can be used to increase the amount of data (Qiao et al., 2023). We will explore different types of AI generation, including Generative Adversarial Networks and Variational Autoencoders, to understand how we can generate better 3D and 4D models of fake left ventricles and create an improved emulation strategy that can make use of them.
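As a flavour of the generative modelling involved, here is a minimal variational autoencoder in PyTorch trained on synthetic 64-dimensional "geometry" vectors standing in for meshed left ventricles; after training, decoding draws from the latent prior yields new fake geometries. The architecture and data are illustrative only, not the project's actual pipeline.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a shape dataset: 64-dimensional "geometry" vectors lying
# near a 2-D manifold (real work would use meshed left-ventricle geometries).
latent_true = torch.randn(512, 2)
data = latent_true @ torch.randn(2, 64) + 0.05 * torch.randn(512, 64)

class VAE(nn.Module):
    def __init__(self, d=64, z=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, 32), nn.ReLU())
        self.mu = nn.Linear(32, z)
        self.logvar = nn.Linear(32, z)
        self.dec = nn.Sequential(nn.Linear(z, 32), nn.ReLU(), nn.Linear(32, d))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        zs = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterisation
        return self.dec(zs), mu, logvar

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    recon, mu, logvar = model(data)
    rec = ((recon - data) ** 2).sum()                           # reconstruction error
    kl = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp())  # KL to N(0, I)
    loss = rec + kl
    opt.zero_grad()
    loss.backward()
    opt.step()

# "Fake" geometries: decode draws from the latent prior.
with torch.no_grad():
    fakes = model.dec(torch.randn(10, 2))
print(fakes.shape)  # torch.Size([10, 64])
```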

Bayesian Mixture Models for Spatio-Temporal Data (PhD)

Supervisors: Craig Anderson
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Biostatistics, Epidemiology and Health Applications

The prevalence of disease is typically not constant across space – instead the risk tends to vary from one region to another. Some of this variability may be down to environmental conditions, but much of it is driven by socio-economic differences between regions, with poorer regions tending to have worse health than wealthier regions. For example, within the Greater Glasgow and Clyde region, the World Health Organisation noted that life expectancy ranges from 54 in Calton to 82 in Lenzie, despite these areas being less than 10 miles apart. There is substantial value to health professionals and policymakers in identifying some of the causes behind these localised health inequalities.

Disease mapping is a field of statistical epidemiology which focuses on estimating the patterns of disease risk across a geographical region. The main goal of such mapping is typically to identify regions of high disease risk so that relevant public health interventions can be made. This project involves the development of statistical models which will enhance our understanding of regional differences in the risk of suffering from major diseases by focusing on these localised health inequalities.

Standard Bayesian hierarchical models with a conditional autoregressive (CAR) prior are frequently used for risk estimation in this context, but these models assume a smooth risk surface which is often not appropriate in practice. In reality, it will often be the case that different regions have vastly different risk profiles and require different data generating functions as a result.

In this work we propose a mixture model based approach which allows different sub-populations to be represented by different underlying statistical distributions within a single modelling framework. By integrating CAR models into mixture models, researchers can simultaneously account for spatial dependencies and identify distinct disease patterns within subpopulations.

Implementing a biology-empowered statistical framework to detect rare variant risk factors for complex diseases in whole genome sequence cohorts (PhD)

Supervisors: Vincent Macaulay, Luísa Pereira (Geneticist, i3s)
Relevant research groups: Statistical Modelling for Biology, Genetics and *omics, Biostatistics, Epidemiology and Health Applications

Traditional genome-wide association studies (GWAS) to detect candidate genetic risk factors for complex diseases/phenotypes rely largely on microarray technology, genotyping at once thousands or millions of variants regularly spaced across the genome. These microarrays include mostly common variants (minor allele frequency, MAF>5%), missing candidate rare variants, which are the more likely to be deleterious [1]. Currently, the best strategy to genotype low-frequency (1%<MAF<5%) and rare (MAF<1%) variants is through next generation sequencing, and the increasing availability of whole genome sequences (WGS) places us on the brink of detecting rare variants associated with complex diseases [2]. Statistically, this detection constitutes a challenge, as the massive number of rare variants in genomes (for example, 64.7M in 150 Iberian WGSs) would imply genotyping millions/billions of individuals to attain statistical power. In the last couple of years, several statistical methods have been tested in the context of association of rare variants with complex traits [2,3,4], largely testing strategies to aggregate the rare variants. These works have not yet tested the statistical power that can be gained by incorporating reliable biological evidence into the aggregation of rare variants in the most probable functional regions, such as non-coding regulatory regions that control the expression of genes [4]. In fact, it has been demonstrated that even for common candidate variants, most (around 88%; [5]) are located in non-coding regions. If this is true for the common variants detected by traditional GWAS, it is highly probable to be true for rare variants as well.

In this work, we will implement a biology-empowered statistical framework to detect rare variant risk factors for complex diseases in WGS cohorts. We will use the 200,000 WGSs from the UK Biobank database [6], which will be available to scientists before the end of 2023. Access to clinical information on these UK residents, all aged over 40, is also provided. We will build our framework around type-2 diabetes (T2D), a common complex disease for which thousands of common variant candidates have been found [7]. Also, the mapping of regulatory elements is well known for the pancreatic beta cells that play a leading role in T2D [8]. We will use this mapping to guide the aggregation of rare variants and test it against a random aggregation across the genome. Of course, the framework rationale will be applicable to any other complex disease. We will browse the literature for aggregation methods available at the beginning of this work, but we have already selected the method SKAT (sequence kernel association test; [3]) to be tested. SKAT fits a random-effects model to the set of variants within a genomic interval or biologically meaningful region (such as a coding or regulatory region) and computes variant-set level p-values, while permitting correction for covariates (such as the principal components mentioned above, which can account for population stratification between cases and controls).
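To make the aggregation idea concrete, the sketch below computes a SKAT-style weighted score statistic for a set of rare variants in one hypothetical region, with the usual Beta(1, 25) minor-allele-frequency weights, and uses a permutation p-value as a simple stand-in for the Davies mixture-of-chi-squared computation in the SKAT software. The cohort is entirely simulated.

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated cohort: genotypes (0/1/2 copies) for m rare variants in one
# hypothetical regulatory region.
n, m = 1000, 25
maf = rng.uniform(0.001, 0.01, m)
G = rng.binomial(2, maf, size=(n, m)).astype(float)

# Binary phenotype: two causal variants raise risk, the rest are neutral.
eta = -2.0 + 1.2 * G[:, 0] + 1.2 * G[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(float)

# SKAT weights: squared Beta(1, 25) density at the MAF, up-weighting rarer
# variants (the default choice in Wu et al., 2011).
w = (25 * (1 - maf) ** 24) ** 2

resid = y - y.mean()          # residuals from an intercept-only null model

def skat_q(r):
    """Weighted score statistic Q = sum_j w_j * (g_j' r)^2."""
    s = G.T @ r
    return float(np.sum(w * s ** 2))

Q = skat_q(resid)

# Permutation p-value as a simple stand-in for the exact null distribution.
B = 999
perm = np.array([skat_q(rng.permutation(resid)) for _ in range(B)])
p = (1 + np.sum(perm >= Q)) / (B + 1)
print(f"Q = {Q:.1f}, permutation p ~ {p:.3f}")
```

Swapping the random aggregation here for a biologically informed choice of region is exactly the comparison the project proposes.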

 

Environmental, Ecological Sciences and Sustainability - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

There is also the possibility of applying to The Leverhulme Programme for Doctoral Training in Ecological Data Science which is hosted in our school. Information on how to apply can be found on the programme's application page.

Collective animal movement and resource selection in changing environments (PhD)

Supervisors: Mu Niu, Paul Blackwell (Sheffield), Juan Morales (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

Advances in technologies such as GPS have revolutionized the tracking of wildlife, providing detailed data on how animals move and interact. It is now feasible to track multiple animals simultaneously, at high frequency and for long periods. This project explores the movement of animals both individually and in groups, focusing on how environmental factors and resource availability shape their behaviours. Animals in groups often move interdependently, influenced by interactions between individuals. However, traditional movement models primarily address individual animals and ignore these group dynamics. On the other hand, collective movement models are often parameterized with short-term data. This research will develop innovative statistical models to better understand how animals move collectively. Such movement necessarily involves each individual responding to their physical environment, as well as other group members, and a key aspect of this project is understanding how animals use space and resources within the group setting. Incorporating both these aspects of short-term movement decisions and long-term space use in a coherent mathematical model will illuminate how animals collectively adapt to their surroundings. It will use cutting-edge statistical and machine learning methods, such as diffusion models. The findings and methodology developed will provide valuable insights into animal behaviour and ecology, supporting conservation efforts and helping manage human impacts on wildlife.
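A classic minimal model of the group dynamics described above is Vicsek-style heading alignment, where each animal adopts the noisy average heading of its neighbours; models of this family are what such projects aim to parameterise from tracking data. A short simulation, with all parameter values arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)

# Vicsek-style alignment: each animal adopts the circular-mean heading of
# neighbours within radius r (including itself), plus angular noise.
n, steps, r, speed, noise = 50, 200, 1.0, 0.05, 0.2
pos = rng.uniform(0, 5, (n, 2))
theta = rng.uniform(-np.pi, np.pi, n)

for _ in range(steps):
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    nbr = d < r                                  # neighbourhood mask
    theta = np.arctan2((nbr * np.sin(theta)).sum(1),
                       (nbr * np.cos(theta)).sum(1))
    theta += noise * rng.normal(size=n)
    pos += speed * np.column_stack([np.cos(theta), np.sin(theta)])

# Polarisation near 1 means the group has self-organised into aligned motion.
print("polarisation:", np.hypot(np.cos(theta).mean(), np.sin(theta).mean()).round(2))
```

The statistical challenge the project targets is the inverse problem: inferring interaction rules and resource-selection parameters like these from observed trajectories.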

Extreme value theory for predicting animal dispersal and movement in a changing climate (PhD)

Supervisors: Jafet Belmont Osuna, Daniela Castro-Camilo, Juan Morales (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

There is an imperative need to understand and predict how populations respond to multiple aspects of global change, such as habitat fragmentation and climate change. Extreme weather events, which are expected to increase in both frequency and intensity, can profoundly impact animal movement and spatial dynamics. Additionally, for many species, rare long-distance dispersal events play a crucial role in reaching suitable habitats for germination, establishment, and colonisation across fragmented or managed landscapes. Many plant species, for instance, rely on birds for dispersal: birds ingest fruits and later deposit seeds through defecation or regurgitation. Accurately predicting such processes requires models that capture both seed retention times within birds and bird movement patterns. This project aims to develop and apply cutting-edge statistical methods for analysing animal movement and dispersal data using Extreme-Value Theory (EVT) within a Bayesian framework. EVT, a well-established theoretical framework that has been widely used in environmental sciences for modelling extreme events, has seen limited application in ecology. We will leverage EVT to (1) understand how extreme weather events can affect animal movement, and (2) make better predictions of dispersal processes. This work offers substantial potential for novel insights and methodological advancements. By integrating experimental research and state-of-the-art tracking technologies, the project will inform the development of hierarchical Bayesian models to explore patterns and drivers of animal movement and dispersal, with a particular focus on extreme behaviours and their ecological implications.

Leveraging large language models to provide insights into global plant biodiversity (PhD)

Supervisors: Richard Reeve (MVLS, UoG), Jake Lever (CS, UoG), Vinny Davies, Neil Brummitt (NHM), Ana Claudia Araujo (NHM), Ben Scott (NHM)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

Plants are fundamental to the provision of ecosystem services, and we are wholly dependent on them for survival. Yet, globally, many plant species are under threat of extinction. We need a comprehensive plant trait dataset as input to the next generation of biodiversity-climate models. The lack of such a dataset means that existing approaches focus on limited "Plant Functional Types" and cannot estimate the impacts of climate and land use change on individual species or help inform decision making on mitigating biodiversity loss. The needed plant trait data, from niche preferences to growth rates, are locked in the text of the vast botanical literature of the Biodiversity Heritage Library and other texts available to the Natural History Museum. This studentship would use the recent advances in large language models (LLMs) and natural language processing (NLP) to extract this information. We have developed an ecosystem modelling tool (EcoSISTEM, Harris et al., 2023, https://github.com/EcoJulia/EcoSISTEM.jl) that captures survival, competition and reproduction among multiple plant species across a landscape. LLMs will enable extraction of traits data for integration into the EcoSISTEM infrastructure and enable the inclusion of multilingual records, expanding the system's geographic and historical range. By addressing these enormous data gaps, the student will then explore global spatial and temporal variability in functional and other trait-based diversity measures to produce a unique and comprehensive evaluation of whether predictors exist of diversity at a global scale. Ultimately, the project will boost EcoSISTEM's ability to simulate plant responses to climate change with greater accuracy.

The impact of deep learning optimization and design choices for marine biodiversity monitoring (PhD)

Supervisors: Tiffany Vlaar, Laurence De Clippele (MVLS, UoG)
Relevant research groups: Modelling in Space and Time, Machine Learning and AI, Environmental, Ecological Sciences and Sustainability
Funding: This project is competitively funded through The Leverhulme Programme for Doctoral Training in Ecological Data Science.

This project aims to increase the efficiency, accuracy, and reliability of annotation and classification of large marine datasets using deep learning. Timely and accurate analysis of these long-term datasets will aid marine biodiversity monitoring efforts. Design of more efficient strategies further aims to reduce the carbon footprint of training and fine-tuning large machine learning models. The project is expected to lead to various novel insights for the machine learning community such as on optimal pre-training choices for downstream robust performance, the optimal order of learning samples with varying complexity levels, navigating instances with label ground truth uncertainty, and re-evaluation of metric design. The PhD student will be supported in building international collaborations with researchers across different disciplines and in developing effective research communication skills.

Sampling strategies for environmental monitoring networks (PhD)

Supervisors: Claire Miller, Craig Alexander, Craig Wilkie
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences and Sustainability
Funding: This project has specific funding available. More information can be found at the FindAPhD Advert.

In recent years, there has been a lot of work done on investigating how to monitor environmental variables in the most efficient way.  Environmental variables, such as pollutants in water, can be monitored through, for example, in-situ sampling, automatic in-situ sensors or remote sensing.  However, each sampling approach has different levels of accuracy and is available at different spatial and temporal resolutions. 

Environmental regulators and industry all have a responsibility and commitment to monitoring environmental standards and mitigating the potential for increases to levels of pollutants.  At a time of world-wide budgetary pressures, the most efficient monitoring schemes are required.  However, the mechanisms of monitoring can also be detrimental to the environment e.g. through more visits to a site or lab/computer processing creating a higher environmental footprint.

The aim of the PhD is to extend work already carried out on the optimal design of monitoring networks for spatiotemporal models. Specifically, the project will identify spatiotemporal sampling designs that balance budgetary requirements and environmental impact, with a view to developing and enhancing online tools (e.g. GWSDAT) to provide automatic guidance to practitioners, who can then integrate this guidance into their assessment and development of the most optimal monitoring network. This will require statistical methodological development, work on computationally efficient implementations, and software development.
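A standard building block for such designs is greedy maximum-entropy site selection under an assumed spatial covariance model: repeatedly add the candidate site whose predictive variance, given the sites already chosen, is largest. A small sketch with invented coordinates and covariance:

```python
import numpy as np

rng = np.random.default_rng(2)

# Candidate monitoring sites and an assumed exponential covariance model.
S = rng.uniform(0, 10, (60, 2))
d = np.linalg.norm(S[:, None] - S[None, :], axis=2)
K = np.exp(-d / 2.0) + 1e-8 * np.eye(60)

# Greedy maximum-entropy design: add the site with the largest predictive
# variance given the sites already chosen. (Under this isotropic prior the
# first pick is arbitrary, since all prior variances are equal.)
chosen = []
var = np.diag(K).copy()
for _ in range(8):
    j = int(np.argmax(var))
    chosen.append(j)
    C = K[np.ix_(chosen, chosen)]
    k = K[:, chosen]
    var = np.diag(K) - np.einsum('ij,jk,ik->i', k, np.linalg.inv(C), k)

print("selected sites:", chosen)
```

The project would extend criteria like this to trade predictive value against monitoring cost and environmental footprint.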

The PhD will be jointly supervised by partners from industry and hence the successful candidate will additionally engage in knowledge exchange/transfer, training and networking in this sector.

New methods for analysis of migratory navigation (PhD)

Supervisors: Janine Illian, Urška Demšar (University of St Andrews)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

Migratory birds travel annually across vast expanses of oceans and continents to reach their destinations with incredible accuracy. How they are able to do this using only locally available cues is still not fully understood. Migratory navigation consists of two processes: birds either identify the direction in which to fly (compass orientation) or the location where they are at a specific moment in time (geographic positioning). One possible way they do this is to use information from the Earth’s magnetic field, in so-called geomagnetic navigation (Mouritsen, 2018). While there is substantial evidence, both physiological and behavioural, that they do sense the magnetic field (Deutschlander and Beason, 2014), we still do not know exactly which components of the field they use for orientation or positioning. We also do not understand how rapid changes in the field affect movement behaviour.

There is a possibility that birds can sense these rapid large changes and that this may affect their navigational process. To study this, we need to link accurate data on Earth’s magnetic field with animal tracking data. This has only become possible very recently through new spatial data science advances:  we developed the MagGeo tool, which links contemporaneous geomagnetic data from Swarm satellites of the European Space Agency with animal tracking data (Benitez Paez et al., 2021).

Linking geomagnetic data to animal tracking data, however, creates a high-dimensional data set, which is difficult to explore. Typical analyses of contextual environmental information in ecology represent contextual variables as covariates in relatively simple statistical models (Brum Bastos et al., 2021), but this is not sufficient for studying detailed navigational behaviour. This project will analyse complex spatio-temporal data using computationally efficient statistical model fitting approaches in a Bayesian context.

This project is fully based on open data to support reproducibility and open science. We will test our new methods by annotating publicly available bird tracking data (e.g. from repositories such as Movebank.org), using the open MagGeo tool and implementing our new methods as Free and Open Source Software (R/Python).

Integrated spatio-temporal modelling for environmental data (PhD)

Supervisors: Janine Illian, Peter Henrys (UKCEH)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

The last decade has seen a proliferation of environmental data, with vast quantities of information available from various sources. This has been due to a number of different factors, including the advent of sensor technologies, the provision of remotely sensed data from both drones and satellites, and the explosion in citizen science initiatives. These data represent a step change in the resolution of available data across space and time - sensors can be streaming data at a resolution of seconds, whereas citizen science observations can number in the hundreds of thousands.

Over the same period, the resources available for traditional field surveys have decreased dramatically whilst logistical issues (such as access to sites) have increased. This has severely impacted the ability of field survey campaigns to collect data at high spatial and temporal resolutions. It is exactly this sort of information that is required to fit models that can quantify and predict the spread of invasive species, for example.

Whilst we have seen an explosion of data across various sources, there is no single source that provides both the spatial and temporal intensity that may be required when fitting complex spatio-temporal models (cf. the invasive species example) - each has its own advantages and benefits in terms of information content. There is therefore potentially huge benefit in bringing together data from these different sources within a consistent framework, to exploit the benefits each offers and to understand processes at unprecedented resolutions/scales that would otherwise be impossible to monitor.

Current approaches to combining data in this way are typically very bespoke and involve complex model structures that are not reusable outside of the particular application area. What is needed is an overarching generic methodological framework and associated software solutions to implement such analyses. Not only would such a framework provide the methodological basis to enable researchers to benefit from this big data revolution, but it would also provide the capability to change such analyses from being stand-alone research projects in their own right to more operational, standard analytical routines.

Finally, such dynamic, integrated analyses could feed back into data collection initiatives to ensure optimal allocation of effort for traditional surveys or optimal power management for sensor networks. The major step change is that this optimal allocation of effort is conditional on the other data that are available. So, for example, given the coverage and intensity of the citizen science data, where should we optimally send our paid surveyors? The idea is that information is collected at times and locations that provide the greatest benefit in understanding the underpinning stochastic processes. These two major issues - integrated analyses and adaptive sampling - ensure that environmental monitoring is fit for purpose and that scientists, policymakers and industry can benefit from the big data revolution.

This project will develop an integrated statistical modelling strategy that provides a single modelling framework for enabling quantification of ecosystem goods and services while accounting for the fundamental differences between data streams. Data collected at different spatial resolutions can be used within the same model by projecting them into continuous space and back to the landscape level of interest. As a result, decisions can be made at the relevant spatial scale, and uncertainty is propagated through, facilitating appropriate decision making.

Statistical methodology for assessing the impacts of offshore renewable developments on marine wildlife (PhD)

Supervisors: Janine Illian, Esther Jones (BIOSS), Adam Butler (BIOSS)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Computational Statistics, Environmental, Ecological Sciences and Sustainability

Assessing the impacts of offshore renewable developments on marine wildlife is a critical component of the consenting process. A NERC-funded project, ECOWINGS, will provide a step-change in analysing predator-prey dynamics in the marine environment, collecting data across trophic levels against a backdrop of developing wind farms and climate change. Aerial survey and GPS data from multiple species of seabirds will be collected contemporaneously alongside prey data available over the whole water column from an automated surface vehicle and underwater drone.

These methods of data collection will generate 3D space and time profiles of predators and prey, creating a rich source of information and enormous potential for modelling and interrogation. The data present a unique opportunity for experimental design across a dynamic and changing marine ecosystem, which is heavily influenced by local and global anthropogenic activities. However, these data have complex intrinsic spatio-temporal properties, which are challenging to analyse. Significant statistical methods development could be achieved using this system as a case study, contributing to the scientific knowledge base not only in offshore renewables but more generally in the many circumstances where patchy ecological spatio-temporal data are available. 

This PhD project will develop spatio-temporal modelling methodology that will allow users to analyse these exciting - and complex - data sets, helping inform our knowledge of the impact of offshore renewables on wildlife.

Detecting hotspots of water pollution in complex constrained domains and networks (PhD)

Supervisors: Mu Niu, Craig Wilkie, Cathy Yi-Hsuan Chen (Business School, UofG), Michael Tso (Lancaster University)
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Environmental, Ecological Sciences & Sustainability
Funding: This project is competitively funded through the ExaGEO DLA.

Technological developments with smart sensors are changing the way that the environment is monitored.  Many such smart systems are under development, with small, energy efficient, mobile sensors being trialled.  Such systems offer opportunities to change how we monitor the environment, but this requires additional statistical development in the optimisation of the location of the sensors.

The aim of this project is to develop a mathematical and computational inferential framework to identify optimal sensor deployment locations within complex, constrained domains and networks for improved water contamination detection. Methods for estimating covariance functions in such domains rely on computationally intensive diffusion process simulations, limiting their application to relatively simple domains and small-scale datasets. To address this challenge, the project will employ accelerated computing paradigms with highly parallelized GPUs to enhance simulation efficiency. The framework will also address regression, classification, and optimization problems on latent manifolds embedded in high-dimensional spaces, such as image clouds (e.g., remote sensing satellite images), which are crucial for sensor deployment and performance evaluation. As the project progresses, particularly in the image cloud case, the computational demands will intensify, requiring advanced GPU resources or exascale computing to ensure scalability, efficiency, and performance.
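
One way to picture the diffusion-based covariance estimation mentioned above is the toy sketch below: reflected Brownian motion on a constrained one-dimensional domain stands in for diffusion on a river network or lake, and the heat-kernel value between two sites is estimated by simulating many paths. All settings here are illustrative; the path simulation is exactly the expensive step that GPU parallelisation would accelerate.

```python
import numpy as np

# Toy sketch of diffusion-based covariance estimation on a constrained
# 1-D domain [0, 1] with reflecting boundaries. The heat-kernel value
# p_t(x0, x1) is estimated by simulating reflected Brownian motion and
# smoothing the endpoint distribution.
rng = np.random.default_rng(0)

def reflect(z):
    """Fold a path back into [0, 1] (reflecting boundary)."""
    z = np.mod(z, 2.0)
    return np.where(z > 1.0, 2.0 - z, z)

def heat_kernel(x0, x1, t=0.05, n_paths=20000, n_steps=100, bw=0.02):
    dt = t / n_steps
    x = np.full(n_paths, x0)
    for _ in range(n_steps):
        x = reflect(x + np.sqrt(dt) * rng.standard_normal(n_paths))
    # Gaussian kernel density estimate of the endpoint distribution at x1
    return np.mean(np.exp(-0.5 * ((x - x1) / bw)**2)) / (bw * np.sqrt(2 * np.pi))

# An (unnormalised) covariance between pairs of sites in the domain:
print(heat_kernel(0.2, 0.25), heat_kernel(0.2, 0.9))
```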

Downscaling and Prediction of Rainfall Extremes from Climate Model Outputs (PhD)

Supervisors: Sebastian Gerhard Mutz (GES, UoG), Daniela Castro-Camilo
Relevant research groups: Modelling in Space and Time, Bayesian Modelling and Inference, Environmental, Ecological Sciences & Sustainability
Funding: This project is competitively funded through the ExaGEO DLA.

In the last decade, Scotland’s annual rainfall has increased by 9%, and its winter rainfall by 19%, with more water arriving in extreme events, posing risks to the environment, infrastructure, health, and industry. Urgent issues such as flooding, mass wasting, and water quality are closely tied to rainfall extremes. Reliable predictions of extremes are, therefore, critical for risk management. Prediction of extremes, which is one of the main focuses of extreme value theory, is still considered one of the grand challenges by the World Climate Research Programme. This project will address this challenge by developing novel, computationally efficient statistical models that can predict rainfall extremes from the output of GPU-optimised climate models.
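
A hedged sketch of the peaks-over-threshold building block that underlies most such extreme-value models is given below, using simulated rainfall; the threshold choice, return-level formula and synthetic data are illustrative only.

```python
import numpy as np
from scipy import stats

# Peaks-over-threshold sketch: daily rainfall (simulated here, in mm) is
# thresholded, the exceedances are fitted with a generalised Pareto
# distribution, and a high return level is extrapolated from the fit.
rng = np.random.default_rng(42)
rain = rng.gamma(shape=0.4, scale=6.0, size=365 * 30)   # ~30 years, synthetic

u = np.quantile(rain, 0.95)                 # threshold choice matters in practice
exceed = rain[rain > u] - u
shape, loc, scale = stats.genpareto.fit(exceed, floc=0.0)

# m-observation return level: the value exceeded on average once every m days
zeta = np.mean(rain > u)                    # threshold exceedance probability
m = 365 * 100                               # ~100-year daily return level
z_m = u + (scale / shape) * ((m * zeta)**shape - 1.0)
print(f"threshold={u:.1f}mm, xi={shape:.2f}, 100-yr level={z_m:.1f}mm")
```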

Exploring Hybrid Flood modelling leveraging GPU/Exascale computing (PhD)

Supervisors: Andrew Elliott, Lindsay Beevers (University of Edinburgh), Claire Miller, Michele Weiland (University of Edinburgh)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Flood modelling is crucial for understanding flood hazards, now and in the future as a result of climate change. Modelling provides inundation extents (or flood footprints), outlines of the areas at risk, which can help to manage our increasingly complex infrastructure network as our climate changes. Our ability to make fast, accurate predictions of fluvial inundation extents is important for disaster risk reduction. Simultaneously capturing uncertainty in forecasts or predictions is essential for efficient planning and design. Both aims require methods which are computationally efficient whilst maintaining accurate predictions. Current Navier-Stokes physics-based models are computationally intensive; this project would therefore explore hybrid flood models that fuse ML with physics-based models and utilise GPU compute, as well as investigating how to scale the numerical models to large-scale HPC resources.
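
One common hybrid pattern is sketched below, under entirely synthetic stand-ins for the solvers: a cheap physics-based predictor is kept, and an ML model is trained on its residuals against higher-fidelity output, so the physics carries the bulk of the signal while the ML corrects its systematic error.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative hybrid physics/ML sketch (not the project's actual models):
# an ML corrector is trained on the residual between a cheap physics-based
# formula and higher-fidelity simulator output.
rng = np.random.default_rng(7)

X = rng.uniform(0.5, 3.0, size=(500, 2))          # e.g. rainfall depth, slope

def cheap_physics(X):
    # stand-in for a coarse inundation formula
    return 0.8 * X[:, 0] / np.sqrt(X[:, 1])

def high_fidelity(X):
    # stand-in for an expensive Navier-Stokes solver run
    return (cheap_physics(X) * (1 + 0.2 * np.sin(3 * X[:, 0]))
            + 0.05 * rng.standard_normal(len(X)))

residual = high_fidelity(X) - cheap_physics(X)
corrector = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000,
                         random_state=0).fit(X, residual)

X_new = rng.uniform(0.5, 3.0, size=(5, 2))
hybrid = cheap_physics(X_new) + corrector.predict(X_new)  # physics + learned correction
print(hybrid.round(3))
```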

Scalable approaches to mathematical modelling and uncertainty quantification in heterogeneous peatlands (PhD)

Supervisors: Raimondo Penta, Vinny Davies, Jessica Davies (Lancaster University), Lawrence Bull, Matteo Icardi (University of Nottingham)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification, Continuum Mechanics
Funding: This project is competitively funded through the ExaGEO DLA.

While only covering 3% of the Earth’s surface, peatlands store >30% of terrestrial carbon and play a vital ecological role. Peatlands are, however, highly sensitive to climate change and human pressures, and therefore understanding and restoring them is crucial for climate action. Multiscale mathematical models can represent the complex microstructures and interactions that control peatland dynamics but are limited by their computational demands. GPU and Exascale computing advances offer a timely opportunity to unlock the potential benefits of mathematically-led peatland modelling approaches. By scaling these complex models to run on new architectures, or by directly incorporating mathematical constraints into GPU-based deep learning approaches, scalable computing will deliver transformative insights into peatland dynamics and their restoration, supporting global climate efforts.

Scalable Inference and Uncertainty Quantification for Ecosystem Modelling (PhD)

Supervisors: Vinny Davies, Richard Reeve (BOHVM, UoG), David Johnson (Lancaster University), Christina Cobbold, Neil Brummitt (Natural History Museum)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Understanding the stability of ecosystems, and how they are impacted by climate and land use change, can allow us to identify sites where biodiversity loss will occur and help to direct policymakers in mitigation efforts. Our current digital twin of plant biodiversity – https://github.com/EcoJulia/EcoSISTEM.jl – provides functionality for simulating species through processes of competition, reproduction, dispersal and death, as well as environmental changes in climate and habitat, but it would benefit from enhancement in several areas. The three areas this project would most likely target are the introduction of a soil layer (and the improvement of the modelling of soil water); improving the efficiency of the code to handle a more complex model and to allow stochastic and systematic Uncertainty Quantification (UQ); and developing techniques for scalable inference of missing parameters.
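
As a flavour of the inference strand, the sketch below applies rejection ABC to a toy stochastic birth-death simulator (a stand-in for EcoSISTEM.jl, with made-up rates and data): each candidate parameter requires a full simulator run, which is exactly the embarrassingly parallel workload that GPU or HPC scaling targets.

```python
import numpy as np

# Rejection-ABC sketch of "scalable inference of missing parameters" for a
# stochastic simulator: keep candidate parameters whose simulated summary
# lands close to the observed one.
rng = np.random.default_rng(3)

def simulate(birth_rate, steps=100, n0=50):
    """Toy stochastic birth-death population model."""
    n = n0
    for _ in range(steps):
        n += rng.poisson(birth_rate * n) - rng.poisson(0.1 * n)
        n = max(n, 0)
    return n

obs = simulate(0.12)                          # pretend this is field data

draws = rng.uniform(0.0, 0.2, size=5000)      # prior on the birth rate
sims = np.array([simulate(b) for b in draws]) # one simulator run per draw
accepted = draws[np.abs(sims - obs) < 50]     # crude tolerance

print(f"posterior mean ~ {accepted.mean():.3f} "
      f"from {accepted.size} accepted draws (true value 0.12)")
```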

Smart-sensing for systems-level water quality monitoring (PhD)

Supervisors: Craig Wilkie, Lawrence Bull, Claire Miller, Stephen Thackeray (Lancaster University)
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Environmental, Ecological Sciences & Sustainability
Funding: This project is competitively funded through the ExaGEO DLA.

Freshwater systems are vital for sustaining the environment, agriculture, and urban development, yet in the UK, only 33% of rivers and canals meet ‘good ecological status’ (JNCC, 2024). Water monitoring is essential to mitigate the damage caused by pollutants (from agriculture, urban settlements, or waste treatment), and while sensors are increasingly affordable, coverage remains a significant issue. New techniques for edge processing and remote power offer one solution, providing alternative sources of telemetry data. However, methods which combine such information into systems-level sensing for water are not as mature as in other application areas (e.g., the built environment). In response, this project will consider procedures for computation at the edge, decision-making, and data/model interoperability.

Statistical Emulation Development for Landscape Evolution Models (PhD)

Supervisors: Benn Macdonald, Mu Niu, Paul Eizenhöfer (GES, UoG), Eky Febrianto (Engineering, UoG)
Relevant research groups: Modelling in Space and Time, Environmental, Ecological Sciences & Sustainability, Machine Learning and AI, Emulation and Uncertainty Quantification
Funding: This project is competitively funded through the ExaGEO DLA.

Many real-world processes, including those governing landscape evolution, can be effectively described mathematically via differential equations. These equations describe how processes, e.g. the physiography of mountainous landscapes, change with respect to other variables, e.g. time and space. Conventional approaches to statistical inference involve repeated numerical solving of the equations: every time the parameters of the equations are changed within a statistical optimisation or sampling procedure, the equations need to be re-solved numerically. The associated large computational cost limits advancements when scaling to more complex systems, the application of statistical inference and machine learning approaches, as well as the implementation of more holistic approaches to Earth System science. This leads to the need for an accelerated computing paradigm involving highly parallelised GPUs for evaluation of the forward problem.

Beyond advanced computing hardware, emulation is becoming a more popular way to tackle this issue. The idea is that first the differential equations are solved as many times as possible and then the output is interpolated using statistical techniques. Then, when inference is carried out, the emulator predictions replace the differential equation solutions. Since prediction from an emulator is very fast, this avoids the computational bottleneck. If the emulator is a good representation of the differential equation output, then parameter inference can be accurate.

The student will begin by working on parallelising the numerical solver of the mathematical model via GPUs. This means that many more solutions can be generated, in a feasible timeframe, on which to build the emulator. As the PhD project evolves, they will then develop efficient emulators for complex landscape evolution models.
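
The sketch below shows this workflow end to end on a deliberately trivial stand-in ODE (exponential decay, not a landscape evolution model): solve the equation on a design of parameter values, fit a Gaussian process emulator to the outputs, then predict at new parameter values at negligible cost.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def solver_output(theta):
    """Expensive step: final state of dy/dt = -theta * y, y(0) = 1."""
    sol = solve_ivp(lambda t, y: -theta * y, (0, 2), [1.0], rtol=1e-8)
    return sol.y[0, -1]

design = np.linspace(0.1, 3.0, 15)                      # parameter design points
runs = np.array([solver_output(th) for th in design])   # the slow part, done once

gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
gp.fit(design.reshape(-1, 1), runs)

theta_new = np.array([[0.7], [1.9]])
pred, sd = gp.predict(theta_new, return_std=True)       # near-instant surrogate
print(pred, sd)
print(np.exp(-2 * theta_new.ravel()))                   # exact answers, for comparison
```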

Imaging, Image Processing and Image Analysis - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Medical image segmentation and uncertainty quantification (PhD)

Supervisors: Surajit Ray
Relevant research groups: Machine Learning and AI, Imaging, Image Processing and Image Analysis

This project focuses on the application of medical imaging and uncertainty quantification for the detection of tumours. The project aims to provide clinicians with accurate, non-invasive methods for detecting and classifying the presence of malignant and benign tumours. It seeks to combine advanced medical imaging technologies such as ultrasound, computed tomography (CT) and magnetic resonance imaging (MRI) with the latest artificial intelligence algorithms. These methods will automate the detection process and may be used for determining malignancy with a high degree of accuracy. Uncertainty quantification (UQ) techniques will help generate a more precise prediction for tumour malignancy by providing a characterisation of the degree of uncertainty associated with the diagnosis. The combination of medical imaging and UQ will significantly decrease the requirement for performing invasive medical procedures such as biopsies. This will improve the accuracy of the tumour detection process and reduce the duration of diagnosis. The project will also benefit from the development of novel image processing algorithms (e.g. deep learning) and machine learning models. These algorithms and models will help improve the accuracy of the tumour detection process and assist clinicians in making the best treatment decisions.
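
As one hedged illustration of the UQ component, the sketch below uses Monte Carlo dropout, a common technique for approximate predictive uncertainty in deep models: dropout is kept active at prediction time so that repeated forward passes give a distribution over outputs rather than a single score. The network and inputs are toy placeholders, not a real imaging model.

```python
import torch
import torch.nn as nn

# Monte Carlo dropout sketch: repeated stochastic forward passes give a
# distribution over the predicted tumour-malignancy probability.
torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(64, 32), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(32, 1), nn.Sigmoid(),
)

x = torch.randn(1, 64)            # stand-in for image-derived features

model.train()                     # keep dropout stochastic at prediction time
with torch.no_grad():
    samples = torch.cat([model(x) for _ in range(100)])

p_mean, p_sd = samples.mean().item(), samples.std().item()
print(f"malignancy prob ~ {p_mean:.2f} +/- {p_sd:.2f}")
# A wide interval flags cases that should be referred to a clinician.
```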

Analysis of spatially correlated functional data objects (PhD)

Supervisors: Surajit Ray
Relevant research groups: Modelling in Space and Time, Computational Statistics, Nonparametric and Semi-parametric Statistics, Imaging, Image Processing and Image Analysis

Historically, functional data analysis (FDA) techniques have been widely used to analyze traditional time series data, albeit from a different perspective. Of late, FDA techniques are increasingly being used in domains such as environmental science, where the data are spatio-temporal in nature, and hence it is typical to consider such data as functional data where the functions are correlated in time or space. An example where modeling the dependencies is crucial is the analysis of remotely sensed data observed over a number of years across the surface of the earth, where each year forms a single functional data object. One might be interested in decomposing the overall variation across space and time and attributing it to covariates of interest. Another interesting class of data with dependence structure consists of weather data on several variables collected from balloons, where the domain of the functions is a vertical strip in the atmosphere and the data are spatially correlated. One of the challenges in this type of data is missingness, to address which one needs to develop appropriate spatial smoothing techniques for spatially dependent functional data. There are also interesting design-of-experiment issues, as well as questions of data calibration to account for variability in the sensing instruments. In spite of the research activity in analyzing dependent functional data, several problems remain unresolved, which the student will work on (a brief illustrative sketch follows the list):

  • robust statistical models for incorporating temporal and spatial dependencies in functional data
  • developing reliable prediction and interpolation techniques for dependent functional data
  • developing inferential framework for testing hypotheses related to simplified dependent structures
  • analysing sparsely observed functional data by borrowing information from neighbours
  • visualisation of data summaries associated with dependent functional data
  • clustering of functional data
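
Below is a minimal sketch of the borrowing-from-neighbours idea, with synthetic curves; Gaussian radial basis functions stand in for the usual B-splines, and inverse-distance weights stand in for a fitted spatial covariance model.

```python
import numpy as np

# Each site's noisy curve is reduced to basis coefficients, and the curve
# at an unmonitored site is predicted by spatially weighting neighbouring
# sites' coefficients. All data are synthetic.
rng = np.random.default_rng(5)

t = np.linspace(0, 1, 60)
centers = np.linspace(0, 1, 10)
B = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.12)**2)  # 60 x 10 basis

sites = rng.uniform(size=(6, 2))                    # monitoring locations
curves = np.array([np.sin(2 * np.pi * (t + 0.1 * s[0]))
                   + 0.1 * rng.standard_normal(t.size) for s in sites])

coef = np.linalg.lstsq(B, curves.T, rcond=None)[0].T   # per-site coefficients

s_new = np.array([0.5, 0.5])                        # unmonitored site
w = 1.0 / (np.linalg.norm(sites - s_new, axis=1) + 1e-6)
w /= w.sum()
curve_new = B @ (w @ coef)                          # prediction borrowing neighbours
print(curve_new[:5].round(2))
```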

Generating deep fake left ventricles: a step towards personalised heart treatments (PhD)

Supervisors: Andrew Elliott, Vinny Davies, Hao Gao
Relevant research groups: Machine Learning and AI, Emulation and Uncertainty Quantification, Biostatistics, Epidemiology and Health Applications, Imaging, Image Processing and Image Analysis

Personalised medicine is an exciting avenue in the field of cardiac healthcare where an understanding of patient-specific mechanisms can lead to improved treatments (Gao et al., 2017). The use of mathematical models to link the underlying properties of the heart with cardiac imaging offers the possibility of obtaining important parameters of heart function non-invasively (Gao et al., 2015). Unfortunately, current estimation methods rely on complex mathematical forward simulations, with a solution taking hours, a time frame not suitable for real-time treatment decisions. To increase the applicability of these methods, statistical emulation methods have been proposed as an efficient way of estimating the parameters (Davies et al., 2019; Noè et al., 2019). In this approach, simulations of the mathematical model are run in advance, and machine learning based methods are then used to estimate the relationship between the cardiac imaging and the parameters of interest. These methods are, however, limited by our ability to understand how cardiac geometry varies across patients, which is in turn limited by the amount of data available (Romaszko et al., 2019). In this project we will look at AI-based methods for generating fake cardiac geometries which can be used to increase the amount of data (Qiao et al., 2023). We will explore different types of AI generation, including Generative Adversarial Networks and Variational Autoencoders, to understand how we can generate better 3D and 4D models of fake left ventricles and create an improved emulation strategy that can make use of them.
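
For concreteness, here is a hedged, toy-scale sketch of the VAE route: smooth one-dimensional "shape vectors" stand in for left-ventricle meshes, and new synthetic geometries are drawn by sampling the latent space and decoding. A real model would operate on 3D/4D geometry and require far more care.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D, Z = 30, 4                                  # shape-vector and latent dims

class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(D, 64), nn.ReLU())
        self.mu = nn.Linear(64, Z)
        self.logvar = nn.Linear(64, Z)
        self.dec = nn.Sequential(nn.Linear(Z, 64), nn.ReLU(), nn.Linear(64, D))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterise
        return self.dec(z), mu, logvar

# Synthetic "geometries": smooth curves with random amplitude and phase.
n = 512
amp, phase = torch.rand(n, 1) + 0.5, torch.rand(n, 1) * 6.28
grid = torch.linspace(0, 6.28, D)
data = amp * torch.sin(grid + phase)

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for _ in range(500):
    recon, mu, logvar = vae(data)
    kl = -0.5 * torch.mean(1 + logvar - mu**2 - logvar.exp())
    loss = nn.functional.mse_loss(recon, data) + 1e-3 * kl
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    fakes = vae.dec(torch.randn(3, Z))        # sample new synthetic shapes
print(fakes.shape)                            # candidates for augmenting emulator training
```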

Statistics in Chemistry/Physics - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Projects will appear below here when they become available.

Statistical Modelling for Biology, Genetics and *omics - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Modelling genetic variation (MSc/PhD)

Supervisors: Vincent Macaulay
Relevant research groups: Bayesian Modelling and Inference, Statistical Modelling for Biology, Genetics and *omics

Variation in the distribution of different DNA sequences across individuals has been shaped by many processes which can be modelled probabilistically, processes such as demographic factors like prehistoric population movements, or natural selection. This project involves developing new techniques for teasing out the contributions of such processes from the patterns of genetic variation observed in modern populations.

Modality of mixtures of distributions (PhD)

Supervisors: Surajit Ray
Relevant research groups: Nonparametric and Semi-parametric Statistics, Applied Probability and Stochastic Processes, Statistical Modelling for Biology, Genetics and *omics, Biostatistics, Epidemiology and Health Applications

Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used to perform modeling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organizing mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results regarding the upper bound on the number of modes, but these too are limited to normal mixtures. In this project, we wish to explore the modality of non-normal mixtures and their application to real-life problems.
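
A small univariate sketch of the underlying question, using a two-component normal mixture (the normal case is where existing results live; the project would push beyond it): hill-climb the density from each component mean and count the distinct local maxima found. Depending on the separation of the means, the same family yields one mode or two.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def count_modes(means, sds, weights):
    """Count distinct local maxima of a univariate normal mixture density."""
    neg_dens = lambda x: -sum(w * norm.pdf(x[0], m, s)
                              for m, s, w in zip(means, sds, weights))
    modes = []
    for start in means:                       # hill-climb from each component mean
        x_star = minimize(neg_dens, x0=[start]).x[0]
        if not any(abs(x_star - m) < 1e-3 for m in modes):
            modes.append(x_star)
    return len(modes)

print(count_modes([0.0, 1.0], [1.0, 1.0], [0.5, 0.5]))   # 1: components merge
print(count_modes([0.0, 4.0], [1.0, 1.0], [0.5, 0.5]))   # 2: well separated
```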

Implementing a biology-empowered statistical framework to detect rare variant risk factors for complex diseases in whole genome sequence cohorts (PhD)

Supervisors: Vincent Macaulay, Luísa Pereira (Geneticist, i3s)
Relevant research groups: Statistical Modelling for Biology, Genetics and *omics, Biostatistics, Epidemiology and Health Applications

Traditional genome-wide association studies (GWAS), which detect candidate genetic risk factors for complex diseases/phenotypes, rely largely on microarray technology, genotyping thousands or millions of variants regularly spaced across the genome at once. These microarrays include mostly common variants (minor allele frequency, MAF>5%), missing candidate rare variants, which are the more likely to be deleterious [1]. Currently, the best strategy for genotyping low-frequency (1%<MAF<5%) and rare (MAF<1%) variants is next generation sequencing, and the increasing availability of whole genome sequences (WGS) places us on the brink of detecting rare variants associated with complex diseases [2]. Statistically, this detection constitutes a challenge, as the massive number of rare variants in genomes (for example, 64.7M in 150 Iberian WGSs) would imply genotyping millions/billions of individuals to attain statistical power. In the last couple of years, several statistical methods have been tested in the context of associating rare variants with complex traits [2, 3, 4], largely testing strategies to aggregate the rare variants. These works have not yet tested the statistical power that can be gained by incorporating reliable biological evidence into the aggregation of rare variants in the most probable functional regions, such as non-coding regulatory regions that control the expression of genes [4]. In fact, it has been demonstrated that even for common candidate variants, most (around 88%; [5]) are located in non-coding regions. If this is true for the common variants detected by traditional GWAS, it is highly probable to be true for rare variants as well.

In this work, we will implement a biology-empowered statistical framework to detect rare variant risk factors for complex diseases in WGS cohorts. We will use the 200,000 WGSs from the UK Biobank database [6], which will be available to scientists before the end of 2023. Access to clinical information on these >40-year-old UK residents is also provided. We will build our framework around type-2 diabetes (T2D), a common complex disease for which thousands of common variant candidates have been found [7]. Also, the mapping of regulatory elements is well known for the pancreatic beta cells that play a leading role in T2D [8]. We will use this mapping to guide the aggregation of rare variants, and test it against a random aggregation across the genome. Of course, the framework rationale will be applicable to any other complex disease. We will browse the literature for aggregation methods available at the beginning of this work, but we have already selected the method SKAT (sequence kernel association test; [3]) to be tested. SKAT fits a random-effects model to the set of variants within a genomic interval or biologically meaningful region (such as a coding or regulatory region) and computes variant-set level p-values, while permitting correction for covariates (such as principal components that can account for population stratification between cases and controls).
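
To make the aggregation idea concrete, below is a heavily simplified, permutation-based sketch of a SKAT-type variant-set statistic on simulated genotypes; the real SKAT fits covariate-adjusted null models and computes p-values from a mixture of chi-squared distributions rather than permutations.

```python
import numpy as np

rng = np.random.default_rng(11)

n, p = 1000, 25                                # individuals, rare variants in a region
maf = rng.uniform(0.001, 0.02, size=p)         # minor allele frequencies
G = rng.binomial(2, maf, size=(n, p)).astype(float)

w = 1.0 / np.sqrt(maf * (1 - maf))             # up-weight rarer variants
# Simulated phenotype: the first five variants in the region carry some risk.
y = (2.0 * G[:, :5].sum(axis=1) + rng.standard_normal(n) > 1.0).astype(float)

def variant_set_stat(y, G, w):
    r = y - y.mean()                           # residuals from a null (intercept) model
    return np.sum((r @ (G * w))**2)            # Q = r' G W W G' r, weighted linear kernel

q_obs = variant_set_stat(y, G, w)
q_null = np.array([variant_set_stat(rng.permutation(y), G, w)
                   for _ in range(999)])
print("permutation p ~", (1 + np.sum(q_null >= q_obs)) / 1000)
```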

Social and Urban Studies - Example Research Projects

Information about postgraduate research opportunities and how to apply can be found on the Postgraduate Research Study page. Below is a selection of projects that could be undertaken with our group.

Causal inference in noisy social networks (PhD)

Supervisors: Vanessa McNealis
Relevant research groups: Nonparametric and Semi-parametric Statistics, Biostatistics, Epidemiology and Health Applications, Social and Urban Studies

One core task of science is causal inference, yet distinguishing causality from spurious associations in observational data can be challenging. Statistical causal inference provides a framework to define causal effects, specify assumptions for identifying causal effects, and assess sensitivity of causal estimators to these assumptions.

Recent interest has focused on causal inference under interference (or spillover), where one individual’s treatment affects the outcomes of others. Social network data are particularly valuable for this purpose, as they offer information about connections between individuals, revealing potential pathways for interference. For instance, in the National Longitudinal Study of Adolescent Health (Add Health), peer influences among adolescents provide an ideal case for studying spillover, especially as they relate to behavioural and academic outcomes. However, Add Health features a very high level of missing edge data and censoring, which poses problems since many methods for evaluating spillover effects assume fully observed networks (a small sketch of this issue appears at the end of this section).

This PhD will develop statistical methods for causal inference under network interference with noise, considering the following issues/approaches:

  • Bias characterization in the presence of missing or uncertain edge information
  • Semi-parametric inference
  • Propensity score methods
  • Multiple imputation for network data

A good knowledge of methods for survey sampling and regression is essential; familiarity with causal inference, statistical methods for coarse data, and semi-parametric inference would be an advantage.
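
As a small illustration of the missing-edge issue (with a synthetic random graph, not Add Health data): spillover exposure is commonly defined as the fraction of treated neighbours, so randomly deleting edges misclassifies exposure for many nodes, which is the source of bias the methods above would characterise and correct.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(8)

G = nx.erdos_renyi_graph(200, 0.05, seed=8)
treat = {v: rng.random() < 0.5 for v in G}      # randomised treatment

def exposure(graph):
    """Spillover exposure: fraction of a node's neighbours that are treated."""
    return {v: np.mean([treat[u] for u in graph[v]]) if graph.degree(v) else 0.0
            for v in graph}

e_full = exposure(G)

G_obs = G.copy()                                # observe only ~70% of edges
G_obs.remove_edges_from([e for e in G.edges if rng.random() < 0.3])
e_obs = exposure(G_obs)

err = np.mean([abs(e_full[v] - e_obs[v]) for v in G])
print(f"mean absolute exposure misclassification: {err:.3f}")
```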

Innovation in Learning and Teaching

Statistics and Data Analytics Education - Example Research Projects

Our group has an active PhD student community, and every year we admit new PhD students. We welcome applications from across the world. Further information can be found here.