School of Computing Science

The Glasgow Information Retrieval Group within the School of Computing Science at the University of Glasgow was founded in 1986 by Professor C. J. ‘Keith’ van Rijsbergen, often considered one of the founders of modern Information Retrieval (IR). From its outset, the Glasgow IR group has focused on improving the effectiveness of IR systems, inventing new logic and probabilistic retrieval models in the 1990s and early 2000s, followed by the development of adaptive query expansion techniques, interactive multimedia models and the Divergence From Randomness framework, as well as leading research into quantum, expertise search and search result diversification models in the late 2000s. Since then, the group has embraced emerging machine learning and deep learning technologies for very large corpora and data streams, and has been at the forefront of the research, development and application of those technologies for search and recommendation use cases in a manner that ensures both effectiveness and efficiency.

The Glasgow IR Group has a strong research track record. Indeed, the ACM Digital Library shows that the group is ranked first by number of papers (429) at the SIGIR conference (the top CORE A* conference in the IR field). Meanwhile, a recent study by Microsoft Research of the 40 years of SIGIR showed the University of Glasgow as the 5th most cited university at the conference and the 1st in Europe. The group is also renowned for developing the popular open source IR platform, Terrier.org, which has been downloaded over 60,000 times since its first release in 2004 and is cited by over 3500 research papers. Furthermore, the group has a long history of engagement with the public and industry sectors, from SMEs to multinational corporations.

The Informer magazine of BCS's Information Retrieval Specialist Group carried a recent profile on the Glasgow Information Retrieval Group.

Topics

As the most active Information Retrieval group by publications in Europe and one of the longest running, our research covers the full spectrum of topics that are relevant to the development of IR systems:

IR & Recommender Systems Models

  • Theoretical modelling of IR systems
  • Machine learning and deep learning for information retrieval and recommender systems
  • Interactive information retrieval (personalised IR, emotion based search, user modelling for IR, gestural IR)
  • User modelling and personal information access
  • Topic modelling; Entity search; Natural language processing for IR
  • Recommender systems; Context-aware venue suggestion

Large-scale IR & Efficient IR

  • Web information retrieval; Big data and information retrieval
  • Efficient architecture for large-scale IR systems; Data stream processing architectures

Data Streams & IR

  • Real-time information retrieval
  • Search in social and sensor networks

Artificial Intelligence & IR

  • Conversational information seeking and dialogue systems
  • Information credibility, transparency, explainability and verification in IR systems
  • Fairness in information retrieval & recommender systems

Natural Language Processing & IR

  • Information extraction including entity and relation extraction
  • Automatic knowledge graph construction
  • Multi-task models, joint models and summarization

Applications

  • Multimedia information retrieval
  • Domain-specific information retrieval: smart cities; health; news; eDiscovery; sensitivity review
  • Emergency management and crisis informatics
  • Politics and media

Evaluation

  • Test collections and evaluation metrics
  • Evaluation of IR systems and crowdsourcing for IR
  • Online and offline evaluation of IR and recommender systems
  • Eye-tracking and physiological approaches, such as fMRI

Projects

  • Climinvest

Current staff and students

Academic Staff:

Current Research Assistants and Research Students:

  • Anna Rezk
  • Javier Sanz-Cruzado Puig
  • Zixuan Yi
  • Edward Richards
  • Zeyan Liang
  • Susmita Das
  • Andreas Chari
  • Jack McKechnie
  • Sharare Zolghadr
  • Andrew Parry
  • Xuejun Chang
  • Jinyuan Fang
  • Zhili Shen
  • Xinhao Yi
  • Zeyuan Meng
  • Kangheng Liang
  • Mahdi Dehghan
  • James Nurdin
  • Ritajit Dey
  • Fangzheng Tian
  • Lubingzhi Guo
  • Shen Dong
  • Zhaohan Meng
  • Amparo Gimenez Rios
  • Xi Zhang
  • Hamish Clark

Recent Graduates

  • Aleksandr V. Petrov, Tripadvisor, Senior Machine Learning Scientist
  • Iain Mackie, Malted.ai, Founder & CEO
  • Carlos Gemmel, Malted.ai, Founder & CTO
  • Federico Rossetto, Malted.ai, Founder & Chief of Engineering
  • Jijun Long, Hunan University, Lecturer
  • Hitarth Narvala, Kotak Mahindra Bank, Data Scientist
  • Alexander Hepburn, University of Padua, Research Associate
  • Jarana Manotumruksa, University College London, Researcher
  • Anjie Fang, Amazon, Applied Scientist
  • Jorge David Gonzalez Paule, Jobandtalent Espana, Data Scientist
  • Colin Wilkie, Siemens, Data Engineer
  • David Maxwell, University of Delft, Data Engineer
  • Graham McDonald, University of Glasgow, Senior Lecturer
  • James McMinn, ScoopAnalytics, Co-Founder
  • Stuart Mackie, BiP Solutions/Strathclyde Uni, Data Scientist
  • Horatiu Bota, Prodsight, Data Scientist
  • Jesus Alberto Rodriguez Perez, University of Glasgow, Postdoctoral Researcher
  • Fajie Yuan, Tencent, Senior Researcher

 

Notable Alumni

  • Ryen White (Research Manager, Microsoft Research AI)
  • Mark Sanderson (Professor, Royal Melbourne Institute of Technology)
  • Mounia Lalmas (Head of Tech Research, Spotify)
  • Ian Ruthven (Professor, Strathclyde University)
  • Fabio Crestani (Professor, University of Lugano)
  • Vassilis Plachouras (Software Engineering, Facebook)
  • Leif Azzopardi (Chancellor's Fellow, Strathclyde University)
  • Rodrygo Santos (Assistant Professor, Federal University of Minas Gerais)
  • Eugene Kharitonov (Research Engineer, Facebook)
  • Saul Vargas (Senior Machine Learning Scientist, ASOS)
  • Dyaa Albakour (Lead Data Scientist, Signal Media)
  • Nut Limsopatham (Senior Researcher, Microsoft AI)
  • Amir Jadidinejad (AI Engineer, GlaxoSmithKline)
  • Zaiqiao Meng (Researcher, Cambridge University)

Terrier IR platform

Terrier is a highly flexible, efficient, and effective open source search engine developed by the IR group and readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications. Indeed, Terrier is used internationally, with over 60,000 downloads since its first release in 2004. Terrier is used widely by the research community, with over 3700 citations in research papers according to Google Scholar.

Terrier comes in two versions. The core Java-based search engine is available at http://terrier.org; however, most users work with PyTerrier, a modular platform for Python that integrates a wide range of state-of-the-art AI search technologies. PyTerrier's documentation is available online, and the platform can be downloaded from GitHub.

Popular resources

For those new to the Information Retrieval field, the group maintains a useful set of common resources for researchers and practitioners:

  • Information Retrieval Test Collections: This page lists publicly available IR test collections. Some are held locally and some are pointers to remote sites.
  • Collections of text and corpora: What's the difference between a test collection and a text collection? Well, a test collection has to have associated queries and relevance judgements. The things in here are simply document collections.
  • Language reference works: This page contains links to online language reference works, such as dictionaries, thesauri etc.
  • IR systems: A list of links to some sites that have information about IR systems.
  • Linguistic utilities: IR-related language utilities such as stemmers, stop-word lists, morphological taggers, etc.
  • IR Journals: Tables of contents and abstracts of the papers in a number of well-known IR journals.
  • IR Organisations: Various IR groups and more formal organisations.
  • Books: Supplements of books or whole books online.

Upcoming events

"Fake Science" and Information Overload in Academia – Challenges and Opportunities for Information Retrieval Research

Group: Information Retrieval (IR)
Speaker: Ingo Frommholz and Libo Ren, Modul University Vienna
Date: 16 February, 2026
Time: 15:00 - 16:00
Location: Sir Alwyn Williams Building, 422 Seminar Room

Abstract:
Scientific communication is experiencing unprecedented growth, with publication volumes increasing at a scale that overwhelms researchers’ capacity to process and evaluate information. This information overload is not only a byproduct of legitimate scholarly activity but is increasingly driven by low-quality and even fraudulent content. Alongside rigorous, well-designed studies, the scholarly record is also populated by weak methodologies, poorly vetted results, and intentional manipulation. The rise of AI-accelerated publishing, paper mills, tortured phrases, and other forms of “fake science” intensifies this problem, creating massive noise and undermining the reliability of academic information systems.

For the Information Retrieval (IR) community, this poses both critical challenges and unique opportunities. On the one hand, information overload and quality degradation are a challenge that needs to be addressed more directly by models that have traditionally considered mainly topical relevance. On the other hand, advances in AI, NLP, and bibliometric-enhanced IR offer promising directions for filtering, ranking, and contextualising scholarly information. In this talk, we will examine the evolving problem of fake science and its role in driving information overload. We will outline recent developments in scholarly information access, highlight open research problems (from detecting low-quality and fraudulent content to designing veracity-aware retrieval and recommendation models), and discuss how IR research can contribute to ensuring that high-quality knowledge remains discoverable, trustworthy, and actionable in an era of overwhelming information abundance.

Bio (Ingo Frommholz):
Ingo Frommholz, PhD, is Professor and Head of the School of Applied Data Science at Modul University Vienna, Austria. His research focuses on interactive information retrieval, quantum-inspired models, AI and deep learning, natural language processing, retrieval-augmented generation, and bibliometric-enhanced retrieval, with applications ranging from scholarly information access, digital humanities and scientometrics to cyberstalking detection. He has been Principal Investigator of major international projects such as the EU Horizon Europe OMINO project on information overload and the EU H2020 QUARTZ project on quantum-inspired information access. Ingo is Chair of the BCS Information Retrieval Specialist Group in the UK, Senior Managing Editor of the International Journal on Digital Libraries (Springer), and serves on the steering committees of leading ACM conferences including CIKM and SIGIR-ICTIR. He has published more than 100 scholarly works, supervised and examined PhD students across Europe, and is a Fellow of the BCS, the UK Chartered Institute for IT (FBCS) as well as Advance HE (FHEA) in the UK.

Bio (Libo Ren):
Libo Ren, MSc, is a PhD student in Applied Data Science at Modul University Vienna. Her research interests include large language models, information retrieval systems, deep learning, and multimodal AI, with applications in scholarly knowledge access and medical domains. Her PhD research aims to explore how retrieval-augmented generation and quality-aware recommendation systems can mitigate information overload. This includes developing paper quality metrics and detecting fake science to support researchers in identifying reliable knowledge, while reducing hallucinations caused by LLMs referencing low-quality scientific content. Before starting her PhD, she achieved distinction-level academic performance during her Master’s and undergraduate studies and has contributed to journal and conference publications, particularly in medical AI and LLM applications.