Retrieving and managing air quality data at the European level: the EEAaq package and the need to manage missing data

Paolo Maranzano (University of Milano Bicocca)

Wednesday 19th June 13:00-14:00 Maths 311B

Abstract

In this talk we discuss the EEAaq software, an R package developed to download, manage, and analyze air quality data at the European level from the European Environment Agency (EEA) dataflows. The software (release 0.0.3) has been freely available on CRAN since August 2023. EEAaq is motivated by three observations: (1) the EEA air quality download system and its metadata retrieval lack practicality and flexibility for non-professional users; (2) collecting data directly from the agency’s portal requires heavy data manipulation; (3) air quality conditions in Europe continue to attract considerable interest from researchers and technicians involved in policy evaluation. The EEAaq package provides a set of functions that fall into three categories according to their goal: (1) downloading data, (2) summarizing and aggregating data, and (3) building static and dynamic maps. The download functions allow users to specify LAU or NUTS-level zone information, a specific shapefile, or a list of coordinates representing the area for which air quality data should be retrieved. The summary functions compute descriptive statistics, data information, and time aggregations. The mapping functions represent the monitoring stations and build spatial interpolation maps. Data provided by the EEA suffer from poor comparability due to the heterogeneity of national and regional agencies: depending on the country, pollutants may be measured at different frequencies or not measured at all. Another serious problem is the high proportion of missing values and gaps in the collected time series. To address this issue, a variety of algorithms are being developed within the EEAaq package to estimate and impute missing values by exploiting the multiple seasonalities (intraday, weekly, and annual) of the pollutants.
In particular, we propose variants of the Site-Dependent Effect method (Plaia & Bondì, 2006), which in its original form takes into account only the temporal dynamics of the data. The proposed variants (1) explicitly model the spatial correlation between monitoring stations; (2) model the potential spatial heterogeneity between time series; and (3) model the positive asymmetry that typically characterizes pollutant concentrations. The imputation algorithms are evaluated through a simulation study based on the actual atmospheric monitoring network installed in Europe in 2023.
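
As a rough illustration of what exploiting intraday and weekly seasonality for imputation can mean, the sketch below fills each gap in an hourly series with the mean of the observed values that share the same weekday and hour. It is a deliberately simplified seasonal-profile imputation written in Python, not the Site-Dependent Effect method and not part of the EEAaq package; the function name `impute_seasonal_profile` and the synthetic series are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def impute_seasonal_profile(timestamps, values):
    """Fill None entries using seasonal means (hypothetical helper).

    Each gap is replaced by the mean of observed values in the same
    (weekday, hour) slot, falling back to the hour-of-day mean and
    finally to the overall mean of the observed values.
    """
    by_slot = defaultdict(list)  # (weekday, hour) -> observed values
    by_hour = defaultdict(list)  # hour -> observed values
    observed = []
    for ts, v in zip(timestamps, values):
        if v is not None:
            by_slot[(ts.weekday(), ts.hour)].append(v)
            by_hour[ts.hour].append(v)
            observed.append(v)
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    overall = mean(observed)

    filled = []
    for ts, v in zip(timestamps, values):
        slot = by_slot.get((ts.weekday(), ts.hour))
        hour = by_hour.get(ts.hour)
        if v is not None:
            filled.append(v)            # keep observed values untouched
        elif slot:
            filled.append(mean(slot))   # weekly + intraday profile
        elif hour:
            filled.append(mean(hour))   # intraday profile only
        else:
            filled.append(overall)      # last resort
    return filled


# Synthetic example (assumption): three weeks of hourly data whose level
# depends only on the hour of day and on weekday versus weekend.
start = datetime(2023, 1, 2)  # a Monday
timestamps = [start + timedelta(hours=i) for i in range(3 * 7 * 24)]
truth = [20.0 + ts.hour + (5.0 if ts.weekday() >= 5 else 0.0)
         for ts in timestamps]

values = list(truth)
for gap in (30, 31, 200, 400):  # artificially remove a few observations
    values[gap] = None

filled = impute_seasonal_profile(timestamps, values)
```

Because the synthetic series is perfectly periodic, the gaps are recovered exactly; real pollutant series are noisy, spatially correlated, and right-skewed, which is precisely what a purely temporal profile like this one ignores.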
