Binscatter and adaptive decision tree

Matias D. Cattaneo (Princeton University, USA)

Friday 24th March, 2023 15:00-16:00 JOSEPH BLACK:C407 AGRICULT

Abstract

Part 1

Binned scatter plots, or binscatters, have become a popular and convenient tool in applied microeconomics for visualizing bivariate relations and conducting informal specification testing. However, a binscatter, on its own, is very limited in what it can characterize about the conditional mean. We introduce a suite of formal and visualization tools based on binned scatter plots to restore, and in some dimensions surpass, the visualization benefits of the classical scatter plot. We deliver a comprehensive toolkit for applications, including estimation of conditional mean and quantile functions, visualization of variance and precise quantification of uncertainty, and formal tests of substantive hypotheses such as linearity or monotonicity, with an extension to testing differences across groups. To do so, we give an extensive theoretical analysis of binscatter and related partition-based methods, accommodating nonlinear and potentially nonsmooth models, which allows us to treat binary, count, and other discrete outcomes as well. We also correct a methodological mistake related to covariate adjustment present in prior implementations, which yields an incorrect shape and support of the conditional mean. All of our results are implemented in publicly available software and showcased with three substantive empirical illustrations. Our empirical results are dramatically different from those obtained using the prevalent methods in the literature.
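For readers unfamiliar with the basic construction, the following is a minimal sketch of a canonical binscatter, assuming quantile-spaced bins of x and a simple within-bin average of y. It is not the authors' software and covers none of the inference, covariate-adjustment, or testing tools described above; the function name `binscatter` and the parameter `n_bins` are hypothetical, chosen only for illustration.

```python
from statistics import mean


def binscatter(x, y, n_bins=10):
    """Return one (mean of x, mean of y) point per quantile-spaced bin of x.

    Illustrative only: bins are formed by sorting on x and cutting the
    sorted sample into n_bins chunks of (nearly) equal size, so tied x
    values may end up in adjacent bins.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 1 <= n_bins <= len(x):
        raise ValueError("n_bins must be between 1 and the sample size")

    pairs = sorted(zip(x, y))  # order observations by x
    n = len(pairs)
    points = []
    for j in range(n_bins):
        lo = j * n // n_bins          # first index in bin j
        hi = (j + 1) * n // n_bins    # one past the last index in bin j
        xs = [p[0] for p in pairs[lo:hi]]
        ys = [p[1] for p in pairs[lo:hi]]
        points.append((mean(xs), mean(ys)))
    return points
```

Plotting the returned points gives the familiar binned scatter plot; each point is a piecewise-constant estimate of the conditional mean of y within that bin of x.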

Part 2

Decision tree learning is increasingly being used for pointwise inference. Important applications include causal heterogeneous treatment effects and dynamic policy decisions, as well as conditional quantile regression and design of experiments, where tree estimation and inference are conducted at specific values of the covariates. In this paper, we call into question the use of decision trees (trained by adaptive recursive partitioning) for such purposes by demonstrating that they can fail to achieve polynomial rates of convergence in uniform norm, even with pruning. Instead, the convergence may be poly-logarithmic or, in some important special cases, such as honest regression trees, fail completely. We show that random forests can remedy the situation, turning poorly performing trees into nearly optimal procedures, at the cost of losing interpretability and introducing two additional tuning parameters. The two hallmarks of random forests, subsampling and the random feature selection mechanism, are seen to each distinctively contribute to achieving nearly optimal performance for the model class considered.
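As a rough illustration of what "adaptive" means here, the sketch below chooses a single split point by minimizing the within-node sum of squared errors, in the style of a CART regression stump. This is an assumption about the kind of splitting rule meant, not the speaker's code; the function name `best_split` is hypothetical. A full tree would apply the same step recursively to each resulting node.

```python
def best_split(x, y):
    """Return (threshold, sse) for the one split of x that minimizes the
    total within-node sum of squared errors of y, or None if x is constant.

    The split location depends on the observed y values, which is what
    makes the resulting partition adaptive.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(ys)
    total = sum(ys)
    total_sq = sum(v * v for v in ys)

    best = None
    left_sum = 0.0
    left_sq = 0.0
    for i in range(1, n):  # i = number of points in the left node
        left_sum += ys[i - 1]
        left_sq += ys[i - 1] ** 2
        if xs[i] == xs[i - 1]:
            continue  # cannot split between tied x values
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # SSE of a node = sum of squares - (sum)^2 / count
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        threshold = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbours
        if best is None or sse < best[1]:
            best = (threshold, sse)
    return best
```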

Biosketch

Matias D. Cattaneo is a Professor of Operations Research and Financial Engineering (ORFE) at Princeton University, where he is also Associated Faculty in the Department of Economics, the Center for Statistics and Machine Learning (CSML), and the Program in Latin American Studies (PLAS). His research spans econometrics, statistics, data science, and decision science, with particular interests in program evaluation and causal inference. Most of his work is interdisciplinary and motivated by quantitative problems in the social, behavioral, and biomedical sciences. As part of his main research agenda, he has developed novel semi-/non-parametric, high-dimensional, and machine learning inference procedures with demonstrably superior robustness to tuning parameter and other implementation choices. Matias was elected Fellow of the Institute of Mathematical Statistics (IMS) in 2022. He also serves on the editorial boards of the Journal of the American Statistical Association, Econometrica, Operations Research, Econometric Theory, the Econometrics Journal, and the Journal of Causal Inference. In addition, Matias is an Amazon Scholar and has advised several governmental, multilateral, non-profit, and for-profit organizations around the world.

 
