Rigor, imagination and production in data-driven science
The steady expansion in the availability and reach of observational data has prompted much-needed introspection into data analysis practice. When using data to answer questions, it is not simply a case of choosing from a set of appropriate statistical procedures, or ‘rolling out’ some research design template. Effective data analysis requires analysts to make decisions within a wide space of analysis options and to engage deeply with the processes and mechanisms being represented through data. This public lecture features three internationally renowned scientists from academia and industry. Talks will cover how to challenge and interrogate in data-based research; techniques for imagining uncertainty and variation in observational data; and how data-driven analyses can be put into production.
- Date/Time : Thursday 20th June, 1730-2100
- Venue : King’s College London
- Details : Drinks reception following talks
Andrew Gelman
Beyond the black box: Toward a new paradigm of statistics in science
Standard paradigms for data-based decision making and policy analysis fail, and have led to a replication crisis in science, because they can’t handle uncertainty and variation and because they don’t seriously engage with the quality of evidence. We discuss how this has happened, touching on the piranha problem, the butterfly effect, the magic number 16, the one-way-street fallacy, the backpack fallacy, the Edlin factor, Clarke’s law, the analyst’s paradox, and the greatest trick the default ever pulled. We then discuss ways to go beyond the push-a-button, take-a-pill model to a more active engagement of data in science.
Andrew Gelman is Professor of Statistics and Political Science at Columbia University. His research in applied statistics is wide-ranging within and beyond Political Science. He has received the Outstanding Statistical Application award three times from the American Statistical Association, the award for best article published in the American Political Science Review, the Mitchell and DeGroot prizes from the International Society for Bayesian Analysis, and the Committee of Presidents of Statistical Societies award.
Jessica Hullman
Data Analysis as Imagination
Learning from data, whether in exploratory or confirmatory analysis settings, requires one to reason about the likelihood of many competing explanations. However, people are boundedly rational agents who often engage in pattern-finding at the expense of recognizing uncertainty or considering potential sources of heterogeneity and variation in the effects they seek to discover. Taking this seriously motivates new classes of interface tools that help people extend their imagination in hypothesizing and interpreting effects.
Jessica Hullman is Ginni Rometty Associate Professor of Computer Science at Northwestern University. Her research addresses challenges and limitations that arise when people draw inductive inferences from data. Her work has contributed visualization and interaction techniques for decision-making and analysis, as well as theoretical frameworks for understanding the role of visualization in statistical workflow. Jessica’s work has received best paper awards at top visualization and HCI venues, a Microsoft Faculty award, and NSF CAREER, Medium, and Small awards as PI, among others.
Hadley Wickham
Data science in production
This talk will discuss what it means to put data science “in production”. In industry, any successful data science project will be run repeatedly for months or years, typically on a server that you can’t work with interactively. This poses an entirely new set of challenges that you won’t encounter in your classes in university, but are vital to overcome if you want to have an impact in your job.
In this talk, I’ll discuss three principles that I’ve found useful for understanding data science in production: not just once, not just my computer, and not just by myself. I’ll discuss the challenges associated with each, and where possible, what solutions (both technical and sociological) are currently available.
Hadley Wickham is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, <http://hadley.nz>.