Inference and replicability in data-driven science: capturing context
A familiar challenge in data-driven science is that of inference. Generalising from a sample to some target population is difficult when datasets are already large and population-level, and when observations are subject to unknown confounding context. Replication is muddled by the number of analysis choices that must be made and the range of alternatives that could be pursued (the forking paths problem). How, then, can we ensure that the inferences and claims we make from exploratory analyses are properly contextualised?
Drawing on expertise in Statistics, Human-Computer Interaction and Spatial Econometrics, we will explore ways of systematically describing context in interactive data analysis. An outcome of this work may be to formulate a grammar for structuring exploratory research findings. For a stated finding, the grammar would require researchers to identify, using theory and prior knowledge: the confounding context that is present or absent in their analysis; the spatiotemporal scale at which the finding is observed and how it may generalise beyond that scale; and the weight of evidence in support of the theory implied by the finding, relative to other plausible theories.
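To make the idea concrete, one way such a grammar might be encoded is as a small record type that refuses to treat a finding as complete until each element has been stated. The following is only an illustrative sketch under our own assumptions: the names `Finding`, `Theory`, `missing_elements` and `relative_support`, and the use of a single numeric evidence weight per theory, are hypothetical and are not part of any existing grammar or library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Theory:
    """A candidate explanation for a finding (hypothetical structure)."""
    name: str
    evidence_weight: float  # assumed non-negative, unnormalised support score


@dataclass
class Finding:
    """One exploratory finding plus the context the grammar would ask for."""
    statement: str
    implied_theory: Theory
    confounders_present: List[str] = field(default_factory=list)
    confounders_absent: List[str] = field(default_factory=list)
    spatial_scale: str = ""
    temporal_scale: str = ""
    generalisation: str = ""
    rival_theories: List[Theory] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Return the names of grammar elements that have not been stated."""
        missing = []
        if not (self.confounders_present or self.confounders_absent):
            missing.append("confounding_context")
        if not self.spatial_scale:
            missing.append("spatial_scale")
        if not self.temporal_scale:
            missing.append("temporal_scale")
        if not self.generalisation:
            missing.append("generalisation")
        if not self.rival_theories:
            missing.append("rival_theories")
        return missing

    def relative_support(self) -> float:
        """Share of total evidence weight held by the implied theory."""
        total = self.implied_theory.evidence_weight + sum(
            t.evidence_weight for t in self.rival_theories
        )
        if total <= 0:
            raise ValueError("evidence weights must sum to a positive value")
        return self.implied_theory.evidence_weight / total
```

In this sketch, a finding with an empty `missing_elements()` list has stated all three kinds of context, and `relative_support()` gives a crude, assumed summary of how the implied theory compares with its rivals. How evidence should actually be weighed is an open question for the work itself.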