TMCF 2024

Navigating the garden of forking paths in data-driven science

7th February 2025

Huge thanks to the Turing Team


Our TMCF:
Forking paths and data-driven science

What does good* data-driven science look like?


——————–

*defensible | rigorous | theoretically-informed


And how do we get there?


> explore ~ EDA
> pre-register
> test ~ CDA




Many of the human errors we point out in data analysis can be attributed to a lack of ability to entertain multiple possibilities. We like to suppress and reduce uncertainty, not maintain it as we go.

Jessica Hullman, 2024



Exploratory research needs rigor to serve its intended aim of facilitating scientific discovery. Whichever method is selected […] it needs to be implemented rigorously to maximize the probability of true discoveries while minimizing the probability of false discoveries.

Devezer et al. 2021

Rigour

– Repeatable analysis plans
– Known inference protocols
– Expectations for size and stability of effects











Richness

– Evolving analysis plans
– Informal inference (multiplicity)
– Effects situated within descriptive context


Our TMCF:
Workshop and outputs

tmcf themes

1. Modelling paradigms
Establish what is distinctive about modelling in data-driven science by mapping out archetypal data-driven projects and the analysis practices they use.

2. Inference and replicability
How to plan and stage exploratory analysis? A grammar for structuring exploratory research findings so that inferences can be reported.

3. Tools and Enablers
How to enable exploratory research practices that are rich yet have rigour?
frameworks | techniques | analysis environments.

tmcf staging

Before
> Provocations
> 500-word position statements

During
> Testing position statements
> 1500-word blogs
> concept paper

After
> concept paper: [wide] data analysis
> call to action for [wide] applications

provocations

1. Modelling paradigms
Heuristics trump theory in data-driven research
Models are exploratory artefacts

2. Inference and replicability
Claims to knowledge can only be made through out-of-sample significance tests
Pre-registration locks researchers into facile statistical tests
Human-in-the-loop is incompatible with inferential and replicable analysis

3. Tools and Enablers
Visualizations are limited as evidence
There is no formal beginning, process or end to an interactive data analysis session; it is all context-dependent
Provenance of exploratory data analysis processes is too complex and ad hoc to be useful

provoking

provoking: quotes

blogging

panelling

jogging

When        Activity

Introductions
1330-1345   TMCFs + Turing – Andrew Duncan
1345-1400   Our TMCF – Cagatay Turkay, Roger Beecham

Panel + talks
1400-1445   Talk1 – Jo Wood
1400-1445   Talk2 – Rachel Franklin
1400-1445   Talk3 – Hadley Wickham
1400-1445   Panel – All
1445-1515   Break

Outputs + call-to-action
1515-1530   wide data analysis – Cagatay Turkay, Roger Beecham
1530-1545   Call-to-action and SI – Cagatay Turkay, Roger Beecham
1545-1620   Workshop discussion – All
1620-1630   Close – Cagatay Turkay, Roger Beecham

workshop agenda

panel + talks


Communication is Design is Analysis is Communication
– Jo Wood


Locating the entrance to the garden of forking paths
– Rachel Franklin

wide Data Analysis
——————–
[E]nabling
[W]ide
[D]etailed and
[I]nteractive
data practice

wide data analysis


  • Bring together diverse theories, methods, practices and traditions from across disciplines, e.g., statistics, political science, geography, computer science, etc.

  • An opportunistic overview/review of existing (good*) practice of modelling and data analysis

  • Consolidate these over a harmonising concept and provide usable/actionable building blocks

  • And a call-to-action for further research and development of enablers

wide data analysis

[W]idening

[W]idening the data analysis process involves considering a broader range of approaches than might otherwise be adopted. The aim is to encourage a mindset that is open to exploration, interpretation, multiple and complementary explanations […] They may be specific variations in some modelling parameterisation or more profound choices around methodological approach.
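
One way to picture "specific variations in some modelling parameterisation" is to run the same simple model under several defensible specifications and compare the estimates side by side. The sketch below is illustrative only and is not taken from the concept paper: the data are synthetic, and the names `run_specification`, `trims` and `subsamples` are hypothetical.

```python
import itertools
import random
import statistics

random.seed(1)

# Synthetic data (an assumption for illustration): y depends weakly on x.
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.3 * xi + random.gauss(0, 1) for xi in x]


def slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return sxy / sxx


def run_specification(trim, subsample):
    """Fit one specification.

    trim: drop observations with |x| above this threshold.
    subsample: keep this leading fraction of the remaining observations.
    """
    pairs = [(a, b) for a, b in zip(x, y) if abs(a) <= trim]
    pairs = pairs[: int(len(pairs) * subsample)]
    xs, ys = zip(*pairs)
    return slope(xs, ys)


# Two hypothetical analyst choices, each with three plausible settings.
trims = [1.5, 2.0, float("inf")]
subsamples = [0.5, 0.75, 1.0]

# Run every combination rather than committing to a single path.
results = {
    (t, s): run_specification(t, s)
    for t, s in itertools.product(trims, subsamples)
}

for spec, estimate in results.items():
    print(spec, round(estimate, 3))
print(
    "range of estimates:",
    round(min(results.values()), 3),
    "to",
    round(max(results.values()), 3),
)
```

Reporting the spread of estimates across all nine specifications, rather than one chosen estimate, is one possible way of keeping several forking paths visible at once.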

wide data analysis

[I]nteracting

[I]nteracting […] embraces the potential of human decision-making in every step of the data analysis workflow. We broaden its scope […] to include all points where a human decision shapes and responds to the analytic process. This might be in the parameterisation of a model, the choice of data source, the synthesis of results – a touchpoint in the workflow that ties analysis to its underlying goals.

wide data analysis

[D]etailing

[D]etailing […] providing explicit accounts of the data analysis process in order to support scrutiny and interpretation and epistemological reflection. […] [D]etail may include rich descriptive documentation of process and context, justifications of choices made, interpretations of results and reflections on the workflow.
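
As a rough illustration of what an "explicit account" of analysis decisions could look like in practice, the sketch below keeps a small structured log of choices, alternatives and justifications alongside the analysis. It is a hypothetical example and not a tool proposed in the concept paper: the `Decision` class, the `record` helper and the example entries are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class Decision:
    """One analyst decision, recorded at the point it is made."""

    step: str            # where in the workflow the decision sits
    choice: str          # what was actually done
    alternatives: list   # options that were considered but not taken
    justification: str   # why this choice was made
    recorded: str = field(default_factory=lambda: date.today().isoformat())


log = []


def record(step, choice, alternatives, justification):
    """Append a decision to the running log and return it."""
    decision = Decision(step, choice, alternatives, justification)
    log.append(decision)
    return decision


# Hypothetical entries for illustration only.
record(
    step="data source",
    choice="survey wave 3",
    alternatives=["survey wave 2", "administrative records"],
    justification="Most complete coverage of the study area.",
)
record(
    step="model parameterisation",
    choice="trim |x| > 2.0",
    alternatives=["no trimming", "trim |x| > 1.5"],
    justification="Extreme values traced to known recording errors.",
)

# Serialise the log so it can be shared alongside results.
print(json.dumps([asdict(d) for d in log], indent=2))
```

Keeping the alternatives in the record, and not only the choice taken, is what lets a reader scrutinise the paths that were not followed.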

wide data analysis

[E]nabling

[E]nabling involves making the adoption of widening, interacting and detailing strategies as easy as possible. […] wide data analysis may not be implemented in practice if the cost of doing so is perceived as too high. In this paper we propose […] mechanisms by which we might enable more rigorous data analysis and provide a call to action to develop new enabling tools and practices.

wide foundations

Ways of thinking wide

wide foundations

Ways of doing wide

wide prompting questions











Wood et al. (2019) Design Exposition with Literate Visualization,
IEEE Transactions on Visualization and Computer Graphics

wide case study

wide call-to-action


> Case studies
> Methods
> Tools

wide call-to-action


EoI for papers
– Spring 2025
