Publications

The repliCATS project team currently have a number of publications in progress and under review. As the pre-prints or final published versions of these papers become available, we will add them to the list below.

List of papers & pre-prints:

Systematizing Confidence in Open Research and Evidence (SCORE)

Pre-print: https://osf.io/preprints/socarxiv/46mnb

Authors (alphabetic)

Nazanin Alipourfard, Beatrix Arendt, Daniel M. Benjamin, Noam Benkler, Michael Bishop, Mark Burstein, Martin Bush, James Caverlee, Yiling Chen, Chae Clark, Anna Dreber Almenberg, Tim Errington, Fiona Fidler, Nicholas Fox, Aaron Frank, Hannah Fraser, Scott Friedman, Ben Gelman, James Gentile, C. Lee Giles, Michael B. Gordon, Reed Gordon-Sarney, Christopher Griffin, Timothy Gulden, Krystal Hahn, Robert Hartman, Felix Holzmeister, Xia Ben Hu, Magnus Johannesson, Lee Kezar, Melissa Kline Struhl, Ugur Kuter, Anthony M. Kwasnica, Dong-Ho Lee, Kristina Lerman, Yang Liu, Zachary Loomas, Bri Luis, Ian Magnusson, Olivia Miske, Fallon Mody, Fred Morstatter, Brian A. Nosek, Elan Simon Parsons, David Pennock, Thomas Pfeiffer, Jay Pujara, Sarah Rajtmajer, Xiang Ren, Abel Salinas, Ravi Kiran Selvam, Frank Shipman, Priya Silverstein, Amber Sprenger, Anna Squicciarini, Steve Stratman, Kexuan Sun, Saatvik Tikoo, Charles R. Twardy, Andrew Tyner, Domenico Viganola, Juntao Wang, David Peter Wilkinson, Bonnie Wintle, Jian Wu

Abstract

Assessing the credibility of research claims is a central, continuous, and laborious part of the scientific process. Credibility assessment strategies range from expert judgment to aggregating existing evidence to systematic replication efforts. Such assessments can require substantial time and effort. Research progress could be accelerated if there were rapid, scalable, accurate credibility indicators to guide attention and resource allocation for further assessment. The SCORE program is creating and validating algorithms to provide confidence scores for research claims at scale. To investigate the viability of scalable tools, teams are creating: a database of claims from papers in the social and behavioral sciences; expert- and machine-generated estimates of credibility; and evidence of reproducibility, robustness, and replicability to validate the estimates. Beyond the primary research objective, the data and artifacts generated from this program will be openly shared and provide an unprecedented opportunity to examine research credibility and evidence.

Predicting reliability through structured expert elicitation with repliCATS (Collaborative Assessments for Trustworthy Science)

Pre-print: https://osf.io/preprints/metaarxiv/2pczv/

Authors

Hannah Fraser, Martin Bush, Bonnie Wintle, Fallon Mody, Eden Smith, Anca Hanea, Elliot Gould, Victoria Hemming, Daniel Hamilton, Libby Rumpff, David Wilkinson, Ross Pearson, Felix Singleton Thorn, Raquel Ashton, Aaron Willcox, Charles Gray, Andrew Head, Melissa Ross, Rebecca Groenewegen, Alexandru Marcoci, Ans Vercammen, Timothy Parker, Rink Hoekstra, Shinichi Nakagawa, David Mandel, Don van Ravenzwaaij, Marissa McBride, Richard O. Sinnott, Peter Vesk, Mark Burgman, Fiona Fidler

Abstract

Replication is a hallmark of scientific research. As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce a new technique for evaluating replicability, the repliCATS (Collaborative Assessments for Trustworthy Science) process, a structured expert elicitation approach based on the IDEA protocol. The repliCATS process is delivered through an underpinning online platform and applied to the evaluation of research claims in the social and behavioural sciences. This process can be deployed both for rapid assessment of small numbers of claims and for assessment of high volumes of claims over an extended period. Pilot data suggest that the accuracy of the repliCATS process meets or exceeds that of other techniques used to predict replicability. An important advantage of the repliCATS process is that it collects qualitative data that has the potential to assist with problems like understanding the limits of generalizability of scientific claims. The repliCATS process has potential applications in alternative peer review and in the allocation of effort for replication studies.

aggreCAT: An R Package for Mathematically Aggregating Expert Judgments

Pre-print: https://osf.io/preprints/metaarxiv/74tfv/

Authors

Elliot Gould, Charles T. Gray, Rebecca Groenewegen, Aaron Willcox, David Peter Wilkinson, Hannah Fraser, Rose E. O’Dea

Abstract

Structured protocols, such as the IDEA protocol, may be used to elicit judgments in the form of subjective probabilities from multiple experts. Judgments from individual experts about a particular phenomenon must therefore be mathematically aggregated into a single prediction. The process of aggregation may be complicated when uncertainty bounds are elicited with a judgment, and also when there are several rounds of elicitation. This paper presents the new R package aggreCAT, which provides 22 unique aggregation methods for combining individual judgments into a single, probabilistic measure. The aggregation methods were developed as part of the Defense Advanced Research Projects Agency (DARPA) ‘Systematizing Confidence in Open Research and Evidence’ (SCORE) programme, which aims to generate confidence scores or estimates of ‘claim credibility’ for 3000 research claims from the social and behavioural sciences. We provide several worked examples illustrating the underlying mechanics of the aggregation methods. We also describe a general workflow for using the software in practice, to facilitate its uptake for appropriate use-cases.
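
For intuition, the simplest aggregation of this kind is an unweighted linear pool (arithmetic mean) of the experts’ probability judgments; the notation below is ours, for illustration only, and the pre-print documents the full set of 22 methods:

    \hat{p}_c = \frac{1}{N} \sum_{i=1}^{N} p_{i,c}

where p_{i,c} is expert i’s judged probability that claim c will replicate and N is the number of experts. Weighted variants replace 1/N with weights w_i that sum to one.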

Mathematically aggregating experts’ predictions of possible futures

Pre-print: https://osf.io/preprints/metaarxiv/rxmh7/ 

Authors

Anca Hanea, David Wilkinson, Marissa McBride, Aidan Lyon, Don van Ravenzwaaij, Felix Singleton Thorn, Charles Gray, David Mandel, Aaron Willcox, Elliot Gould, Eden Smith, Fallon Mody, Martin Bush, Fiona Fidler, Hannah Fraser, Bonnie Wintle

Abstract

Experts are often asked to represent their uncertainty as a subjective probability. Structured protocols offer a transparent and systematic way to elicit and combine probability judgements from multiple experts. As part of this process, experts are asked to individually estimate a probability (e.g., of a future event), and these estimates then need to be combined or aggregated into a final group prediction. The experts’ judgements can be aggregated behaviourally (by striving for consensus) or mathematically (by using a mathematical rule to combine individual estimates). Mathematical rules (e.g., weighted linear combinations of judgements) provide an objective approach to aggregation. However, the choice of a rule is not straightforward, and the quality of the aggregated group probability judgement depends on it. The quality of an aggregation can be defined in terms of accuracy, calibration and informativeness. These measures can be used to compare different aggregation approaches and help decide which aggregation produces the “best” final prediction. In the ideal case, individual experts’ performance (as probability assessors) is scored, these scores are translated into performance-based weights, and a performance-based weighted aggregation is used. When this is not possible, several other aggregation methods, informed by measurable proxies for good performance, can be formulated and compared. We use several data sets to investigate the relative performance of multiple aggregation methods informed by previous experience and the available literature. Even though the accuracy, calibration, and informativeness of the majority of methods are very similar, two of the aggregation methods distinguish themselves as the best and worst.
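
For concreteness, and in standard notation rather than the paper’s own (the measures reported there may differ in detail), a weighted linear combination forms the group probability for claim c from the N individual judgements p_{i,c} as

    \hat{p}_c = \sum_{i=1}^{N} w_i \, p_{i,c}, \qquad \sum_{i=1}^{N} w_i = 1

and the accuracy of the aggregated probabilities is commonly scored against the M resolved outcomes o_c \in \{0, 1\} with a Brier-type score,

    \mathrm{BS} = \frac{1}{M} \sum_{c=1}^{M} (\hat{p}_c - o_c)^2

where lower scores indicate more accurate predictions. Performance-based weighting sets w_i according to how well expert i has scored on previously resolved questions; equal weights w_i = 1/N recover the unweighted pool.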

Eliciting group judgements about replicability: a technical implementation of the IDEA Protocol

Link to PDF: http://hdl.handle.net/10125/70666 or https://scholarspace.manoa.hawaii.edu/handle/10125/70666

Citation: E. R. Pearson et al. “Eliciting group judgements about replicability: a technical implementation of the IDEA Protocol.” In Proceedings of the 54th Hawaii International Conference on System Sciences (2021): 461–470.

Authors

E. Ross Pearson, Hannah Fraser, Martin Bush, Fallon Mody, Ivo Widjaja, Andy Head, David P. Wilkinson, Bonnie Wintle, Richard Sinnott, Peter Vesk, Mark Burgman, Fiona Fidler

Abstract

In recent years there has been increased interest in replicating prior research. One of the biggest challenges to assessing replicability is the cost in resources and time that it takes to repeat studies. Thus there is an impetus to develop rapid elicitation protocols that can, in a practical manner, estimate the likelihood that research findings will successfully replicate. We employ a novel implementation of the IDEA (‘Investigate’, ‘Discuss’, ‘Estimate’ and ‘Aggregate’) protocol, realised through the repliCATS platform. The repliCATS platform is designed to scalably elicit expert opinion about the replicability of social and behavioural science research. The IDEA protocol provides a structured methodology for eliciting judgements and reasoning from groups. This paper describes the repliCATS platform as a multi-user cloud-based software platform featuring (1) a technical implementation of the IDEA protocol for eliciting expert opinion on research replicability, (2) capture of consent and demographic data, (3) online training on replication concepts, and (4) exporting of completed judgements. The platform has, to date, been used to evaluate 3432 social and behavioural science research claims, drawing on judgements from 637 participants.
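
As a rough sketch of the elicitation flow only (the names, data structures and simple-mean aggregation below are illustrative assumptions in Python, not code from the repliCATS platform), the four IDEA phases can be pictured as follows:

    # Hypothetical sketch of an IDEA-style elicitation round; not repliCATS platform code.
    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Estimate:
        lower: float  # plausible lower bound on P(claim replicates)
        best: float   # best estimate of P(claim replicates)
        upper: float  # plausible upper bound on P(claim replicates)

    def aggregate(revised: list[Estimate]) -> float:
        """'Aggregate' phase: combine revised best estimates into one group
        probability (an unweighted mean, for illustration only)."""
        return mean(e.best for e in revised)

    # 'Investigate' + first 'Estimate': experts privately assess the claim and
    # submit three-point estimates of the probability that it would replicate.
    round_1 = [Estimate(0.2, 0.40, 0.7), Estimate(0.3, 0.55, 0.8), Estimate(0.1, 0.30, 0.6)]

    # 'Discuss': the group sees each other's round-1 estimates and reasoning, then
    # privately revises its judgements in a second 'Estimate' round.
    round_2 = [Estimate(0.3, 0.45, 0.7), Estimate(0.3, 0.50, 0.75), Estimate(0.2, 0.40, 0.6)]

    print(f"Pre-discussion mean: {aggregate(round_1):.2f}")             # 0.42
    print(f"Post-discussion group probability: {aggregate(round_2):.2f}")  # 0.45

On the platform itself these steps are delivered as a multi-user web application rather than a script, and many alternative aggregation rules are available (see the aggreCAT entry above).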

Predicting and reasoning about replicability using structured groups

Pre-print: https://osf.io/preprints/metaarxiv/vtpmb/

Authors

Bonnie Wintle, Fallon Mody, Eden Smith, Anca Hanea, David Peter Wilkinson, Victoria Hemming, Martin Bush, Hannah Fraser, Felix Singleton Thorn, Marissa McBride, Elliot Gould, Andrew Head, Dan Hamilton, Libby Rumpff, Rink Hoekstra, Fiona Fidler

Abstract

This paper explores judgements about the replicability of social and behavioural sciences research, and what drives those judgements. Using a mixed-methods approach, it draws on qualitative and quantitative data elicited using a structured iterative approach for eliciting judgements from groups, called the IDEA protocol (‘Investigate’, ‘Discuss’, ‘Estimate’ and ‘Aggregate’). Five groups of five people separately assessed the replicability of 25 ‘known-outcome’ claims, that is, social and behavioural science claims that have already been subject to at least one replication study. Specifically, participants assessed the probability that each of the 25 research claims would replicate (i.e. that a replication study would find a statistically significant result in the same direction as the original study). In addition to their quantitative judgements, participants also outlined the reasoning behind their judgements. To start, we quantitatively analysed some possible correlates of predictive accuracy, such as self-rated understanding and expertise in assessing each claim, and updating of judgements after feedback and discussion. Then we qualitatively analysed the reasoning data (i.e. the comments and justifications people provided for their judgements) to explore the cues and heuristics used, and the features of group discussion that accompanied more and less accurate judgements.