The repliCATS project

Collaborative Assessment for Trustworthy Science

We are an interdisciplinary group of researchers interested in how structured group deliberation can improve scientific processes and practices, from post-publication peer review to improving the generalizability of research to Global South contexts.

The repliCATS project is part of the MetaMelb research group at the University of Melbourne, co-led by Professor Fiona Fidler & Professor Simine Vazire.

SCORE program (2019-2022)

For the DARPA SCORE program, we crowdsourced evaluations of the credibility of 4000 published research articles in eight social science fields: business research, criminology, economics, education, political science, psychology, public administration, and sociology.

Through that process, we explored whether peer review could work as a structured deliberation process.

In 2020, we completed phase 1: assessing the replicability of 3000 published claims from eight social and behavioural science fields. Our participant groups achieved 73% classification accuracy for replicated claims (or an AUC > 0.7).

In 2021, we began phase 2 of the SCORE program. In this phase, our focus is on evaluating a broader set of “credibility signals”, from transparency and replicability to robustness and generalisability. Data collection for this phase is now complete. We hope to share our results with you in 2022.

Learn more about our project and find out what’s next:

Why is it important to gather predictions about the credibility of published research? 

Over the last decade several replication projects in the social and behavioural sciences have raised concerns over the reliability of the published scientific evidence base in those fields. Those replication efforts, which can include hundreds of researchers re-running entire experiments, are illuminating but highly resource intensive and difficult to scale.

Elicitation methods that result in accurate evaluations of the replicability—or more generally, credibility—of research can alleviate some of this burden, and help evaluate a larger proportion of the published evidence base. Once tested and calibrated, these elicitation methods could themselves be incorporated into peer review systems to improve evaluation before publication.

“If we can accurately predict credible research, our project could transform how end-users – from academics to policy makers – can assess the reliability of social scientific research.”

–– Prof Fiona Fidler, chief investigator of repliCATS project

The repliCATS project is part of a research program called SCORE, funded by DARPA, that eventually aims to build automated tools that can rapidly and reliably assign confidence scores to social science research claims.

Our approach

Image: Participants at the AIMOS2019 workshop in Melbourne.

The “CATS” in repliCATS stands for Collaborative Assessment for Trustworthy Science.

The repliCATS project uses a structured iterative approach for gathering evaluations of the credibility of research claims. The method we use is called the IDEA protocol, and we have a custom-built cloud-based platform we use to gather data.

For each claim being evaluated, four or more participants in a group first Investigate the claim and provide an initial set of private judgements, together with the qualitative reasons behind those judgements. Group members then Discuss, provide a second, private Estimate in light of the discussion, and the repliCATS team Aggregates the individual judgements using a diverse portfolio of mathematical methods, some incorporating characteristics of reasoning, engagement and uncertainty.
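To make the four IDEA steps concrete, here is a minimal sketch in Python of how one group's judgements on a single claim might be recorded and combined. It is an illustration only: the `Judgement` record, the participant IDs, the example numbers and the unweighted mean are all assumptions made for this sketch. They do not reproduce the repliCATS platform or its actual portfolio of aggregation methods.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgement:
    """One participant's estimates for a single claim (hypothetical record)."""
    participant: str  # made-up participant ID
    initial: float    # Investigate: private probability that the claim replicates
    revised: float    # Estimate: second private probability, given after Discuss


def aggregate_mean(judgements: list[Judgement]) -> float:
    """Aggregate: combine the post-discussion estimates with an unweighted mean.

    The mean is a stand-in chosen for simplicity, not one of the project's
    actual aggregation methods.
    """
    if len(judgements) < 4:
        raise ValueError("the protocol described above uses four or more participants")
    return mean(j.revised for j in judgements)


# Made-up example data for one claim.
group = [
    Judgement("p1", initial=0.80, revised=0.70),
    Judgement("p2", initial=0.40, revised=0.55),
    Judgement("p3", initial=0.65, revised=0.60),
    Judgement("p4", initial=0.30, revised=0.45),
]

group_estimate = aggregate_mean(group)
print(f"Group estimate that the claim replicates: {group_estimate:.3f}")
```

Only the second-round estimates feed the aggregate here. Keeping the first-round estimates alongside them would make it possible to see how much each participant moved after discussion.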

Phase 1 results

In Phase 1, which ran from February 2019 to November 2020, over 550 participants evaluated 3000 claims using our repliCATS platform, in a series of workshops and monthly remote assessment rounds. Another SCORE team, the Center for Open Science, independently coordinated direct replications and data analytic reproductions for a subset of the 3000 claims.

As of February 2021, results for 60 replications and reproductions have been reported by the Center for Open Science. Our top two performing aggregation methods achieved an AUC > 0.75, or a classification accuracy of 73.77%. As results come through, we will continue to update this figure.
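For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them in Python from a list of aggregated predictions and the matching replication outcomes. The data is made up, and the 0.5 decision threshold is an assumption for illustration; the threshold behind the figures reported above is not stated here.

```python
def auc(scores: list[float], outcomes: list[int]) -> float:
    """Area under the ROC curve, computed as the share of (replicated,
    not replicated) pairs in which the replicated claim got the higher score.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one replicated and one non-replicated claim")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_accuracy(
    scores: list[float], outcomes: list[int], threshold: float = 0.5
) -> float:
    """Share of claims whose predicted label matches the outcome.
    A score above `threshold` (assumed 0.5) is read as 'will replicate'."""
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, outcomes))
    return correct / len(outcomes)


# Made-up predictions and outcomes (1 = replicated, 0 = did not replicate).
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0]

example_auc = auc(scores, outcomes)
example_accuracy = classification_accuracy(scores, outcomes)
print(f"AUC: {example_auc:.3f}, accuracy: {example_accuracy:.3f}")
```

AUC does not depend on a threshold, which is why it is often reported alongside accuracy: it measures how well the predictions rank replicated claims above non-replicated ones.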

In September 2020, we also ran two week-long workshops assessing 100 COVID-19 pre-prints. Each pre-print was independently assessed by three groups of varying experience. Once replication outcomes for these claims are available, we can conduct informative cross-group comparisons and explore differences in the accuracy of elicited predictions. We will share these results when we can.

Phase 2 – expanding our focus to consider a suite of “credibility signals”

In Phase 1 our focus was on gathering judgements of replicability for a single published claim in a paper.

In Phase 2, we crowdsourced holistic evaluations for 200 papers. When evaluating each paper, our IDEA groups assessed a set of seven “credibility signals” (comprehensibility, transparency, plausibility, validity, robustness, replicability and generalisability) before making an overall credibility judgement.

From June to November 2021, we ran a series of workshops to evaluate these 200 papers. We hope to have further results to share in 2022.

About us

The repliCATS project is led by Prof Fiona Fidler. We are a group of interdisciplinary researchers from the School of BioSciences, School of Historical and Philosophical Studies, and the Melbourne School of Engineering at the University of Melbourne, with collaboration from the Centre for Environmental Policy at Imperial College London.

To meet the team, check out the “our team” page.
