repliCATS glossary

Glossary for Participants

Updated: 3 June 2021.

Table of contents

Terms relating to the research project

Terms used on the repliCATS platform – phase 1 and phase 2 (single trace & bushel)

Terms relating to the reliability of research claims in the social and behavioural sciences

Terms specifically related to statistical concepts

  • p values and statistical significance
  • Type 1 and Type 2 errors
  • Effect sizes
  • Cohen’s d
  • Correlation coefficients (and related measures)
  • Partial Eta squared
  • Confidence Intervals

Terms relating to the research project

SCORE

The overarching research project funded by the US agency DARPA (Defense Advanced Research Projects Agency), which aims to develop automated ways of assessing confidence in research claims within the Social and Behavioural Sciences. The University of Melbourne repliCATS team is undertaking one component of the SCORE research program. Other teams from around the world are undertaking other components of the project.

Single trace

A single research claim extracted from a paper. In phase 1, repliCATS evaluated 3000 single trace claims. In phase 2, we will evaluate 900 single trace claims.

Bushel

A selection of 200 papers that will be evaluated more holistically. In phase 2 of SCORE, we will evaluate multiple dimensions of the credibility of each of these pieces of research.

repliCATS

The team at the University of Melbourne that is participating in the SCORE research program. The repliCATS project is designed to elicit expert judgements on the reliability of research claims within the Social and Behavioural Sciences, to aggregate them into useful measures of reliability, and to understand the reasoning behind these judgements.

IDEA protocol

The process used by the repliCATS team to elicit judgements on the reliability of research claims within the Social and Behavioural Sciences from a diverse group of knowledgeable individuals. IDEA stands for “Investigate”, “Discuss”, “Estimate” and “Aggregate”. In practice, the IDEA protocol involves three stages for participants: considering a claim and making a first-round judgement; discussing these judgements within a small team; and finally making a second-round judgement. The IDEA protocol has been found to improve judgements under uncertainty.
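
As a rough illustration of the “Aggregate” step, the sketch below combines participants' second-round estimates into a single group estimate. The unweighted mean, the function name and all numbers are assumptions made for illustration only; this glossary does not specify how repliCATS aggregates judgements, and the actual method may differ.

```python
from statistics import mean


def aggregate_judgements(round_two_estimates):
    """Combine second-round estimates (0-100) into one group estimate.

    Assumption: a simple unweighted mean, chosen only for illustration.
    """
    if not round_two_estimates:
        raise ValueError("need at least one estimate")
    return mean(round_two_estimates)


# Hypothetical estimates of the chance (in percent) that a claim replicates.
round_one = [30, 80, 55, 60]  # first-round judgements, made individually
round_two = [45, 70, 55, 60]  # second-round judgements, revised after discussion

group_estimate = aggregate_judgements(round_two)
print(group_estimate)
```

Only the second-round judgements are aggregated in this sketch, because they are made after the discussion stage.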


Terms used in the online platform

Research Claim (single trace) or Result (bushel)

A single major finding from a published study (for example, a journal article), as well as details of the methods and results that support this finding. A Research Claim is not equivalent to an entire article. Sometimes the claim as described in the abstract does not exactly match the claim that is tested. In this case, you should consider the Research Claim to be that which is described in the inferential test. In phase 1, and for single trace claims in phase 2, SCORE will focus on testing the replicability of the test results only.

Central Claim (bushel)

This is the main research finding, usually described in the abstract and highlighted in the paper’s conclusion. It may not be associated with a unique finding or single statistical test.

Credibility

When referring to the paper as a whole, Credibility refers to the trustworthiness of a piece of research. This generic description accounts for the fact that everyone is likely to have a slightly different understanding of what Credibility means. That is okay – there is no one right way of thinking about this. You can think of it in this way: how likely would you be to use this paper if you were doing research in the same field, or how likely would you be to apply it to decisions if you were basing policy on it? However, you might have other ways of thinking about it, and multiple factors may feed into your personal understanding of Credibility. We explicitly accept this, and the platform is set up for you to express your individual understanding of what Credibility means to you in this context.

Generalizability

Closely related to Conceptual Replications, in which researchers deliberately alter important features of a study to generalize findings or to test the underlying hypothesis in a new way. However, it is important to note that, on a common view, the generalizability of a research claim can only be established if the relevant study is repeated multiple times, each time with different variations in experimental procedures or theoretical assumptions (e.g., changing the population studied, the type of measurement or technique used and so on). In that sense, a claim is generalizable if operationalizing the experiment and analysis differently, with new data, gives largely the same result (e.g., using a different genetics, proteomics, or imaging platform, or translating a questionnaire into a different language and running the survey in a different country).

Plausibility

The likelihood that you would assign to the claim based on your background knowledge and experience, before considering the details of the experiment, i.e. its prior plausibility. We know the word ‘plausible’ will mean different things to different people. For some people, almost everything is ‘plausible’, while other people have a stricter interpretation. You could also consider words like ‘possible’ or ‘realistic’ here. We ask you to maintain a consistent standard between different claims and to let us know if some claims have a higher prior plausibility for you than others. See also the entry in the Training document.

Replication

An independent repeat of an experiment with a specified degree of similarity to the methodological and/or analytic procedures documented in an original study. Replications are typically divided into “direct” (with higher degrees of similarity) and “conceptual” (with lower degrees of similarity). This is not a sharp division. Whether a replication is considered direct or conceptual depends on a range of things including the extent to which the theoretical context of the claim is understood and any use-context for the Research Claim.

Close Replication

A Replication that follows the methods of the original study with a high degree of similarity, varying aspects only where there is a high degree of confidence that they are not relevant to the research claim. The aim of a close (direct) replication is to improve confidence in the reliability and validity of an experimental finding by starting to account for things such as sampling error, measurement artefacts, and questionable research practices.

Conceptual Replication

A Replication that involves independently repeating an original experiment while purposefully altering specific aspects of the original methods (i.e. a research group is able to investigate the claim using methods of their choice). The aim of conceptual replications is to test whether the claim is supported when using different methods, and the extent to which a research claim can generalize to new circumstances. Within the context of SCORE, one type of conceptual replication is of specific interest: data-analytic replication.

Data-analytic replication

A Replication that involves analysis of an original claim using a different, pre-existing dataset from a similar time period with similar measures.

Robustness

Refers to the stability and reliability of a Research Claim. A result is robust if a different analysis of the original dataset yields largely the same result.

Validity

The extent to which the conclusions of a study are correctly inferred from the evidence presented. In repliCATS Bushel claims, we are concerned with three different sub-domains of validity, described below.

Design validity

The extent to which the conclusions of a study can be inferred from the reported evidence, given the study design. Design validity is achieved if the methods and procedures are suitable for addressing the research question. For instance, it would be inappropriate to draw a causal inference about the relationship between two variables from a correlational (rather than experimental) design.

Analytic validity

The extent to which a set of statistical inferences, and their underlying assumptions, are appropriate and justified, given the research hypotheses and the (type of) data.

Conclusion validity

The extent to which the paper’s conclusions are warranted given the evidence presented. Conclusion validity is achieved when the conclusion is well calibrated, meaning that the stated interpretation stays within the bounds of what is indicated by the reported results and does not make any unwarranted claims.

Transparency

The clarity and comprehensiveness of the description of the methods, materials, procedures and analyses of a study. Full transparency means the paper provides all relevant information that would enable another researcher to conduct a close replication of the original study – using exactly the same techniques and analyses.


Terms relating to the reliability of research claims in the social and behavioural sciences

Social and Behavioural Sciences

Defined within the SCORE research project as research that appears in a specific list of journals. The disciplines that these journals cover include psychology, economics, political science, sociology, law, education, business and marketing.

Conceptual Replication

A Replication that involves independently repeating an original experiment while purposefully altering specific aspects of the original methods (i.e. a research group is able to investigate the claim using methods of their choice). The aim of conceptual replications is to test whether the claim is supported when using different methods, and the extent to which a research claim can generalize to new circumstances. The SCORE project is interested in whether claims are conceptually replicable, but the main task for the repliCATS team is to determine whether claims are directly replicable.

Publication bias

Refers to the bias against publishing statistically non-significant, or negative, results. This bias comes both from editors and reviewers and from authors, who self-select out of publishing non-significant results because they anticipate rejection. The effect of publication bias is to inflate the proportion of statistically significant results in the published literature compared with the proportion in the studies actually performed. In heavily biased literatures we expect a higher rate of false positives than the baseline. See also the entry in the Training document.
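
The inflation described above can be made concrete with some simple arithmetic. The sketch below is an illustration only: the function name and every input value (number of studies, share of true effects, statistical power, significance threshold, and the chance that a non-significant result is published) are hypothetical assumptions, not figures from SCORE or repliCATS. It also assumes that every significant result is published.

```python
def significant_share(n_studies, share_true, power, alpha, publish_nonsig):
    """Return the share of significant results among performed and published studies.

    Assumptions (illustrative only): every significant result is published,
    and each non-significant result is published with probability publish_nonsig.
    """
    true_effects = n_studies * share_true        # studies testing a real effect
    null_effects = n_studies - true_effects      # studies testing no real effect
    # Expected significant results: true positives plus false positives.
    significant = true_effects * power + null_effects * alpha
    nonsignificant = n_studies - significant
    published = significant + nonsignificant * publish_nonsig
    return significant / n_studies, significant / published


# Hypothetical inputs chosen only to show the direction of the effect.
performed, published = significant_share(
    n_studies=1000, share_true=0.10, power=0.50, alpha=0.05, publish_nonsig=0.10
)
print(f"Significant among performed studies: {performed:.1%}")
print(f"Significant among published studies: {published:.1%}")
```

Under these assumed inputs, significant results make up a much larger share of the published studies than of the studies performed, which is the pattern the definition describes.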

Questionable Research Practices (QRPs)

A range of practices that are relatively common in experimental research, but which affect the interpretation of statistical results, typically in such a way as to overstate the reported effect. Examples of QRPs include p-hacking (making decisions about data collection and analysis after checking for statistical significance), cherry picking (failing to report non-statistically significant relationships that were tested), and Hypothesizing After Results are Known (HARKing, presenting ad hoc findings as though they had been predicted all along).

Private or Personal Knowledge

Knowledge about a research claim that is not contained within the public literature, such as one’s own experience with undertaking similar research, or one’s prior assessments of the quality of work from a particular source.


Terms specifically related to statistical concepts

Statistical concepts are particularly relevant to the question of whether a claim will replicate. Moreover, there are many misconceptions about the meanings of such concepts, even among practising researchers, and many misapplications of them within the literature. We will not attempt to provide concise descriptions of these terms here. Instead, we have provided a separate training document that details more precise meanings and proper uses of important terms, with references to the literature. The training document also contains some background information on questionable research practices and replication rates in previous studies. This training document can be downloaded here.

The training document covers the following terms, amongst others:

  • p values and statistical significance
  • Type 1 and Type 2 errors
  • Effect sizes
  • Cohen’s d
  • Correlation coefficients (and related measures)
  • Partial Eta squared
  • Confidence Intervals