Resources for repliCATS participants

Version: last updated on 20 July 2021

On this page you can find information about:

  • logging on to the platform in phase 2
  • assessing bushel papers in phase 2 (workshops)
  • answering the questions
  • the three-step response format
  • the process (IDEA protocol, round 1 and round 2)
  • getting the most out of group discussion
  • assessing single trace claims in phase 2
  • training materials
  • additional videos & resources

logging on to the platform in phase 2

Click here to access the repliCATS bushel platform. If you are a new user, you will have received instructions to create an account. If you are having trouble with the platform, contact us via: repliCATS-project@unimelb.edu.au

Using the repliCATS bushel platform

 

assessing bushel papers in phase 2 (workshops)

Assessing a bushel paper using the repliCATS bushel platform involves considering both features of the overall paper and some specific evidence claims within it.

We use a structured elicitation process called the IDEA protocol, so you will have an opportunity to read the paper and make an initial private assessment for each question (we call this round 1). Your group will then have an opportunity to review and discuss your round 1 judgements. You can then revise or update your judgements (we call this round 2).

In this part of the project, we are trying to assess the credibility of scientific papers as a whole. There are many dimensions to credibility, so we ask you questions about different aspects of the paper. There are nine questions in total: Questions 1-8 refer to paper-level assessments. Following that, you will be asked to consider a number of specific results presented in each paper.

Use the section ‘answering the questions’ below to browse and search the bushel guide in an interactive format. You can also download the full guide to assessing bushel papers as well as the one-page participant cheat sheet below.

Missed a workshop intro? Here’s our playlist of videos for you to catch up.

answering the questions

Paper-level questions

When you answer the paper-level questions, we want you to think about what you understand the main claim of the paper to be.

By main claim of the paper we mean the key finding, which would typically be described in the paper’s abstract and highlighted in the paper’s conclusion.

However, if you just want to think about your overall impression of the paper, that is fine. We are aware that different people might form different opinions about what the central claim of a given paper is. That’s okay. You can discuss this with your group after you have completed Round 1.

Guidance for each paper-level question

Click on a question number to expand guidance for answering each question, or you can watch the short video below.

Question 1: Comprehensibility

How well do you understand the paper overall?

Purpose: To understand if anything affects your ability to interpret the paper and identify its central claim. 

Clarification: We know that scientific papers vary in clarity and comprehensibility. It’s possible that a paper:

  • is vague;
  • is poorly written;
  • relies on an unfamiliar procedure;
  • contains too much jargon;
  • is unclear about exactly what the central claim is;
  • is about a concept that you are not familiar with and/or have difficulty conceptualising.

These factors can all affect your ability to interpret the paper and its central claim, and may in turn lead to different interpretations within the group. There is a comments box below this question, where you can provide a summary of the paper’s central claim, as you see it. We would like you to focus on the higher-level finding (the ‘take-home message’, if you will), not detailed results. There will be space to consider individual results later. Sharing your interpretation will help highlight whether there are different opinions about the central claim of the paper, and this information may be useful during the Discussion.

Answering the comprehensibility question:

We’re asking for this on a scale of 0 to 100. 0 means that you have no idea what the paper means, 100 means it’s perfectly clear to you. This is not an objective measurement – try your best to estimate a number for comprehensibility, and try to be consistent between papers. 

Some papers may be outside of your main fields or use words that you are unfamiliar with. This might cause you to immediately put ‘I have no idea what the paper means’. However, with a little bit of effort, you can usually work out what the paper is claiming. If, after reading the paper, you still cannot work it out, then you should definitely indicate this to us and consider whether that says something about the quality of the research being described.

Comments box: We have provided a comment box below the question so you can try to rephrase what you think the paper is about and what the central claim is. This will be really useful in the Discussion phase to prompt your memory about your initial interpretation of the question. Please do not place any discussion about your assessment of the paper here - reserve such considerations for the comments box for Question 8: Credibility. Note that you can navigate to this question at any stage to add comments as you go – you don’t need to wait until you start assessing Question 8.

Question 2: Plausibility

How consistent is the central claim of this paper with your existing belief?

Purpose: To capture your beliefs about whether the underlying effect or relationship corresponds to something real.

Clarification: Sometimes we hear a series of claims in a paper and we have a strong feeling that some or all of them do not seem very plausible either within the context of the experimental design, or more broadly (i.e. relating to a relationship that would generalise across contexts or experimental designs).

These prior beliefs can be useful. We’ve included this question here to allow you to state your prior belief about whether you think there is a real effect in this study, regardless of what you think about this particular experimental/study design.

Don’t spend too much time on this question. In the next question, we want you to examine the claim and the validity of your prior beliefs more critically, in terms of how they relate to direct replication.

 Answering the plausibility question:

We’re asking for this on a scale of 0 to 100. 0 means that the paper is exactly contrary to your pre-existing beliefs, 100 means it’s perfectly compatible with them. As with Question 1: Comprehensibility, try to estimate a number and try to be consistent between papers.

The word ‘plausible’ means different things to different people. For some people, almost everything is ‘plausible’, while other people have a stricter interpretation. Don’t be too focused on the precise meaning of ‘plausible’ – you could also consider words like ‘possible’ or ‘realistic’ here. We just ask you to maintain a consistent standard between different claims and try to let us know if some papers are clearly more plausible (or implausible) than others. 

If you didn’t understand the paper, it might be challenging to say whether you believe it’s plausible. Hopefully, the paper will become clearer in the Discussion phase. You will have an opportunity to comment on your reasons for this in Question 8: Credibility below.

Question 3: Transparency

Based on your quick read of the paper, how transparent is the research described here? Think about how easy or difficult it would be for someone who wanted to evaluate or replicate the research in this paper to find all the information they need about the methods, analysis and procedures.

Purpose: To gauge your assessment of the quality and clarity of reporting in the paper.

Clarification: Transparency in this context refers to a clear, unambiguous description of the methods used in the research, including experimental procedures, materials, tests and analytical techniques. Think about how easy it is to find all of the information required to perform a close replication of the study. This includes whether or not the study was pre-registered. 

Answering the transparency question:

We’re asking for this on a scale of 0 to 100. 0 means that the paper is very unclear, not at all transparent in its methods, procedures and/or analyses, 100 means it’s perfectly clear and transparent, to the point that another researcher would be able to repeat the methods, procedures and analyses without issue. Again, try to estimate a number and be consistent in the criteria you apply when assessing transparency between papers. 

We don’t expect you to thoroughly check everything – that would be time-consuming! Just make your best estimate, based on however much of the paper you have read.

Question 4: Replicability

Imagine an independent researcher runs a replication of this original study. What is the probability (0-100%) that a close replication of the central claim would find results consistent with the original paper?

Purpose: The question is asking about a close replication of the central claim of the paper. However, if it is easier for you to think about the average replicability of several different claims (because, for example, you cannot work out what you think the central claim is), that’s ok.

Clarification:

  • A close replication is a new experiment that follows the methods of the original study with a high degree of similarity, varying only aspects where there is a high degree of confidence that they are not relevant to the research claim. People often use the term direct replication – however, no replication is perfectly direct, and we cannot describe precisely how any given claim will be replicated. Decisions about how to perform replications are made by a team of researchers that is independent from the repliCATS project. Our best advice is to imagine what kinds of decisions you would face if you were asked to replicate this study, and then to consider the effects of making different choices for these decisions. 
  • A successful replication consistent with the original paper is one that finds a statistically significant effect (defined with an alpha of 0.05) in the same direction as the original study, using the same statistical technique as the original study.

Specifically, for close replications involving new data collection, we would like you to imagine 100 (hypothetical) new replications of the original study, combined to produce a single, overall replication estimate (i.e., a good-faith meta-analysis with no publication bias). Assume that all such studies have both a sample size that is at least as large as the original study and high power (90% power to detect an effect 50-75% of the original effect size with alpha=0.05, two-sided). 

Sometimes it is clear that a close replication involving new data collection is impossible, or infeasible. In these cases, you should think of data analytic replications, in which the central claim is tested against another pre-existing dataset that provides a fair test. Again, imagine 100 such datasets being analysed, with the results combined to produce a single, overall replication estimate.
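To make the power assumption above concrete, here is a minimal sketch (not part of the repliCATS materials) of the sample size a replication would need for 90% power at 50-75% of a hypothetical original effect size; the original effect (Cohen’s d = 0.5) and the two-group design are assumptions chosen purely for illustration.

    # Illustrative only: sample size per group for a replication with 90% power
    # to detect 50-75% of a hypothetical original effect (Cohen's d = 0.5),
    # at a two-sided alpha of 0.05. The effect size is an assumption, not a
    # repliCATS value.
    from statsmodels.stats.power import TTestIndPower

    original_d = 0.5  # hypothetical original effect size
    power_analysis = TTestIndPower()

    for fraction in (0.50, 0.75):
        n_per_group = power_analysis.solve_power(
            effect_size=fraction * original_d,
            alpha=0.05,
            power=0.90,
            alternative='two-sided',
        )
        print(f"To detect {fraction:.0%} of d={original_d}: "
              f"about {n_per_group:.0f} participants per group")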

Answering the replicability question:

In this question, we want you to try and think of reasons why the central claim may or may not replicate. We understand that your thoughts about the prior plausibility of this claim are likely to influence your judgement regarding this question. However, we’d like you to try and think more critically of other reasons why this particular study may (or may not) replicate. In answering this question, please only use whole integers – do not use decimal places. 

Understanding the three-step format

This question, and some of the following questions, ask you to provide three separate estimates: a lower bound, upper bound and best estimate. Here’s how we want you to think of those estimates with regard to replicability. Think about your assessments of Question 5: Robustness and Question 8: Credibility in a similar manner.

  • First, consider all the possible reasons why a claim is unlikely to successfully replicate. Use these to provide your estimate of the lowest probability of replication.
  • Second, consider the possible reasons why a claim is likely to successfully replicate. Use these to provide an estimate of the highest probability of replication.
  • Third, consider the balance of evidence. Provide your best estimate of the probability that a study will successfully replicate.

Some things to consider about the three-step format:

  • Providing a lower estimate of 0 means you believe a study would never successfully replicate, not even by chance (i.e. you are certain it will not replicate). Providing an upper estimate of 100 means you believe a study would never fail to replicate, not even by chance (i.e. you are certain it will replicate).
  • Providing an estimate of 50 means that you believe the weight of evidence is such that it is as likely as not that a study will successfully replicate. If you have low prior knowledge and/or large uncertainty, please use the width of your bounds to reflect this, and still provide your best estimate of the probability of replication.
  • Any answer above 50 indicates that you believe it’s more likely that the study would replicate than it would not replicate. Answers below 50 indicate that you believe it’s more likely that the study would not replicate than it would replicate. Intervals (the range between your lowest and highest estimate) which extend above and below 50 indicate that you believe there are reasons both for and against the study replicating.

There is evidence that asking you to consider your lower and upper bounds before making your best estimate improves the accuracy of your best estimate. The difference between your upper and lower estimates is intended to reflect your uncertainty about whether that claim’s findings would replicate. There’s no correct answer here, but we expect that your intervals for those claims you feel most uncertain about will be the widest.
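Purely as an illustration (the numbers below are invented, not taken from the platform), a single three-step judgement can be thought of as a lower bound, a best estimate and an upper bound that are ordered, with the width of the interval reflecting your uncertainty:

    # Illustrative only: one hypothetical three-step replicability judgement,
    # expressed as probabilities in percent.
    lower, best, upper = 35, 55, 80  # assumed example values

    # The lower bound should not exceed the best estimate, which should not
    # exceed the upper bound; a wider interval reflects greater uncertainty.
    assert 0 <= lower <= best <= upper <= 100

    print(f"Best estimate: {best}% "
          f"(interval {lower}-{upper}%, width {upper - lower} points)")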

Additional considerations on the question of replicability

There are many things you might consider when making your judgement. The IDEA protocol operates well when a diversity of approaches is combined. There is no single ‘correct’ checklist of things to assess. However, some things you may wish to consider include:

  • The statistical data, analyses and results reported within the paper, including sample size, effect size and p-value, if reported. These details are likely to be important for whether a claim replicates – see this document for more information.
  • The experimental design. Will it be reliable in replication? Are there any signs of Questionable Research Practices e.g. unusual designs where more straightforward tests might have been run but failed? Note that this question is interested in the replicability of the central claim even if the external validity of the design is low.
  • Your prior plausibility for the paper. Background probabilities are often a major factor. Is this area of research more or less well-understood?
  • Contextual information about the original study or publication such as where and when the paper was published, and who undertook the original study. Do you have any private or personal knowledge e.g. experience with undertaking similar research, or existing knowledge about the quality of work from a particular source?

Question 5: Robustness

Imagine an independent analyst receives the original data and devises their own means of investigating the central claim of this paper. What is the probability (0-100%) that an alternative analysis would find results consistent with the original paper?

Purpose: To capture your beliefs about the analytic robustness of the main finding.

Clarification: The term robustness is used here to represent the stability and reliability of a research finding, given the data. It might help to think about it in this way: imagine 100 analysts received the original data and devised their own means of investigating the central claim (e.g. using a different, but entirely appropriate statistical model or technique), how many would find a statistically significant effect in the same direction as the original finding?

This question uses the same three-step format as Question 4: Replicability. To assess the robustness of the central claim of the paper:

  • First, consider all the possible reasons why a claim might not be robust, i.e. why another researcher using a different analytic approach on the original data might find a result that is inconsistent with the original claim. Use these to provide your estimate of the lowest probability of a robust finding.
  • Second, consider the possible reasons why a claim would be robust, i.e. why another researcher using a different analytic approach on the original data would find a result that is entirely consistent with the original claim. Use these to provide an estimate of the highest probability of a robust finding.
  • Third, consider the balance of evidence. Provide your best estimate of the probability that a new analysis of the original data would produce a result that is consistent with the original claim.

The meaning of “consistent with the original paper” is that statistically significant findings (defined with an alpha of 0.05) in the same direction as the original paper are found using the alternative analyses.
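As a toy illustration of this criterion (with simulated data, and not the repliCATS procedure), an alternative analysis might swap a hypothetical original t-test for a non-parametric test on the same data and then check whether the result is still statistically significant in the same direction:

    # Toy illustration with simulated data (not the repliCATS procedure):
    # does an alternative analysis of the same dataset reach a conclusion
    # consistent with the original one (significant, same direction)?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    treatment = rng.normal(loc=0.4, scale=1.0, size=60)  # invented data
    control = rng.normal(loc=0.0, scale=1.0, size=60)

    # "Original" analysis: independent-samples t-test
    t_res = stats.ttest_ind(treatment, control)

    # Alternative analysis: Mann-Whitney U test on the same data
    u_res = stats.mannwhitneyu(treatment, control, alternative='two-sided')

    same_direction = treatment.mean() > control.mean()  # effect favours treatment
    consistent = (t_res.pvalue < 0.05) and (u_res.pvalue < 0.05) and same_direction
    print(f"t-test p={t_res.pvalue:.3f}, Mann-Whitney p={u_res.pvalue:.3f}, "
          f"consistent={consistent}")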

Question 6a: Generalizability (rating)

Going beyond close replications and reanalyses, how well would the main findings of this paper generalize to other ways of researching this question?

Purpose: The question is asking about whether the main claim of the paper would hold up or not under different ways of studying the question. 

Clarification: This question is asking about generalizations of the original study or ‘conceptual replications’. We want you to consider generalizations across all relevant features, such as the particular instruments or measures used, the sample or population studied, and the time and place of the study.

Answering the generalizability rating question:

This question is asked on a scale of 0 to 100, where 0 means the central claim is not at all generalizable and 100 means it is completely generalizable.

We know that there are many different features of a given study that potentially limit generalizability, and they may have different levels of concern, so it might be tricky to work out a single rating across all of them. Do your best to assess this in whatever way seems best. 

You might want to imagine 100 (hypothetical) conceptual replications of the original study, each of which varies just one specific aspect of the study design (e.g. sample, location, operationalisation of a variable of interest,…) while holding everything else constant. Across this hypothetical set of conceptual replications, all relevant aspects of the study are varied in turn. How many of these do you estimate would produce a finding that is consistent with the finding in the original study? 

Question 6b: Generalizability (features)

Please select the feature(s), if any, that you think limit the generalizability of the findings.

Purpose: The question is asking you to list the features of the study that raise substantial generalizability concerns.

Clarification: Select the features for which you definitely have generalizability concerns. Don’t select a feature if you simply think that it is possible that the study will not generalize over that feature. 

Answering the generalizability features question:

You can select more than one feature – select all features you think raise substantial generalizability concerns. 

If there is a feature that we have not listed that you think raises substantial generalizability concerns, then select ‘Other’ and briefly describe the feature in the text box. Please do not use this text box to discuss why you think there are generalizability concerns. If you have any comments or thoughts on this, please write them in the comments box for Question 8: Credibility.

Question 7a: Validity (design)

How well-designed is the research presented in this paper to address its aims?

Purpose: The question is asking you to make a judgement about the degree to which the conclusions of the study can be inferred from the reported effect, given the study design.

Clarification: This question focuses on internal validity, or the extent to which the study measures what it claims to measure and the methods are suited to address the research aim(s). In a well-designed study, systematic errors and bias can be discounted, such that the outcome can be reliably linked to the (experimental) manipulation or variable of interest. For example, claims about causal relationships among variables need to be warranted by the evidence reported.

Answering the design validity question:

We’re asking for this on a scale of 0 to 100. 0 means that the study design is not at all suited to address the research aim(s), 100 means it’s perfectly suited to address the research aim(s). Again, try to estimate a number and be consistent in the criteria you apply when assessing the validity of the design between papers.

Question 7b: Validity (analysis)

How appropriate are the statistical tests in this paper?

Purpose: The question is asking about the extent to which a set of statistical inferences, and their underlying assumptions, are appropriate and justified, given the research hypotheses and the (type of) data.

Clarification: This question focuses on a different aspect of internal validity, namely on the extent to which the statistical models/tests are appropriate for testing the research hypotheses. For instance, assumptions may have been violated that would render the chosen test(s) inappropriate, or the statistical model of choice may not be appropriate for the type of data.

Answering the analytic validity question: 

We’re asking for this on a scale of 0 to 100. 0 means that the statistical analyses are not at all appropriate to test the research hypotheses, 100 means they are entirely appropriate to test the research hypotheses. Again, try to estimate a number and be consistent in the criteria you apply when assessing statistical validity between papers.

Question 7c: Validity (conclusions)

How reasonable and well-calibrated are the conclusions drawn from the findings in this paper?

Purpose: The question is asking about the extent to which the paper’s conclusions are warranted given the findings.

Clarification: This question relates to the stated interpretation of the findings, whether the conclusions match the evidence presented, and the limitations of the study. Sometimes a paper’s conclusion(s) might extend beyond what is indicated by the results reported. 

Answering the conclusion validity question:

We’re asking for this on a scale of 0 to 100. 0 means that the conclusion is unrelated to the evidence presented, 100 means the conclusion perfectly represents the evidence presented. Again, try to estimate a number and be consistent in the criteria you apply when assessing the validity of the conclusion between papers.

Question 8: Credibility

How would you score the credibility of this paper, overall?

Purpose: The question is asking about how you would assess the overall credibility of a paper, incorporating all of the dimensions that we have asked about so far as well as any other ones you think may be relevant.

Clarification: We have intentionally not specified an exact interpretation or definition of the concept of Credibility. We want you to determine what you think is credible.

Answering the paper credibility question:

Advice about answering the three separate judgements (lower, upper, best estimate) was given in Question 4: Replicability above. Apply that advice to how you are thinking about Credibility.

Everyone is likely to have a slightly different understanding of what Credibility means. That’s ok – there is no one right way of thinking about this. You might want to think about how likely you would be to use this paper if you were in the same research field, or how likely you would be to apply the finding(s) presented in the paper when making policy decisions. However, you might have other ways of thinking about it. You may wish to just average across all of the credibility dimensions we have asked you to assess (e.g. plausibility, replicability,…). However, for any given paper, some dimensions may be more important than others and you may want to weight those dimensions more strongly. We also may have failed to ask about something that you think is important for this paper, so you might include additional or completely different factors in your judgement about Credibility for that paper.

If you are able to describe how you have thought about overall Credibility, we would love you to tell us in the comments box. However, if you cannot articulate your mental model for Credibility, that’s ok too. Just do your best to make this assessment in whatever way seems best to you.

Tooltip: When giving your lower bound, give the lowest score you would feel comfortable justifying. When giving your upper bound, give the highest score you would feel comfortable justifying. If you feel very unsure about how to score it, make your interval wider.

Comments box: We have provided a text box below the question.

Please use the text box for this Question 8: Credibility to capture all of your thinking and reasoning about the credibility assessment of this paper. 

Note that you can navigate to this text box at any stage, so you can drop comments in as you go. You don’t have to wait until you get to this question before writing your thoughts, and you can return to it while you are considering the evidence-level questions as well.

Collecting all of your reasoning in this one spot allows us to understand how you assessed the paper, which factors were important and which were less influential in determining your judgement about the credibility of this paper. This information may also prove very useful during the discussion and for others who will be able to access the (anonymous!) comments. However, there is no need to write polished prose here. As long as they can be understood, notes, partial sentences and dot points are fine. However, please do be careful to avoid ambiguities, so that your team members – and the repliCATS project – can make the best use of your comments. For example, if you mention errors or uncertainties, make it clear whether you are referring to the paper itself, the way it is presented on the platform, or how it has been discussed by your team. If you comment on specific results in the paper, make sure you describe which one.

Please do not use your own name or the name of any other participants, or any other distinguishing remark such as your affiliation or your professional position.

Judgements will eventually be made public and we must be able to keep them anonymous. If you want to refer to yourself or a team member, please only use the avatars, i.e. anonymised screen names.

Evidence-level questions

After answering the questions about the overall paper, we will ask you to assess some specific pieces of evidence within the paper. By doing this, we aim to obtain a more complete evaluation of the paper. You will be asked to consider a variable number of specific results per paper, but no more than ten for any paper. The questions you will be asked are described below. Remember to go back and use the comments box for Question 8: Credibility, if you want to comment on any of these specific pieces of evidence.

Considering the credibility of this particular result alone, it might be the same as the credibility score you gave for the overall paper or it might be different. If different, please change your rating here. We have started here with the credibility score you gave for the overall paper.

Purpose: We understand that different results reported in a given paper vary in credibility. Some results might be highly reliable, while other specific results may be less so. By asking you to separately rate a number of different pieces of evidence, we can get a sense of how much this paper varies.

Clarification: This question relates only to the specific piece of evidence that is listed, both at the top of the question pane and in the left-hand sidebar. Some of the statistical information relating to this result has been extracted for you, as well as the location of this specific result in the paper.

Answering the evidence-level credibility question:

Think about Credibility in the same way that you thought about it in Question 8: Credibility. However, here you are considering only the specific result listed, rather than the main claim of the paper or the paper overall. 

We have pre-filled this assessment with the overall credibility rating you gave for the paper. If you think that this is an ‘average’ result within the paper then it is ok to just leave the credibility rating as it is. However, if you think that this particular result is more or less credible than the paper overall then please adjust your assessment accordingly. 

Note that in Round 2 we will pre-fill your assessment with your Round 1 evidence-level credibility rating, so this is a little bit different. However, we will also remind you of your overall credibility assessment for Round 2, in case your thinking about this has changed between Rounds.

How relevant do you think this particular result is to the main conclusion of the paper?

Purpose: We understand that different results reported in a given paper also vary in how important they are. Some results might be central to a paper while others are more peripheral. To fully understand the variability within a paper we need to consider both the credibility of specific results and how important they are.

Clarification: Think about how important this particular piece of evidence is to the central claim of the paper. If this piece of evidence was missing or turned out to be unreliable, how much would it affect your confidence in the credibility of the paper overall? This is not about how the evidence is presented or how much emphasis the authors have placed on it, but about how crucial you think this piece of evidence is in terms of the paper’s central claim, as you understand it.

Answering the relevance question:

This question is asked on a scale of 0 to 100, where 0 means this particular piece of evidence is irrelevant to the central claim, and 100 means it is crucial to the central claim and your confidence in the credibility of the paper. Like other such scales, we just ask you to try to be consistent between papers in how you apply it.

Particular results presented in a paper can vary in reliability. What is the probability that close replications of this result would find a statistically significant effect in the same direction (0-100%)?

Purpose: For one of the specific results within the paper, we will ask you to assess its replicability. This will allow us to judge whether replicability also varies within papers. It may also allow us to compare different versions of the repliCATS platform.

Clarification: Think about replicability in the same way that you thought about it in Question 4: Replicability. However, here you are considering only the specific result listed, rather than the main claim of the paper or the paper overall. Those two assessments do not have to be in line with each other. It is possible to think that a specific result might replicate, but overall, the paper’s central claim would not, or vice versa. You can imagine that this type of inconsistency is more likely when a specific result is not relevant to the central claim.

Answering the evidence-level replicability question:

Like Question 4: Replicability, we ask you to provide three separate estimates: a lower bound, upper bound and best estimate. 

  • First, consider all the possible reasons why this specific result is unlikely to successfully replicate. Use these to provide your estimate of the lowest probability of replication.
  • Second, consider the possible reasons why this specific result is likely to successfully replicate. Use these to provide an estimate of the highest probability of replication.
  • Third, consider the balance of evidence. Provide your best estimate of the probability that this specific result will successfully replicate.

the three-step response format

Several questions at both the paper level and evidence level ask you to provide three separate estimates: a lower bound, upper bound and best estimate. In this example, we outline how we want you to think of those estimates with regard to replicability, using Question 4: Replicability. Think about your assessments of Question 5: Robustness and Question 8: Credibility in a similar manner.

  • First, consider all the possible reasons why a claim is unlikely to successfully replicate. Use these to provide your estimate of the lowest probability of replication.
  • Second, consider the possible reasons why a claim is likely to successfully replicate. Use these to provide an estimate of the highest probability of replication.
  • Third, consider the balance of evidence. Provide your best estimate of the probability that a study will successfully replicate.

Some things to consider about the three-step format:

  • Providing a lower estimate of 0 means you believe a study would never successfully replicate, not even by chance (i.e. you are certain it will not replicate). Providing an upper estimate of 100 means you believe a study would never fail to replicate, not even by chance (i.e. you are certain it will replicate).
  • Providing an estimate of 50 means that you believe the weight of evidence is such that it is as likely as not that a study will successfully replicate. If you have low prior knowledge and/or large uncertainty, please use the width of your bounds to reflect this, and still provide your best estimate of the probability of replication.
  • Any answer above 50 indicates that you believe it’s more likely that the study would replicate than it would not replicate. Answers below 50 indicate that you believe it’s more likely that the study would not replicate than it would replicate. Intervals (the range between your lowest and highest estimate) which extend above and below 50 indicate that you believe there are reasons both for and against the study replicating.

There is evidence that asking you to consider your lower and upper bounds before making your best estimate improves the accuracy of your best estimate. The difference between your upper and lower estimates is intended to reflect your uncertainty about whether that claim’s findings would replicate. There’s no correct answer here, but we expect that your intervals for those claims you feel most uncertain about will be the widest.

the process (IDEA protocol, round 1 and round 2)

The IDEA protocol, developed at the University of Melbourne, has been found to improve judgements under uncertainty. IDEA stands for “Investigate”, “Discuss”, “Estimate” and “Aggregate”, the four steps of the elicitation process.

As used in the repliCATS project, the IDEA protocol involves:

  • Round 1: you independently Investigate the paper and provide your personal judgements on the credibility of the paper, as well as your reasoning.
  • Discussion: you see the judgements of the rest of your group, the aggregated judgement and all of the comments that have been made, and Discuss the paper with your group. This phase can resolve uncertainties and examine evidence and thinking.
  • Round 2: you provide a revised Estimate (if you wish to) and describe how your thinking might have changed.

The repliCATS team will use an Aggregate of the group judgements as the final assessment for each credibility dimension evaluated.
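The exact aggregation method is determined by the repliCATS team. Purely as a loose sketch (a simple unweighted average, which may differ from the method actually used), combining a group’s judgements for one question might look like this:

    # Loose sketch only: a simple unweighted average of a group's Round 2
    # judgements for one question. The actual repliCATS aggregation may weight
    # or transform estimates differently.
    group_round2 = [
        # (lower, best, upper) in percent -- invented example values
        (30, 55, 75),
        (40, 60, 85),
        (20, 45, 70),
        (35, 50, 80),
    ]

    n = len(group_round2)
    agg_lower = sum(j[0] for j in group_round2) / n
    agg_best = sum(j[1] for j in group_round2) / n
    agg_upper = sum(j[2] for j in group_round2) / n
    print(f"Aggregated judgement: best {agg_best:.0f}% "
          f"(interval {agg_lower:.0f}-{agg_upper:.0f}%)")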

Want more detail? The video below introduces you to the IDEA protocol, and what you’ll be doing during the workshop.

getting the most out of group discussion

Once you have submitted your Round 1 estimates, you will be returned to the platform home page, where the paper should now have the “Round 2” tag. If you click on the paper again, you will be able to view estimates and comments made by other participants in your group. If you are the first to complete your Round 1 assessment, only your results will show and you may have to check back later.

The group facilitator will organise a time for you to meet (e.g. via a video-conferencing tool) to directly discuss the paper and ask questions of each other. The Discussion phase is a key component of the IDEA protocol. It provides an opportunity to resolve differences in interpretation, and to share and examine the evidence.

In the interest of time and efficiency, the facilitator will focus the group discussion on those questions where opinions within the group diverged the most. However, the purpose of the discussion is not to reach a consensus, but to investigate the underlying reasons for these (divergent) estimates. By sharing information in this way, people can reconsider their judgements in light of any new evidence and/or diverging opinions, and the underlying reasons for those opinions.

Here are some tips and ground rules for the Discussion phase.

We encourage you to review others’ assessments and – in particular – examine any comments in response to Question 8: Credibility, because this is where assessors capture their thoughts and reasoning on the various dimensions of the paper’s credibility. You can use the comments feature on the platform to react to other participants’ judgements, interrogate their reasoning, and ask questions.

Once again, it is important that you do not use any participant names in comments on the platform, not even your own. Comments will eventually be made public, and we need to keep these anonymous. If you want to refer to other participants, please use the anonymous user name (avatar) they have been assigned e.g. Koala11. 

You will not be able to access and review the evidence-level questions in Round 2 until you have answered the Round 2 paper-level questions. For that reason, you may wish to enter some provisional Round 2 estimates in order to proceed to and review the evidence-level questions. You can always change your Round 2 estimates during/after the group discussion, up until the paper is formally closed for assessments by the repliCATS team.

Some ground rules for the Discussion phase, regardless of whether you are leaving comments on the online platform or discussing directly:

  • Respect that the group is composed of a diversity of individuals;
  • Consider all perspectives in the group. In synchronous discussion, allow an opportunity for everyone to speak;
  • Don’t assume everyone has read the same papers or has your skills – explain your reasoning in plain language;
  • If someone questions your reasons, they are usually trying to increase their own understanding. Try to provide simple and clear justifications;
  • Try to be open-minded about new ideas and evidence.

The following list may be useful to consider when reviewing and commenting on judgements. You do not have to work through these systematically, but consider which may be relevant:

  • What did people believe the claim being made was? Was the paper clear? Did everyone understand the information and terms in the same way? If interpretations of the central claim (or any claim) vary, instead of trying to resolve this, focus on discussing what that means for the credibility of the paper.
  • Consider the range of estimates in the group and ask questions about extreme values. What would cause someone to provide a high/low estimate for this question?
  • Very wide intervals suggest low confidence. Are these based on uncertainties of interpretation, or are these participants aware of contradictory evidence?
  • Very narrow intervals suggest very confident responses. Do those participants have extra information?
  • It’s ok if you don’t have good evidence for your beliefs – please feel free to state this.
  • If you have changed your mind since your Round 1 estimates it’s good to share this. Actively entertaining counterfactual evidence and beliefs improves your judgements.
  • If you disagree with the group that is fine. Please state your true belief when completing Round 2 estimates. This represents the true uncertainty regarding the question, and it should be captured.
  • Consider raising counterarguments, not to be a nuisance, but to make sure the group considers the full range of evidence.
  • As a group, avoid getting bogged down in details that are not crucial to answering the questions, or in trying to resolve differences in interpretation. Focus on sharing your reasons, not on convincing others of specific question answers.

round 2

You can go into Round 2 as soon as you have finished Round 1, to review what others have said about the paper and its claims. Based on that, you can start entering Round 2 judgements.

The live Discussion will provide additional opportunities to examine and clarify the reasons for the variation in judgements within your group. You can continue to update your Round 2 judgements as you and your group go through and discuss the paper. You can also come back to a paper after you’ve had some time to think about it. Claims will remain open for the duration of the workshop and – if needed – for a few extra days to allow final updates. Let the repliCATS team know if you might need some more time.

Whether or not you want to update your estimates is entirely up to you. In some instances, your views and opinions might not have shifted after the discussion; in others, they might have. Either decision is absolutely fine and provides useful information. Make sure you hit the submit button to log your Round 2 estimates, even if your assessments have not changed. Remember, your Round 2 estimates are private judgements, just like your Round 1 estimates, so there is no need to worry about what others might do or think about your judgements, and whether or not you have changed your mind.

assessing single trace claims in phase 2

Single trace claims in phase 2 largely follow the phase 1 format – you will be asked to assess a research claim from a published paper that is supported by a single inferential test result. In the document below, you will find an updated guide to evaluating single trace claims in phase 2:

The recommended time you spend evaluating a single trace claim remains the same as in phase 1. If you are working solo or completely virtually (i.e. with no real-time discussion), we suggest that you spend around 10-15 minutes completing round one, which includes perusing the paper, plus a bit of extra time to write down your reasoning. This will help you and the other participants in round two and when the claim closes.

training materials

The following document contains information about statistical concepts commonly used in scientific papers (e.g. p values, Cohen’s d, effect sizes) and information from scientific meta-research about the practice and publication of scientific findings, as well as previous replication studies:
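As a quick, self-contained illustration of two of these concepts (the data below are simulated, not taken from any repliCATS paper), Cohen’s d and a p value for a two-group comparison can be computed as follows:

    # Simulated example: compute Cohen's d and a p value for two independent groups.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group_a = rng.normal(loc=0.5, scale=1.0, size=50)  # invented data
    group_b = rng.normal(loc=0.0, scale=1.0, size=50)

    # Cohen's d: difference in means divided by the pooled standard deviation
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = np.sqrt(((n_a - 1) * group_a.var(ddof=1) +
                         (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2))
    cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd

    result = stats.ttest_ind(group_a, group_b)
    print(f"Cohen's d = {cohens_d:.2f}, p = {result.pvalue:.4f}")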

additional videos & resources

Glossary of terms

Type 1 and Type 2 errors

Effect sizes (video – 10 mins)

This video by Daniël Lakens provides a brief introduction to the importance of effect sizes.

 

Confidence intervals (video – 10 mins)

This video by Daniël Lakens explains how to interpret confidence intervals.
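As a small companion example (again with simulated data, for illustration only), a 95% confidence interval for a sample mean can be computed like this:

    # Simulated example: a 95% confidence interval for a sample mean.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    sample = rng.normal(loc=10.0, scale=2.0, size=40)  # invented data

    mean = sample.mean()
    sem = stats.sem(sample)  # standard error of the mean
    ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
    print(f"Mean = {mean:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")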

 

Correlations

 

P curve analysis