Improving the credibility of empirical legal research: practical suggestions for researchers, journals, and law schools

Fields closely related to empirical legal research are enhancing their methods to improve the credibility of their findings. This includes making data, analysis code, and other materials openly available, and preregistering studies. Empirical legal research appears to be lagging behind other fields. This may be due, in part, to a lack of meta-research and guidance on empirical legal studies. The authors seek to fill that gap by evaluating some indicators of credibility in empirical legal research, including a review of guidelines at legal journals. They then provide both general recommendations for researchers, and more specific recommendations aimed at three commonly used empirical legal methods: case law analysis, surveys, and qualitative studies. They end with suggestions for policies and incentive systems that may be implemented by journals and law schools.

Credibility, as we use the term, means that a study's methodology and results are reported transparently so that they can be verified and repeated by others, that errors are caught and corrected, and that the conclusions are updated and well-calibrated to the strength of the evidence. In other words, credibility does not mean that findings are perfectly accurate or error-free, but that researchers prioritize transparency and calibration, such that errors are easy to catch and correct. A credible field is one where findings are reported with appropriate levels of confidence and certainty and errors are likely to be corrected, such that when a finding withstands scrutiny and becomes well-established, it is very likely to be true. 8 Credible research presents many additional advantages: (1) it is "reproducible", meaning that its data, methods, and analysis are reported with enough detail such that other researchers can verify conclusions and correct them if needed; 9 (2) it is more efficient, because reproducible workflows allow other researchers to build upon existing work and to test new questions; 10 (3) its findings are "replicable", meaning they can be confirmed through testing with new data; 11 (4) its claims are well-calibrated, such that bodies that fund and rely on this research can employ them with more confidence; 12 and (5) it inspires greater public trust. 13 Many of these benefits were recently encapsulated in a 2018 National Academies of Sciences, Engineering, and Medicine (NASEM) report. 14 These advantages are even greater in fields like ELR, whose impact frequently extends beyond academia to real-world issues. 15 For example, ELR is regularly cited by courts in the United States. 16 It is also relied on by policy-makers, and sometimes commissioned and funded by law reform bodies. 17 In addition, empirical legal research is often publicly funded (like many fields), so the public has an interest in seeing its methods and data be publicly accessible. 18

In this article, our primary objective is to provide specific steps legal researchers and institutions can take to enhance research credibility. 19 Prior to doing that, Part II will review and contextualize existing concerns about ELR. In that part, we will provide a novel review of law journal guidelines to gauge the extent to which they promote credible practices. This analysis leads into our recommendations for concrete steps that researchers, journals, and institutions can take. In Part III, we will discuss three common empirical methodologies in ELR and how they may be conducted more credibly. Then, in Part IV, we turn to journals and societies, and the steps they may take to promote research credibility. Part V provides some final reflections on the path forward.

8 Munafò et al., id. at 5: "Claims become credible by the community reviewing, critiquing, extending and reproducing the supporting evidence. However, without transparency, claims only achieve credibility based on trust in the confidence or authority of the originator. Transparency is superior to trust."; Simine Vazire & Alex O. Holcombe, Where Are The Self-Correcting Mechanisms in Science?, https://psyarxiv.com/kgqzt/. 9 NASEM, supra note 7: "An overarching principle of open science by design is that research conducted openly and transparently leads to better science. Claims are more likely to be credible, or found wanting, when they can be reviewed, critiqued, extended, and reproduced by others." 19 Indeed, one established barrier to data sharing is lack of knowledge of how to do it, see Laure Perrier, Erik Blondal & Heather MacDonald, The views, perspectives, and experiences of academic researchers with data sharing and reuse: A meta-synthesis, 15(2) PLOS ONE (2020) at 13: "For many disciplines, data sharing was a new activity that was typically imposed by funding agencies or journals. As a result, researchers were looking for services or resources that would help with this task." For a practical guide for credible psychology research, see Olivier Klein et al., A Practical Guide for Transparency in Psychological Science, 4 COLLABRA: PSYCHOLOGY 4 (2018).

Part II. The credibility of empirical legal research
We will now contextualize ELR within the broader movement afoot in social science towards increased credibility. We will start by reviewing reforms that are becoming mainstream in social science and the concerns that inspired those reforms. We will then discuss the challenges particular to ELR, ending with an analysis of law journal publication guidelines, finding that there is considerable heterogeneity and room for improvement.

The credibility revolution in social science
Social scientific fields akin to ELR have recently taken steps to enhance their credibility. 20 Consider, for instance, the 2018 State of Social Science (3S) survey, which asked hundreds of researchers in psychology, economics, sociology and political science about whether and when they had first made public their data and materials (e.g., instruments like surveys, images or sound files presented to participants), and whether they had preregistered a study (i.e., publicly registered their hypotheses and methods before running the study, see below). 21 The survey showed considerable increases in all three self-reported behaviors over the past several years (Figure 1).
In psychology and economics, these reforms were inspired, in part, by surprising failures to replicate findings published in prestigious journals. 22 These failures to replicate were difficult to ignore because of the high methodological quality of the replication studies, which were often pre-registered and in some cases involved sample sizes several times larger than the originals, with data collected across multiple labs. The results of these replication projects likely contributed to the opinions expressed by researchers in a survey in which 52% of respondents said there was a significant crisis in science, 38% said there was a slight crisis, and only 3% said there was no crisis. 23 The causes of this perceived crisis can be broken down into those that are preventable and those that are more difficult to control. As to the preventable (which we are most interested in), 24 self-report surveys of researchers are documenting widespread use of questionable research practices (QRPs) in many fields (e.g., psychology, economics, evolutionary biology). 25 QRPs take advantage of undisclosed flexibility in the research process to allow researchers to make their findings seem cleaner and more persuasive than they actually are (e.g., one widespread QRP is making the decision to exclude an outlier after seeing the data).

As we noted, several fields are increasingly adopting reforms that respond to these controllable sources of error. 31 We will now briefly review some of those reforms, which we will revisit in greater detail when we discuss how they can be leveraged by empirical legal researchers in Part III. Note that early results suggest these reforms can be effective. For instance, whereas the false discovery rate for traditional psychological studies hovers around 50%, a 2020 study found that studies using several of the below reforms (e.g., preregistration) fared substantially better. 26

(a) Preregistration
Preregistration, also known as prospective study registration or a pre-analysis plan, became required by law in some jurisdictions for clinical medical research due, in part, to widespread concerns about the public health implications of publication bias (i.e., non-publication of entire studies unfavorable to the drug manufacturer, partial reporting of results, and outcome switching). 33 Preregistration involves, prior to data collection, submitting the hypotheses, methods, data collection plan, and analysis plan to a common registry. 34 Preregistration makes the existence of unreported studies more findable (e.g., for meta-analyses) 35 and can discourage (or at least make detectable) QRP usage by preserving a record of the methodology as it was before the data were viewed. 36 Preregistrations are associated with reduced publication bias. 37 Researchers following a preregistered analysis plan may also be less likely to mistake postdiction for prediction; in other words, they may be less likely to present an exploratory finding (e.g., one produced by mining the data for statistical significance) as one they predicted. In this way, preregistration can play a similar role to results-blind analysis, which was developed in physics but is applicable to social science. 38 If detailed pre-registered plans are followed, this preserves the statistical validity of analyses with respect to error control. 39 Although many fields are embracing preregistration, it should not be seen as a panacea for all threats to research credibility. Rather, it is an important tool that can be employed alongside other reforms aimed at shifting incentives towards getting it right (versus getting it published).
Registered reports are another excellent example of such a shift.
Registered reports ("RRs") are a new type of article where the peer review process is restructured such that reviewers evaluate the proposed methods and justification for the study before the study has been conducted and the results are known. 40 If the editor accepts the proposal, it is then guaranteed to be published if the authors follow through on that plan, which is then preregistered. Publications are, therefore, not selected based on results but the research question and method. Like preregistration, RRs can reduce publication bias and QRPs. They can also result in improved methodology as reviewers provide criticism and advice regarding methods before data is collected. Over 260 journals now accept registered reports, a number that Recent analyses find registered reports are more likely to report null results (see Figure   2). 42 This is salutary because the proportion of published literature that contains positive results is so high (~95%) that it is almost guaranteed that many of the positive results are false  submit with their manuscripts (e.g., acknowledging they have reported all produced estimates). 51 Some of these checklists have been associated with fuller reporting of, for example, a study's methodological limitations. 52 Finally, reforms to the peer review process are spreading. 53 These include: open peer review models in which peer reviews are published along with the articles; continuing peer review (in which commentaries can be appended to existing publications); and changes in peer review criteria, such as judging articles based on credibility instead of novelty. 54 As we will discuss further below, journals are also increasingly adopting guidelines that encourage or require practices like open data and preregistration. 55

Concerns about the credibility of empirical legal research
Expressed concerns about the credibility of ELR actually predate much of the discussion above, but have produced no lasting reforms or initiatives that we could find. 56 In 2001, for instance, Epstein and King reviewed many empirical legal studies and found errors of inference in all of them. In many of these, the errors stemmed from the conclusions not being based on reproducible analyses. 57 Over a decade later, little seems to have changed. 58 ELR faces many of the same challenges found in other social sciences, and so perhaps it is not surprising that questions would be raised about its practices. In many cases, the relationship is direct: ELR practices often borrow from cognate disciplines, like economics and psychology, two fields whose historic practices contributed to concerns about replicability and reproducibility. ELR also typically operates in a research environment similar to many others, in which there is an incentive to publish frequently, perhaps at the cost of quality and rigor. 59 In such environments, many journals also appear to favor novel and exciting findings, without a concomitant emphasis on methodology. This combination of incentives creates an ecosystem in which low-credibility research is rewarded, and those who engage in more rigorous practices are driven out of the field. 60 But ELR also faces its own set of challenges. Much research in this field is published in generalist law journals that may rarely receive empirical work. Some of these journals are edited by students, many of whom cannot be expected to have the appropriate background to evaluate empirical methods. 61 As a result, they may rely more heavily on authors' status, which is itself a biasing force in peer review, 62 with less emphasis on methodological quality.
Turning to the authors themselves, many empirical legal researchers possess a primarily legal background. As a result, they may not have the specialized statistical and methodological knowledge required to ensure their work is credible (which, as the replication crisis has brought to light, many people with extensive training in social science also lack). Furthermore, as trained advocates, some authors may be culturally inclined towards strong rhetoric that may, at times, not be entirely justified by the data.

Are law journals promoting research credibility?
Journals represent an important pressure point on research practices because they choose what to publish and control the form in which research is reported (e.g., by encouraging or requiring that raw data and code be published along with the typical manuscript narrative).
Accordingly, it is useful to ask: to what extent are ELR journals promoting credible practices?
As we saw above, many journals in fields outside of law have begun to reform their guidelines to encourage authors to engage in behaviors like posting their data and preregistering their hypotheses. Much of this was spurred by the development of the Transparency and Openness Promotion (TOP) Guidelines, and its goal of "promoting an open research culture". 63 The original TOP Guidelines cover 8 standards, each of which can be implemented at one of three levels of increasing rigor. A "0" indicates that the standard does not comply with TOP, for example a policy that merely encourages data sharing, or says nothing at all about data sharing. The TOP Factor, a metric derived from these guidelines, differs from measures like the Journal Impact Factor and Altmetrics in that it evaluates practices that are directly associated with the scientific process. 65 Accordingly, the TOP Factor is orthogonal to factors like the novelty and surprisingness of study outcomes, measures that are of interest but nonetheless overvalued. 66 The TOP Factor measures the original 8 TOP guidelines as well as whether the journal has policies to counter publication bias (e.g., by accepting registered reports) and whether it awards badges.
To determine to what extent law journals are promoting research credibility, we scored them with the TOP Factor and entered the results into the COS's larger database under "Empirical Legal Research" (see Table 1). Two journals in our sample have expressly adopted the data and materials guidelines used by economics journals. 70 The American Law and Economics Review has policies for data and code, but they are not as demanding. The Journal of Legal Analysis has the strongest guidelines for data citation, having adopted an established data citation standard.

Part III. Guidance for researchers seeking to improve their research credibility
We will now further elaborate on some of the credibility reforms we discussed in the previous part. First, we will provide some general guidance for empirical legal researchers interested in implementing these reforms. As part of that discussion, we will highlight resources that are particularly appropriate for social scientific research and widely-used guidelines that can be adapted for ELR methodologies. Then, we go on to discuss the application of these reforms to three mainstream empirical legal methodologies: case law analysis, surveys, and qualitative methods. As we will discuss, these general recommendations are all subject to qualifications, such as the ethics of sharing certain types of data.

1. General recommendations

(a) Preregister your studies
Empirical legal researchers interested in enhancing their work's transparency can preregister their studies using platforms like the OSF, 72 the American Economic Association registry, 73 or AsPredicted. 74 These user-friendly services create a timestamped, read-only description of the project. 75 The registration can be made public immediately, or it can be embargoed for a pre-specified period or indefinitely. When reporting the results of preregistered work, the author should follow a few best practices. First, include a link to the preregistration so that reviewers and readers can confirm what parts of the study were prespecified. Second, the authors should report the results of all pre-specified analyses, not just those that are most significant or surprising. Third, any unregistered parts of the study should be transparently reported, ideally in a separate sub-section of the results. These "exploratory" analyses should be presented as preliminary, testable hypotheses that deserve confirmation.
Finally, any changes from the pre-specified plan should be transparently reported. These changes, some of which will be trivial and some of which may substantially alter the interpretability of the results, can be better evaluated if they are clearly described. Preregistration templates vary in how much structure they impose; some give the researcher free rein to describe the study in as little or as much detail as they like. We are not aware of any templates specifically designed for empirical legal research (although this would be a worthwhile project), but existing templates are well-suited to both experimental and observational research. Below (Part III.2), we will discuss how preregistration might operate in various ELR paradigms.

(b) Open your data and analytic code
Empirical legal researchers who wish to improve their work's reusability and verifiability, and who wish to ensure their efforts are not lost to time (as research ages, underlying data and materials become increasingly unavailable because authors are not reachable), 77 have many options open to them. Many free repositories have been developed to assist with research data storage and sharing. Given the availability of these repositories, the fact that the publishing journal does not itself host data (or make open data a requirement) is not a good reason to refrain from sharing. Rather, authors can simply reference a persistent identifier (such as a Digital Object Identifier, DOI) provided by a repository.
In Table 2, we display a selection of repositories that may be of particular interest to empirical legal researchers, along with some key features of those repositories. This is based, in part, on the work of Olivier Klein and colleagues, who explained the characteristics of data repositories that researchers should consider when deciding which service to use. 78 These include: whether the service provides persistent identifiers (e.g., DOIs); whether it enables citation to the data; and whether it ensures long-term storage of, and access to, the data.

Access to data alone already provides a substantial benefit to future research, but researchers can do more. 79 In this respect, researchers should strive to abide by the Global Open (GO) FAIR Guiding Principles, which were developed by an international group of academics, funders, industry representatives, and publishers (and endorsed by the NASEM in 2018). 80 These principles are that data should be Findable, Accessible, Interoperable, and Reusable (i.e., FAIR).
We have already touched on findable (e.g., via a persistent identifier) and accessible data (e.g., via a long-term repository). Interoperable data is data that can be easily combined with other data and used by other systems (e.g., through code explaining what variables mean). And, reusable data typically means data that have a license that allows reuse and "metadata" that fully explains its provenance. One helpful practice to improve interoperability and reusability is to associate data with a "codebook", or file explaining the meaning of variables. The Center for Open Science maintains a guide to interoperability and reusability relevant to social science. 81 Open and FAIR data is important to the future of ELR. For example, FAIRness allows researchers to leverage multiple datasets to perform meta-analyses and systematic reviews aimed at better understanding the robustness of a finding. It also enables researchers to combine and compare data across jurisdictions to test new hypotheses.
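To make the codebook concept concrete, the following minimal sketch (in Python, though any statistical package can do the same) writes a small coded dataset alongside a machine-readable codebook. The dataset, variable names, and file names are invented for illustration and are not drawn from any actual study.

```python
import csv
import json

# Hypothetical case-coding dataset; the variables below are
# illustrative only.
rows = [
    {"case_id": "2015_ONSC_101", "year": 2015, "expert_excluded": 1},
    {"case_id": "2016_ABQB_202", "year": 2016, "expert_excluded": 0},
]

# The codebook maps each variable to a plain-language definition,
# its type, and its allowed values, improving interoperability
# and reusability for future researchers.
codebook = {
    "case_id": {"description": "Neutral citation of the decision", "type": "string"},
    "year": {"description": "Year the decision was released", "type": "integer"},
    "expert_excluded": {
        "description": "Whether the expert witness was excluded",
        "type": "integer",
        "values": {0: "not excluded", 1: "excluded"},
    },
}

# Deposit both files together (e.g., in a repository that issues a DOI).
with open("cases.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

with open("codebook.json", "w") as f:
    json.dump(codebook, f, indent=2)
```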
As to what should be shared, we recommend starting from the presumption that all raw data will be shared and then identifying any necessary restrictions and barriers. 82 Those restrictions should then be expressly noted in the manuscript. 83 Preregistration (see above) can help with this process because it requires that researchers think through data collection before it occurs. For researchers collecting personal information, privacy will often be the most pertinent limitation. 84 Privacy issues can in many cases be managed by seeking consent to release data as part of the consent procedure and by de-identifying data, both of which should be approved by the relevant institutional review board. The latter should be undertaken carefully and in accordance with applicable rules. In some cases, de-identification may not be possible to the extent necessary for ethical sharing (e.g., when risks of re-identification are high). Fortunately, repositories exist where access to raw data can be restricted to qualified researchers, 85 and best practices exist for sharing data from human research participants. 86

79 NASEM, supra note 7 at 28: "Data that are open and FAIR will maximize the impact of open science." 80 Wilkinson et al., supra note 45; the NASEM Report, Id. at 53, provides a useful example of how FAIR operates: "An example of FAIR data for human use is provided by public webpages. Search engines have made many such pages findable and they are usually either immediately accessible or accessible via a paywall. Since these pages are designed for human readers, they are made (more or less) interoperable by the readers' knowledge of the language and the subject matter. Pages are often reusable by cut-and-paste document editing tools."

Analytic code, such as the scripts produced by several statistical software packages, allows readers to understand how the data produced the reported findings. Statistical packages typically allow the author to annotate the code with plain-language explanations. 87
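As an illustration of annotated analytic code, the following sketch continues the hypothetical files from the codebook example above and documents each analytic decision in a plain-language comment; the same approach works equally well in Stata do-files or R scripts.

```python
import csv

# Load the coded cases from the hypothetical codebook example above.
with open("cases.csv", newline="") as f:
    cases = list(csv.DictReader(f))

# Decision 1 (preregistered): restrict the sample to the study window.
cases = [c for c in cases if 2015 <= int(c["year"]) <= 2016]

# Decision 2 (preregistered): the outcome is the exclusion rate,
# i.e., the proportion of cases in which the expert was excluded.
excluded = sum(int(c["expert_excluded"]) for c in cases)
exclusion_rate = excluded / len(cases)

# Reporting the sample size alongside the estimate lets readers
# verify the arithmetic behind the published figure.
print(f"n = {len(cases)}, exclusion rate = {exclusion_rate:.2f}")
```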

(c) Open your materials
Open materials also enhance a study's credibility. The OSF allows users to create a project page that contains data, analytic code, materials, manuscripts, and a brief wiki explaining the gist of the study. Researchers may wish to share materials like interview protocols and scripts, surveys, and any image, video, or audio files that were presented to participants.
One of the clearest credibility benefits of open materials is replicability. 89 In other words, future researchers can build on existing work, using materials (e.g., surveys) in different times and contexts. For example, a researcher may wish to repeat a survey or interview at a later time, or in a different jurisdiction, to test whether the original findings hold.

(d) Consult an existing or analogous checklist when possible
We are not aware of any law journals that require or encourage authors to complete checklists when submitting their work for publication. There may, however, be an existing checklist for any given methodology being used by an empirical legal researcher, developed by others using the same methodology. For instance, a group of social and behavioral scientists recently created a checklist for conducting transparent research in their field. 90 They used an iterative, consensus-based protocol (i.e., a "reactive-Delphi" process) to help ensure that the checklist reflected the views of a diverse array of researchers and stakeholders in their field.
Empirical legal researchers conducting behavioral research may find this transparency checklist useful when planning their research and preparing it for publication. The Equator Network also curates a database of reporting checklists relevant to various research methods and disciplines. 91

89 Nosek & Errington, supra note 11. 90 Aczel et al., supra note 51.

2. Three specific applications of research credibility to ELR

(a) Case law analysis
Empirical case law analysis has been widely used to address important legal issues. 92 As with other methods, credible research practices can be used to strengthen the inferences drawn from case law analysis and help ensure their enduring usefulness and impact. 93 In this subsection, we will drill down into two specific ways credible practices can be applied to empirical analyses of legal authorities: preregistering these studies, and using transparent methods to conduct them.
Preregistration poses a particular challenge when, as with analysis of legal authorities, the data are pre-existing. This is because, in preregistration's purest form, the hypothesis and methodology should be developed before the researchers have seen the data. 94 If this is not the case, the researchers may present their hypotheses as independent of the data when, in fact, they were (perhaps inadvertently) constructing an explanation for what they already, in part, knew.
Another challenge is that researchers may be tempted to sample and code cases in a way that fits their narrative. For instance, researchers may unconsciously determine that cases are relevant or irrelevant for their sample based on what would produce a more publishable result. While these challenges are significant, they do not mean that preregistration is unworkable in case law analysis. Indeed, other useful methods like systematic reviews and meta-analyses use preexisting data, but also incorporate preregistration as part of best practices. 95 One of us has some experience preregistering case law analyses and has found it to be a challenging but useful exercise. 96 In a recent study, for instance, he and colleagues sought to determine whether a widely-celebrated Supreme Court of Canada evidence law case had, in fact, produced more exclusions of expert witnesses accused of bias than the previous doctrine had allowed for. 97 The challenge was that it would have been useful to take a look at some cases before coding them, to understand how long the process would take (e.g., for staffing purposes) and how to set up the coding scheme (e.g., would judges clearly advert to different aspects of the new doctrine so that the coders could unambiguously say courts were relying on these rules?).
However, they were also aware that it would be easy to change their coding scheme and inclusion criteria based on the data to show a more startling result (e.g., shifting the timeframe slightly might make it seem as if the focal case had more or less of an impact). In other words, some type of preregistration was needed, but the standard form seemed too restrictive. Accordingly, they decided to establish the temporal scope of the search prior to looking at the cases, but to accommodate the difficulty of pre-deciding on the criteria by reading a portion of the cases prior to establishing the coding scheme. They disclosed this encroachment into the data in the preregistration, so that readers could adjust their interpretation of the results accordingly. 98 This was useful in that they were able to account for several issues that would have been difficult to anticipate without some prior knowledge of the cases. For instance, sometimes judges in bench trials would not exclude a witness, but rather say that the witness would be assigned no weight. It was hard to determine whether this should be coded as an exclusion.

95 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), Registration, http://www.prisma-statement.org/Protocols/Registration (accessed 2020).
By looking at some of the data, they were able to anticipate this for the bulk of the cases. Had it been done completely ad hoc, without any preregistration, it would have been difficult to make the decision about how to code these cases in an unbiased way. They also took the step of disclosing cases that were borderline and required discussion amongst the authors, but did not do so as systematically as they would in the future. 99 In the end, they were able to give what they thought was a credible picture of the target case's effect, with the preregistration helping to reduce the possible influence of bias and helping to highlight the study's limitations.
Other transparency and openness efforts can also improve case law analysis. One method now common in systematic reviews (which also use pre-existing data) that legal researchers may leverage is the "PRISMA diagram" (see Figure 3). 100 Consider, for example, studies asking whether judicial reliance on neuroscience evidence has increased and detailing its different uses. 101 Subsequent researchers may wish to extend those findings to see whether the discovered trends and uses have changed. They may also want to stand on the shoulders of the earlier researchers and extend the analysis to other jurisdictions. In either case, they would need clear methods to follow in order to reproduce the searches, exclusion criteria, and coding. However, we have noticed considerable heterogeneity in the way methodologies were described in those studies, and in similar ones. 102 Following a well-understood framework like PRISMA, which shows exactly what was searched and how the search results were reduced down to the sample presented in the article, would be beneficial.
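As an illustration, the following sketch shows how the counts needed for a PRISMA-style flow diagram might be tracked during case selection. The database names, hit counts, and exclusion reasons are all hypothetical.

```python
from collections import Counter

# Hits per database, recorded at search time (names and counts invented).
records_identified = {"Westlaw": 412, "LexisNexis": 389}
total_identified = sum(records_identified.values())

# Count remaining after merging result lists and removing duplicates.
after_deduplication = 655

# Reasons recorded while screening each case against the protocol.
exclusions = Counter({
    "not an expert evidence ruling": 210,
    "outside preregistered date range": 61,
    "duplicate proceeding": 12,
})

included = after_deduplication - sum(exclusions.values())

# These four stages map directly onto the boxes of a PRISMA diagram.
print(f"Identified: {total_identified}")
print(f"After deduplication: {after_deduplication}")
for reason, n in exclusions.items():
    print(f"Excluded ({reason}): {n}")
print(f"Included in final sample: {included}")
```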

(b) Survey studies
Legal researchers have used surveys to address a host of questions, like lawyers' reports on their well-being, 103 judges' attitudes towards evidence procedures, 104 and the public's views on what constitutes a violation of privacy. 105 In fact, almost half of all quantitative studies on topics related to crime and criminal justice use surveys. 106 There are many existing resources to help improve survey methodology generally (e.g., sampling, wording of questions and prompts). 107 We, however, are interested in improving the credibility of survey research in law: conducting studies such that their findings are reproducible, errors are detected and corrected, and conclusions are calibrated to the strength of the evidence.
One key aspect of credible survey research, and one that is regularly breached in law and elsewhere, is data transparency. In criminology, for instance, closed data practices protracted one incident over years, in which co-authors of a series of studies repeatedly tried and failed to obtain from the lead author the raw data, as well as evidence that the reported surveys were in fact conducted. 108 In the case of surveys, data transparency pertains to reporting and making available to other researchers all key information about the questionnaire and data collection procedures. An excellent guide is the American Association for Public Opinion Research's (AAPOR) survey disclosure checklist, which recommends researchers disclose the survey sponsor, data collection mode, sampling frame, field dates (i.e., dates of administration), and exact question wording. 109 Beyond the exact question wording, which should be disclosed per AAPOR, we also recommend researchers make the entire questionnaire publicly available. This permits others not only to reuse the questions, but also to replicate the question ordering, which can have large effects on responses. 110 In terms of sampling, researchers should state explicitly whether sample selection was probabilistic (i.e., using random selection), in addition to describing how sampled respondents compare to the population of interest. This is important because researchers are increasingly using online non-probability samples, recruited via crowdsourcing or opt-in panels, 111 but are mislabeling these samples as "representative" when they merely match the general population on a handful of chosen variables (e.g., gender, race). Mischaracterizing these online convenience samples as probabilistic samples may lead readers to put more confidence in the generalizability of findings than is warranted. It also obscures the fact that non-random sampling necessitates a different type of inference. That is, even when non-probability samples look similar to the population on a few chosen demographic variables, inferences from them still depend on assumptions and statistical adjustments (i.e., model-based inference), rather than probability theory (i.e., design-based inference).
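To illustrate why inference from non-probability samples is model-based, the following sketch applies simple post-stratification weighting on a single variable. All shares and means are invented, and the choice of age group as the sole adjustment variable is purely illustrative; real adjustments typically use several variables and rest on the assumption that respondents resemble nonrespondents within each cell.

```python
# Post-stratification: weight a skewed sample toward known
# population shares. All numbers are invented for illustration.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}  # e.g., census
sample_share = {"18-34": 0.55, "35-54": 0.30, "55+": 0.15}      # panel skews young

# Weight for each group: population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Group means on some hypothetical survey outcome (1-7 scale).
group_mean = {"18-34": 4.9, "35-54": 4.1, "55+": 3.2}

unweighted = sum(sample_share[g] * group_mean[g] for g in group_mean)
weighted = sum(sample_share[g] * weights[g] * group_mean[g] for g in group_mean)

print(f"Unweighted estimate: {unweighted:.2f}")       # 4.41
print(f"Post-stratified estimate: {weighted:.2f}")    # 4.03
```

The two estimates differ noticeably, which is the point: with a non-probability sample, the reported figure depends on which adjustment model the researcher chooses, so that model must be disclosed.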
Another key type of information that researchers using surveys should disclose is the response rate. Reviews of the literature have revealed widespread failure to report response rates and inconsistencies in calculating those rates. 112 Smith concluded that "disclosure standards are routinely ignored and technical standards regarding definitions and calculations have not been widely adopted in practice." 113 For online panels, the appropriate measure is a cumulative response rate (CUMR), which accounts for the panel recruitment rate (RECR) and the panel profiling rate (PROR), yet researchers frequently misreport the study-specific completion rate (COMR) as the response rate. 114 A path forward is to require all researchers using survey data to report the response rate and to adhere to the AAPOR's Standard Definitions when calculating that rate. 115

Finally, survey researchers should transparently document and disclose their selection criteria. Beyond common eligibility criteria, such as adult status and country of residence, online platforms give researchers numerous other options, which can impact sample composition, and which are rarely reported. On Amazon Mechanical Turk, for example, researchers commonly restrict participation to workers with certain reputation scores (e.g., at least 95% approval) and/or prior Human Intelligence Task (HIT) experience (e.g., must have completed 500 prior HITs). 116 Such eligibility restrictions can have a profound effect on data quality and sample composition. 117 Therefore, we recommend that researchers using online platforms, like Amazon's Mechanical Turk, disclose all employed eligibility restrictions. Additionally, researchers should disclose if they exclude respondents for quality control reasons, such as for speeding through the survey, failing attention checks, or participating repeatedly (e.g., duplicate Internet Protocol addresses), and they should report how the exclusions affect findings. All such exclusions, if undisclosed and decided on after looking at their effects on findings, would amount to questionable research practices and inflate the false positive rate.

114 The correct calculation is: CUMR = RECR × PROR × COMR.
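A small worked example of the cumulative response rate calculation from footnote 114, with invented rates, shows how starkly the completion rate alone can overstate the true response rate:

```python
# Cumulative response rate for an online panel survey
# (CUMR = RECR x PROR x COMR, per footnote 114).
# The rates below are invented for illustration.
RECR = 0.04   # panel recruitment rate
PROR = 0.60   # panel profiling rate
COMR = 0.75   # study-specific completion rate

CUMR = RECR * PROR * COMR

print(f"Completion rate alone: {COMR:.1%}")      # 75.0%, often (mis)reported
print(f"Cumulative response rate: {CUMR:.1%}")   # 1.8%
```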

(c) Qualitative research
Qualitative methods also play an important role in ELR. 118 These include methods like ethnographies, 119 interviews and focus groups, 120 and case studies. 121 While it may be tempting to think that the reforms we have discussed are inappropriate for seemingly more freeform and exploratory research, we suggest that is not the case. In saying this, we do not deny that there are fundamental differences between quantitative and qualitative methods. For instance, Lisa Webley has noted that qualitative researchers often see their work as more interpretivist than positivist, more inductive than deductive, and, at times, more interested in socially constructed facts than those that purport to have universal meaning. 122 Additionally, many qualitative researchers are skeptical of the modern open science movement, which they see as imposing quantitatively-focused evaluation criteria on qualitative researchers without understanding the contextual or epistemological differences between these types of research. 123 None of these epistemological differences, however, undermine the importance of research credibility. Rather, as we will discuss, qualitative researchers can leverage many existing reforms, even though many are grounded in positivist frameworks, to make their work easier to access and to help ensure its long-term impact.
Before delving into the reforms, we note that we are not the first to highlight the importance of credibility in qualitative legal research. 124 For instance, Allison Christians examined research methodology in a meta-analysis of case studies in international law. She found that not one article explained why the specific case was chosen: "In each case, the articles simply identify the event or phenomenon as a 'case' without further discussion." 125 This, as she goes on to note, raises the possibility of selection bias, whereby the case is not representative or probative of the claim it seeks to support. Further, Christians found that studies did not sufficiently explain why certain material was collected to document the case (and why other material was left out): "What is missing from the literature and what might make the data even more compelling, is a discussion about the authors' objectives, processes, and reasoning for collecting and using the data…" 126 And, when data and materials were relied on, Christians found that these sources were often not cited. 127 Recall that data citation is a TOP guideline that only 2 law journals in our sample addressed (and very weakly so, see Table 1).
Christians' observation about thinking through and justifying case selection reinforces the importance of preregistration of qualitative studies (when preregistration aligns with the research epistemology). 128 Indeed, preregistration is an increasingly discussed reform in qualitative research. One group recently created a preregistration form through a consensus-based process (i.e., the same process used to create the checklist for behavioral studies above) in which many researchers in the field participated. 129 Qualitative preregistrations may include details about the research team's background as it relates to the study (as a form of reflexivity, i.e., attention to the experiences, positions, and potential biases the researchers bring to bear on what they are studying), research questions and aims, planned procedures, sampling plans, data collection procedures, planned evidence criteria, and triangulation, auditing, and validation plans. As qualitative research tends to be more iterative than quantitative research, preregistrations may be most useful not as a means for researchers to try to establish "objectivity," but rather as a means for researchers to fully explore the assumptions they may be making going into their study, and as another tool for reflexivity as the study progresses.

124 Christians, supra note 121; Webley, supra note 118 at 935: "Also important are the extent to which she is willing to pilot her method, to make adjustments in the light of the pilot, to be reflexive and to report on the strengths and weaknesses of her research". 125 Christians, supra note 121 at 336-7. 126 Id. at 362. Similarly, she notes: "None of the international tax case studies includes a description of the author's reasoning regarding how the case is or should be constructed.": Id. at 356. 127 Id. at 359: "These authors-perhaps like many legal scholars-used their discussions with these individuals to better understand the studied subject or to construct theories about the studied subject, but they did not cite to the primary source of data-namely, notes from interviews or e-mail correspondence".
If preregistration does not align with one's research epistemology, it is still possible to engage in transparent practices so others may evaluate research decisions and learn from researchers' practices. Researchers may be interested in maintaining open laboratory notebooks (adapted to qualitative practices) 130 and/or sharing their research materials (e.g., recruitment materials, interview and focus group protocols, fieldnotes, codebooks, etc.) on a repository like the OSF. Data may also be shared on the OSF or, for instance, the Qualitative Data Repository. 131 There are, however, important ethical considerations to account for before sharing data. Kristen Monroe outlined several concerns with the Data Access and Research Transparency (DA-RT) and Journal Editors' Transparency Statement (JETS) initiatives as they relate to qualitative research. 132 These include: space constraints that may hinder a full accounting of qualitative data, participant protection, the time needed to prepare qualitative data for sharing, costs of data collection, the right of first usage, and a potential chilling effect on certain research topics. Others have outlined concerns surrounding missing layers of interpretation and the importance of consent as an ongoing process. 133 Researchers should handle datasets involving information from vulnerable populations, for example sexual assault survivors or refugees, with care, such that participants' personal information is appropriately protected. Fortunately, many data repositories do offer access controls, such that researchers may embargo data or provide conditions for access, if desired.
Additionally, some researchers have begun to include consent language around sharing data with other researchers on the condition that participants' anonymity is preserved, or to offer conditional consent, whereby participants can agree to participate without their data being shared with anyone other than the study authors. It is also important to note that the same documents that make up "audit trails" (e.g., field notes, interview and focus group protocols, etc.) are useful tools for making qualitative research more open and transparent, and may be particularly beneficial for those learning how to conduct qualitative studies. 134

Part IV. Guidance for journals and institutions that wish to encourage credible research practices
Most researchers readily endorse norms associated with the reforms we have discussed above. 135 Still, in other fields, expressed acceptance of norms exceeds the actual behaviors they are associated with (e.g., researchers say sharing materials is important, but do not always live up to that ideal). 136 In Part III, we attempted to address one reason behaviors may be lagging behind norms: a general lack of concrete guidance aimed at legal researchers. We will now address two more factors that affect the behavior of researchers: incentives and policies. We will draw on general research on these factors, but adapt them to the distinctive ecosystem of legal research and teaching.

Journals
As we saw above, there is considerable heterogeneity in journal guidelines among law journals, both in the student-edited journals and the small number of peer-reviewed journals we considered. The most significant advancement in transparency that we found was in two journals adopting guidelines from economics journals. To us, this suggests that journal editors and boards in the ELR space may be open to adopting new guidelines, but that it is important that the task be as easy as possible, and the guidelines be tested in similar fields. For that reason, we will suggest low-cost and pre-vetted moves journals may make.
Peer-reviewed journals should consider the sample TOP implementation language curated by the COS. 137 This language can be adapted to the needs of the specific legal research journal. Similarly, journals may consult the guidelines of other ELR journals in our sample that have implemented TOP (Table 1). Registered Reports, which have been adopted by journals in fields ranging from psychology to medicine, can be fairly easily rolled out by law journals, with recommended author and reviewer instructions available for re-use. 138

The situation with student-edited journals is more complicated because, among other reasons, their editorial boards experience a great deal of turnover, they may be less likely to have empirical backgrounds, and, as Part II indicates, current guidelines have the most room for improvement. The landscape at student-edited journals also seems to be more competitive, with editors taking into account acceptance of the article at other eminent journals. The culture of concurrently submitting to many journals places time pressure on student editors. As a result, they may be hesitant to screen out articles that, despite seeming impressive in some ways, do not meet high methodological standards.
These unique hurdles at student law reviews are not insurmountable in the long run. One incremental measure may be for these journals to award badges. Recall that badges do not necessarily factor into article acceptance, but instead allow authors to signal to others that they have taken steps to improve their work's credibility and usability. 139 More generally, note that student-edited journals are not immune from change. About 15 years ago, many signed on to an agreement to accept articles with fewer words. 140 Some of the authors of the current article are beginning a project to draft sample guidelines designed for law journals that occasionally publish empirical work. Having these ready-made guidelines may make the change less daunting. We plan to circulate them to student-edited journals along with many of the justifications presented in this article. Please contact the corresponding author if you would like to contribute.
Finally, the simple step of encouraging the submission of replication studies can be an important move toward improvement and reform. Without empirical evidence about a field's replicability, it may be challenging to see the need for reform. Either individual studies or larger efforts meant to more systematically estimate the replicability of a sub-discipline can provide insight into the extent and consequences of these problems.

Law schools and faculties
Law schools and faculties can also play a role in encouraging credible practices. This naturally begins with hiring, where committees already seem to place some value on empirical research by hiring individuals who do such work. 141 It is less clear, however, whether these committees place much value on the credibility and rigor of empirical work (as opposed to factors like its surprisingness and ability to draw headlines). If committees do not take credible practices into account, then hiring practices may perpetuate irreproducible research.
Hiring committees may wish to align their search criteria and candidate evaluation with recent work laying out frameworks for basing researcher assessment on the credibility of their work. 142 The Hong Kong Principles distill research assessment into five factors. They seek to move fields from success indicators like the esteem of journals and impact factors to other criteria, like the dissemination of knowledge through open data and the analysis of existing, but poorly understood, work through evidence synthesis and meta-analysis. 143 Precedents are available to assist institutions seeking to change their hiring practices. The COS maintains a list of job listings that refer to research practices. 144

After hiring, more may be done, as some have suggested, 146 to promote collaboration between those who have specialized empirical training and experience and those who do not. 147 One barrier to this initiative is authorship norms, and the concern that the methodological work may go unrecognized. In these circumstances, law schools may take note of a move towards a contributorship model of authorship, which recognizes the various types of work that go into a publication. 148

Internal encouragement within schools and faculties can only go so far, especially when the broader environment rewards high-impact publication (in which impact is often not directly related to strength of methodology). This may especially be the case in the U.S., where law school rankings are closely tied to the eminence of the journals in which faculty publish. Still, both in the U.S. and abroad, there are incentives to focus on methodology. 149 For example, in the U.S., one influential publication is considering using citation counts of individual researchers (rather than Journal Impact Factors) to assess the productivity of law schools. 150 With this in mind, legal researchers may be swayed by findings that sharing data is associated with increased citations. 151 Similarly, funders are increasingly concerned with, and sometimes require, open practices. 152 In other words, tradeoffs between the expectations of current ranking systems and research credibility may not be as stark as they seem at first blush.

144 Center for Open Science, Universities, https://osf.io/kgnva/wiki/Universities/ (accessed 2020).

Part V. Conclusion
In producing knowledge for the legal system, empirical legal researchers have a heightened duty to present the full picture of the evidence underlying their results. We are excited about what the next several years hold for better fulfilling that duty. While there are sticking points, like the need for more training and the distinctive situation with student-edited journals, there is also an increasing number of models to follow from cognate fields, and an energized group of researchers motivated to put them into action. In the past, calls for change in ELR have sometimes gone unheeded, but never before have they been made in the context of a large, sustained movement in the rest of the research ecosystem.

Figure 1. Adoption of open and transparent practices by researchers in the social sciences
Participants were asked the year they first engaged in one of the following practices (if they had): made their data available online, made their study instruments (i.e., materials) available online, and preregistered a study. Participants were researchers in psychology, economics, political science, and sociology.

Figure 2. First listed hypothesis confirmed in standard and registered reports
The authors compared a sample of standard reports to a sample of registered reports. They found that the first listed hypothesis was reported as confirmed in 96.05% of standard reports, compared with 43.66% of registered reports.

Figure 3. PRISMA Flow Diagram
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram is used in many fields to improve the transparency of the secondary analysis of preexisting studies. Specifically, it makes clearer why some studies were included in the analysis and some were not. As demonstrated in this figure, it can be adapted by empirical legal researchers to transparently report the cases included in the analysis and those that were excluded. In this example, the researcher searched two prominent databases, but this can be changed as needed.

Table 1. Law journal scores under the TOP guidelines. TOP Factor is the sum of 10 items (https://osf.io/t2yu5/) that are awarded 0-3 based on how insistent the policy is: data citation, data transparency, analytic code transparency, materials transparency, reporting guidelines, study preregistration, analysis preregistration, replication, publication bias, and open science badges. The latter five items are not listed because all journals received 0s for them. The highest possible TOP Factor score is 30.