Reproducible Manuscripts

Traditional manuscripts contain text, figures, and tables, but the analysis code and data that produce those results are kept separate from the manuscript itself. The goal of a “reproducible manuscript”, as the name implies, is to publish in a format that makes it easy to re-run the original analyses accurately. This makes it possible to establish the provenance of any published result more conclusively, to test its reproducibility, and to apply the same analyses to new data.

More specifically, the aim is to share a file or set of files that allows anyone to reproduce the entire manuscript (including text, figures, and numerical results) using the same data and code as the original authors. A reproducible manuscript therefore inherently implements aspects of a reproducible analysis, though full reproducibility also requires the use of a reproducible analysis platform, as outlined in the Reproducible Analysis section.

Prerequisites

  • Markdown

    • The most popular systems for producing reproducible manuscripts require familiarity with the Markdown system for text formatting. Markdown is also used extensively on GitHub (e.g. for the formatting of README files and comments), so learning it will pay many dividends. Fortunately, it’s very simple to learn.

Getting started

  1. Write your analysis code in a format appropriate for a reproducible manuscript. The manuscript should incorporate all code necessary to generate the figures, tables, and numerical results into a single file, using the best practices outlined in the section on Reproducible Analysis.

    • The most commonly used tool for reproducible manuscripts is RMarkdown, generally used within the RStudio environment.

      • RMarkdown can be installed from within the R environment: install.packages("rmarkdown")

      • It is possible to include Python code chunks within an RMarkdown file, using the reticulate package. However, this is unlikely to be a satisfying development experience for the Python programmer, since RStudio does not provide the usual tools that one would expect of a Python development environment.

    • Jupyter notebooks can also be used to generate a document that includes a mixture of code and markdown, which can then be exported to a format suitable for publication.

      • Jupyter lacks some useful features of RMarkdown, such as the ability to embed a variable in the Markdown text and have its value rendered in place.

  2. The code should be organized into clearly labeled sections, such that each section generates a specific table, figure, or numerical result.  Rather than incorporating the entirety of the code within your manuscript document, you can define functions in separate files that can be imported into the manuscript, as long as they are shared alongside the manuscript.

  3. In some cases, the entire analysis workflow cannot be included in the paper due to its complexity (for example, when it involves execution on a high-performance computing system). In these cases, it is common to rely upon intermediate data files derived from earlier steps in the analysis. The provenance of these intermediate files must be made clear in the manuscript and those earlier analyses should be provided in a way that can be reproduced separately.
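
As one possible way to make the provenance of an intermediate file explicit (step 3 above), the following minimal Python sketch records a checksum and the name of the generating script in a sidecar JSON file, and refuses to load the data if the file no longer matches that record. The file names (cluster_output.csv, run_on_cluster.py), the .provenance.json suffix, and the helper functions are hypothetical choices for illustration, not an established standard.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sidecar_for(data_path: Path) -> Path:
    """Return the path of the provenance record that sits next to a data file."""
    return data_path.parent / (data_path.name + ".provenance.json")


def write_provenance(data_path: Path, generated_by: str) -> Path:
    """Record how an intermediate file was produced and what it contained."""
    record = {
        "file": data_path.name,
        "sha256": sha256_of(data_path),
        "generated_by": generated_by,  # hypothetical name of the upstream script
    }
    sidecar = sidecar_for(data_path)
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


def load_verified(data_path: Path) -> str:
    """Load an intermediate file only if it still matches its recorded checksum."""
    record = json.loads(sidecar_for(data_path).read_text())
    if sha256_of(data_path) != record["sha256"]:
        raise ValueError(f"{data_path.name} does not match its provenance record")
    return data_path.read_text()


# Demonstration in a temporary directory, using made-up data.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "cluster_output.csv"
    data.write_text("subject,score\n1,0.5\n2,0.7\n")
    write_provenance(data, generated_by="run_on_cluster.py")

    # The unmodified file passes the check and is loaded.
    contents = load_verified(data)

    # If the file changes after the record was written, loading fails.
    data.write_text("subject,score\n1,0.9\n")
    try:
        load_verified(data)
        tampered_detected = False
    except ValueError:
        tampered_detected = True
```

In a manuscript, a call like load_verified would sit at the top of the section that consumes the intermediate file, so that a stale or altered file stops the build rather than silently changing the reported results.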

Advanced topic: Containerizing and automating your reproducible manuscript

Once you have generated a reproducible manuscript, a next step is to containerize the manuscript and automate its execution.

TBD

Frequently Asked Questions

What is “literate programming” and how does it relate to reproducible manuscripts?

The idea of a reproducible manuscript derives from the concept of literate programming that was introduced by Donald Knuth (paper). In general, literate programming is a method of writing a program as a human-language narrative interleaved with code snippets and macros. The document follows the flow of human reasoning (a step-wise progression through the analyses) and incorporates the code for each step. Literate programs are essentially explanations of code, with the code included, written to be read by humans.
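
To make the idea concrete, here is a minimal Python sketch of the step usually called "tangling": extracting a runnable program from a document written primarily as prose. It assumes a simplified chunk syntax loosely modeled on noweb (a chunk opens with <<name>>= on its own line, closes with a line starting with @, and may pull in another chunk by writing <<name>> on its own line). The example document, chunk names, and function names are invented for illustration, and real literate programming tools handle many more cases.

```python
import re

# A tiny literate document: prose interleaved with named code chunks.
LITERATE_SOURCE = """\
We first define the data, a short list of scores.
<<data>>=
scores = [0.5, 0.7, 0.9]
@
The main program uses the data to compute the mean.
<<main>>=
<<data>>
mean_score = sum(scores) / len(scores)
@
"""

CHUNK_START = re.compile(r"^<<(.+)>>=$")    # line that opens a chunk definition
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>$")  # line that references another chunk


def parse_chunks(source):
    """Collect the code lines of each named chunk, ignoring the prose."""
    chunks = {}
    current = None
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = start.group(1)
            chunks.setdefault(current, [])
        elif line.startswith("@"):
            current = None  # back to prose
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name):
    """Expand a chunk into plain code, replacing references recursively."""
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            indent, ref_name = ref.groups()
            expanded = tangle(chunks, ref_name).splitlines()
            out.extend(indent + code_line for code_line in expanded)
        else:
            out.append(line)
    return "\n".join(out)


# Extract the program from the prose and run it.
program = tangle(parse_chunks(LITERATE_SOURCE), "main")
namespace = {}
exec(program, namespace)
```

The prose can introduce chunks in whatever order best serves the explanation, while the tangled program assembles them in the order the computer needs. RMarkdown and Jupyter follow the same spirit but execute code chunks in document order.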

How can I include citations in my reproducible manuscript?

RMarkdown:

Jupyter notebooks:

Resources

RMarkdown:

Jupyter:

Other tools:

  • Stenci.la - an emerging platform for writing reproducible manuscripts, but not yet publicly available

  • Noweb - a language-independent tool for literate programming

  • papaja - an R package for producing manuscripts in APA format