Open & Reproducible

Peter Alping

March 23, 2017

Choosing the right tools

Scientific software tools should

  1. Be freely available to anyone
  2. Contain no artificial limitations
  3. Store data in open and future-proof formats
  4. Allow for easy and productive collaboration
  5. Allow for version control
  6. Allow for scrutiny and peer-review

Open vs. Proprietary

Open Software                           Proprietary Software
------------------------------------    ------------------------------------
Free of charge                          Must pay for licence
Source code available                   Source code not available
Able to modify the software             Unable to modify the software
Always available in its current form    Might become unavailable
No artificial limitations               Might contain artificial limitations
No clear responsible party              Support from vendor

You want to “own” your tools!

Version control

Re:Re:Re:Fw:Re: very_important_report_v39.docx

  • Dropbox[1]
  • Google Docs[2]


Use dedicated version-control software

Git[3], Subversion[4], Mercurial[5]

  • Keep track of every version of every file, without manual renaming
  • Facilitate cooperation between colleagues
  • Support all file formats
  • Offer cloud storage (GitHub[6], Bitbucket[7])
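
To see why dedicated version-control software needs no manual renaming, it may help to look at how Git labels file contents. The sketch below is an illustration in Python, not Git itself: it reproduces the ID that Git assigns to a file's contents under its default SHA-1 object format, so two versions of a report get different IDs even when the filename stays the same. The function name `git_blob_id` and the sample texts are made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git (SHA-1 object format) gives to a file's contents."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical versions of the same report, saved under the same filename.
version_1 = b"Results: the effect was small.\n"
version_2 = b"Results: the effect was small but significant.\n"

print(git_blob_id(version_1))
print(git_blob_id(version_2))
```

Because the ID depends only on the contents, the version history lives in the repository rather than in names like `very_important_report_v39.docx`.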


The writing tool should be able to

  1. Handle tables, figures, cross references, citations
  2. Easily make sweeping changes to the formatting
  3. Let the author focus on the content

The word processor (e.g. MS Word[8])


Use a markup language instead of a word processor

Markdown[9,10], LaTeX[11]

  • Store as plain text (a future-proof and open format)
  • Separation of content from formatting
  • Easy to change the look of the text
  • Different types of output from a single source file


# Title
A very interesting paragraph.

## Subtitle

List of items:
- Item 1 is regular
- *Item 2 is italic*
- **Item 3 is bold**

Citations are easy as well: [@Krewinkel2016]
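
One way to get different types of output from a single source file is pandoc, the converter described in reference [10]. The sketch below only builds and prints the commands; it does not run them. The file names `paper.md` and `references.bib` and the helper `pandoc_command` are assumptions made for this example, and the `--citeproc` flag assumes pandoc 2.11 or later.

```python
def pandoc_command(source, target, bibliography=None):
    """Build the argument list for converting `source` into `target`."""
    # pandoc infers the output format from the target's file extension.
    command = ["pandoc", source, "-o", target]
    if bibliography is not None:
        # --citeproc resolves [@key] citations against the bibliography file.
        command += ["--citeproc", "--bibliography=" + bibliography]
    return command


# Hypothetical file names: one Markdown source, three output formats.
# (PDF output additionally needs a PDF engine such as LaTeX to be installed.)
commands = [
    pandoc_command("paper.md", "paper." + extension, "references.bib")
    for extension in ("html", "docx", "pdf")
]

for command in commands:
    print(" ".join(command))
```

Each printed line can be pasted into a terminal, or passed to `subprocess.run` once pandoc and the input files are in place.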


The PowerPoint presentation


Use an open web-based format for presentations

reveal.js[12], shower[13], impress[14], deck.js[15]

…with GIFs! [16]


Proprietary software might

  • Not allow for reproducibility
  • Have artificial limitations
  • Not allow for review of source code
  • SAS[17], STATA[18], SPSS[19], MS Excel[20]


Use open statistical software

R[21], Python[22]

  • Write code (don’t click buttons)
  • Store data in open formats (CSV[23])
  • Encrypt all sensitive data (VeraCrypt[24])
  • Edit data non-destructively
  • Publish the code and (raw) data
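
As a minimal sketch of "write code" and "edit data non-destructively", the Python example below reads a raw CSV file and writes the cleaned rows to a separate file, leaving the raw file untouched. The column names, the sample values, and the function `clean_measurements` are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def clean_measurements(raw_path: Path, clean_path: Path) -> int:
    """Write a cleaned copy of raw_path to clean_path; return the rows kept."""
    kept = 0
    with raw_path.open(newline="") as src, clean_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Skip rows with a missing value; the raw file is only ever read.
            if row["value"].strip() == "":
                continue
            writer.writerow(row)
            kept += 1
    return kept


# Made-up data so the example runs on its own.
workdir = Path(tempfile.mkdtemp())
raw_file = workdir / "raw.csv"
clean_file = workdir / "clean.csv"
raw_file.write_text("id,value\n1,4.2\n2,\n3,5.1\n")
raw_before = raw_file.read_text()

rows_kept = clean_measurements(raw_file, clean_file)
print(rows_kept, "rows kept")
```

Since every cleaning step is code, anyone with the raw data can rerun the script and arrive at the same cleaned file.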




Use open bibliography software

Zotero[26]

  • It’s just better
  • …and it’s free!
  • If you use a markup language, look into BibTeX[27]
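
A BibTeX bibliography is itself plain text. As a rough sketch, the Python snippet below formats reference [10] as a BibTeX entry under the citation key `Krewinkel2016`. The helper `bibtex_entry` is hypothetical, and the fields are limited to what the reference list states.

```python
def bibtex_entry(key: str, entry_type: str, fields: dict) -> str:
    """Format one BibTeX entry as plain text."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Braces around the value keep its capitalisation and punctuation intact.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = bibtex_entry(
    "Krewinkel2016",
    "article",
    {
        "author": "Krewinkel, A. and Winkler, R.",
        "title": "Formatting open science: Agile creation of multiple "
                 "document types by writing academic manuscripts in pandoc markdown",
        "journal": "PeerJ Preprints",
        "year": "2016",
    },
)
print(entry)
```

In practice a reference manager can export such entries, so they rarely need to be typed by hand.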


It is common to make graphics in MS Word[8] or MS PowerPoint[28]

…this is not ideal…


Use open software designed for graphics

  • Save graphics in an open format
  • Prefer vector graphics over bitmap graphics (in most instances)
  • Vector graphics: Inkscape[29]
  • Bitmap graphics: GIMP[30]
  • 3D graphics: Blender[31]
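
To illustrate what an open vector format looks like, the sketch below writes a small bar chart as SVG, the open W3C format that Inkscape uses natively. This is only an assumed example: the function `bar_chart_svg`, the sizes, and the values are invented here.

```python
import xml.etree.ElementTree as ET


def bar_chart_svg(values, bar_width=30, gap=10, height=100):
    """Return a minimal SVG bar chart as a string."""
    tallest = max(values)
    width = len(values) * (bar_width + gap) + gap
    svg = ET.Element(
        "svg",
        xmlns="http://www.w3.org/2000/svg",
        width=str(width),
        height=str(height),
        viewBox=f"0 0 {width} {height}",
    )
    for index, value in enumerate(values):
        # Each bar is a shape described by numbers, not a grid of pixels,
        # so the chart can be scaled to any size without losing sharpness.
        bar_height = height * value / tallest
        ET.SubElement(
            svg,
            "rect",
            x=str(gap + index * (bar_width + gap)),
            y=str(height - bar_height),
            width=str(bar_width),
            height=str(bar_height),
            fill="steelblue",
        )
    return ET.tostring(svg, encoding="unicode")


chart = bar_chart_svg([3, 7, 5])
print(chart)
```

The result is plain text, so it can be opened in Inkscape, viewed in a web browser, and tracked with version control like any other source file.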

Final notes


[1] Dropbox. Dropbox 2017.

[2] Google. Google Docs 2017.

[3] Git. Git 2017.

[4] Apache. Apache Subversion 2017.

[5] Mercurial. Mercurial 2017.

[6] GitHub. GitHub 2017.

[7] Atlassian. Bitbucket 2017.

[8] Microsoft. Microsoft Word 2017.

[9] Wikipedia. Markdown 2017.

[10] Krewinkel A, Winkler R. Formatting open science: Agile creation of multiple document types by writing academic manuscripts in pandoc markdown. PeerJ Preprints 2016.

[11] LaTeX. LaTeX 2017.

[12] hakimel. Reveal.js 2017.

[13] shower. Shower 2017.

[14] impress. Impress.js 2017.

[15] imakewebthings. Deck.js 2017.

[16] Giphy. Chris Pratt - Surprised GIF 2017.

[17] SAS Institute. SAS 2017.

[18] StataCorp. STATA 2017.

[20] Microsoft. Excel 2017.

[21] R. R: The R Project for Statistical Computing 2017.

[22] Python. Python 2017.

[23] Wikipedia. Comma-separated values 2017.

[24] VeraCrypt. VeraCrypt 2017.

[25] Clarivate Analytics. EndNote 2017.

[26] Zotero. Zotero 2017.

[27] Wikipedia. BibTeX 2017.

[28] Microsoft. Microsoft PowerPoint 2017.

[29] Inkscape. Inkscape 2017.

[30] GIMP. GNU Image Manipulation Program 2017.

[31] Blender. Blender 2017.