Open & Reproducible Workflows

Peter Alping

March 23, 2017

Choosing the right tools

Scientific software tools should

  1. Be freely available to anyone
  2. Contain no artificial limitations
  3. Store data in open and future-proof formats
  4. Allow for easy and productive collaboration
  5. Allow for version control
  6. Allow for scrutiny and peer-review

Open vs. Proprietary

Open Software                         Proprietary Software
Free of charge                        Must pay for licence
Source code available                 Source code not available
Able to modify the software           Unable to modify the software
Always available in its current form  Might become unavailable
No artificial limitations             Might contain artificial limitations
No clear responsible party            Support from vendor

You want to “own” your tools!

Version control

Re:Re:Re:Fw:Re: very_important_report_v39.docx

  • Dropbox[1]
  • Google Docs[2]

Recommendation

Use dedicated version-control software

Git[3], Subversion[4], Mercurial[5]

  • Keep track of every version of every file, without manual renaming
  • Facilitate cooperation between colleagues
  • Support all file formats
  • Offer cloud hosting (GitHub[6], Bitbucket[7])
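
One way to picture "every version of every file, without manual renaming" is a store that identifies each version by a hash of its content. The sketch below is a toy illustration only: the class name ToyHistory and its methods are hypothetical, everything lives in memory, and it is not how Git, Subversion or Mercurial are actually implemented.

```python
import hashlib


class ToyHistory:
    """Toy, in-memory history of one document (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}  # content hash -> content
        self._log = []      # (content hash, message), oldest first

    def commit(self, content: str, message: str) -> str:
        """Store a version and return the hash that identifies it."""
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        self._objects[digest] = content
        self._log.append((digest, message))
        return digest

    def checkout(self, digest: str) -> str:
        """Return the exact content of an earlier version."""
        return self._objects[digest]

    def log(self):
        """Return a copy of the list of (hash, message) pairs, oldest first."""
        return list(self._log)


history = ToyHistory()
first = history.commit("Draft one", "first draft")
second = history.commit("Draft two", "revised after comments")
restored = history.checkout(first)  # the first draft, unchanged
```

The file keeps one name throughout; the versions are told apart by their hashes and messages rather than by suffixes such as _v39.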

Writing

The writing tool should be able to

  1. Handle tables, figures, cross references, citations
  2. Easily make sweeping changes to the formatting
  3. Let the author focus on the content

The word processor (e.g. MS Word[8])

Recommendation

Use a markup language instead of a word processor

Markdown[9,10], LaTeX[11]

  • Store as plain text (a future-proof and open format)
  • Separation of content from formatting
  • Easy to change the look of the text
  • Different types of output from a single source file

Markdown[9,10]

# Title
A very interesting paragraph.

## Subtitle

List of items:
- Item 1 is regular
- *Item 2 is italic*
- **Item 3 is bold**

Citations are easy as well: [@Krewinkel2016]
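
The point about several outputs from a single source file can be sketched in a few lines. This is a toy converter for a tiny Markdown subset (titles, subtitles, paragraphs), with hypothetical function names; a real converter such as pandoc, discussed in [10], handles far more, including lists, emphasis and citations.

```python
import html


def parse(source: str) -> list[tuple[str, str]]:
    """Split a tiny Markdown subset into (kind, text) pairs.

    Only '# ' titles, '## ' subtitles and one-line paragraphs are recognised.
    """
    blocks = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("## "):
            blocks.append(("subtitle", line[3:]))
        elif line.startswith("# "):
            blocks.append(("title", line[2:]))
        else:
            blocks.append(("paragraph", line))
    return blocks


def to_html(blocks) -> str:
    """Render the parsed content as HTML."""
    tags = {"title": "h1", "subtitle": "h2", "paragraph": "p"}
    return "\n".join(
        f"<{tags[kind]}>{html.escape(text)}</{tags[kind]}>" for kind, text in blocks
    )


def to_latex(blocks) -> str:
    """Render the same content as LaTeX (special characters are not escaped here)."""
    templates = {
        "title": "\\section{%s}",
        "subtitle": "\\subsection{%s}",
        "paragraph": "%s",
    }
    return "\n".join(templates[kind] % text for kind, text in blocks)


source = "# Title\nA very interesting paragraph.\n\n## Subtitle\n"
blocks = parse(source)
html_output = to_html(blocks)
latex_output = to_latex(blocks)
```

The content is written once; only the rendering function changes with the output format.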

Presentation

The PowerPoint presentation

Recommendation

Use an open web-based format for presentations

reveal.js[12], shower[13], impress[14], deck.js[15]

…with GIFs! [16]

Statistics

Proprietary software might

  • Not allow for reproducibility
  • Have artificial limitations
  • Not allow for review of source code
  • Examples: SAS[17], STATA[18], SPSS[19], MS Excel[20]

Recommendation

Use open statistical software

R[21], Python[22]

  • Write code (don’t click buttons)
  • Store data in open formats (CSV[23])
  • Encrypt all sensitive data (VeraCrypt[24])
  • Edit data non-destructively
  • Publish the code and (raw) data
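
A minimal sketch of "write code", "store data in open formats" and "edit data non-destructively", using Python's standard csv module. The file names, the column names (id, value) and the cleaning rule (drop rows with a missing value) are assumptions made up for the example.

```python
import csv
import tempfile
from pathlib import Path


def clean_measurements(raw_path: Path, out_path: Path) -> int:
    """Copy rows with a non-empty 'value' from raw_path to out_path.

    The raw file is only opened for reading, so it is never altered.
    Returns the number of rows written.
    """
    with raw_path.open(newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = list(reader)

    kept = [row for row in rows if row["value"].strip() != ""]

    with out_path.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


# Hypothetical raw data: the second row has a missing value.
raw_text = "id,value\n1,4.2\n2,\n3,5.1\n"
workdir = Path(tempfile.mkdtemp())
raw_file = workdir / "raw.csv"
raw_file.write_text(raw_text)

clean_file = workdir / "clean.csv"
n_kept = clean_measurements(raw_file, clean_file)
```

Because the cleaning step is a script and the raw file is left untouched, anyone with the raw data can rerun the script and obtain the same cleaned file.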

Bibliography

EndNote[25]

Recommendation

Use open bibliography software

Zotero[26]

  • It’s just better
  • …and it’s free!
  • If you write in a markup language, look into BibTeX[27]
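
For readers who have not seen BibTeX, the sketch below builds one entry as plain text. The helper bibtex_entry is hypothetical, and the choice of entry type and fields is an assumption; the bibliographic details are taken from reference [10], and the key matches the citation [@Krewinkel2016] in the Markdown example.

```python
def bibtex_entry(key: str, entry_type: str, fields: dict[str, str]) -> str:
    """Format one BibTeX entry as text (hypothetical helper, no escaping)."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = bibtex_entry(
    "Krewinkel2016",
    "article",
    {
        "author": "Krewinkel, A. and Winkler, R.",
        "title": "Formatting open science: Agile creation of multiple "
                 "document types by writing academic manuscripts in pandoc markdown",
        "journal": "PeerJ Preprints",
        "year": "2016",
        "doi": "10.7287/peerj.preprints.2648v2",
    },
)
```

In practice a reference manager can export such entries, so they rarely need to be written by hand.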

Graphics

Common to use MS Word[8] or MS PowerPoint[28]

…this is not ideal…

Recommendation

Use open software designed for graphics

  • Save graphics in an open format
  • Prefer vector graphics over bitmap graphics (in most instances)
  • Vector graphics: Inkscape[29]
  • Bitmap graphics: GIMP[30]
  • 3D graphics: Blender[31]

Final notes

References

[1] Dropbox. Dropbox 2017. https://www.dropbox.com/.

[2] Google. Google Docs 2017. https://www.google.com/docs/about/.

[3] Git. Git 2017. https://git-scm.com/.

[4] Apache. Apache Subversion 2017. https://subversion.apache.org/.

[5] Mercurial. Mercurial 2017. https://www.mercurial-scm.org/.

[6] GitHub. GitHub 2017. https://github.com/.

[7] Atlassian. Bitbucket 2017. https://bitbucket.org/.

[8] Microsoft. Microsoft Word 2017. https://products.office.com/en/word.

[9] Wikipedia. Markdown 2017. https://en.wikipedia.org/wiki/Markdown.

[10] Krewinkel A, Winkler R. Formatting open science: Agile creation of multiple document types by writing academic manuscripts in pandoc markdown. PeerJ Preprints 2016. https://doi.org/10.7287/peerj.preprints.2648v2.

[11] LaTeX. LaTeX 2017. https://www.latex-project.org/.

[12] hakimel. Reveal.js 2017. https://github.com/hakimel/reveal.js/.

[13] shower. Shower 2017. https://github.com/shower/shower.

[14] impress. Impress.js 2017. https://github.com/impress/impress.js.

[15] imakewebthings. Deck.js 2017. https://github.com/imakewebthings/deck.js.

[16] Giphy. Chris Pratt - Surprised GIF 2017.

[17] SASInstitute. SAS 2017. https://www.sas.com/.

[18] StataCorp. STATA 2017. http://www.stata.com/.

[20] Microsoft. Excel 2017. https://products.office.com/en/excel.

[21] R. R: The R Project for Statistical Computing 2017. https://www.r-project.org/.

[22] Python. Python 2017. https://www.python.org/.

[23] Wikipedia. Comma-separated values 2017. https://en.wikipedia.org/wiki/Comma-separated_values.

[24] VeraCrypt. VeraCrypt 2017. https://veracrypt.codeplex.com/.

[25] ClarivateAnalytics. EndNote 2017. http://endnote.com/.

[26] Zotero. Zotero 2017. https://www.zotero.org/.

[27] Wikipedia. BibTeX 2017. https://en.wikipedia.org/wiki/BibTeX.

[28] Microsoft. Microsoft PowerPoint 2017. https://products.office.com/en/powerpoint.

[29] Inkscape. Inkscape 2017. https://inkscape.org/.

[30] GIMP. GNU Image Manipulation Program 2017. https://www.gimp.org/.

[31] Blender. Blender 2017. https://www.blender.org/.