class: title-slide

# Rmarkdown
## Towards reproducibility

.center[<img src="img/00/logo_rmarkdown.png" width="100px"/>]

### Roland Krause | rworkshop | 2021-09-10
---

# Learning objectives

.w-60.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt4.ml6[
.large[.gbox[You will learn to:]
- Use the .bold[markdown] syntax
- Create .bold[Rmarkdown] documents
- Define the output format you expect to render
- Use the interactive RStudio interface to
    + Create your documents
    + Insert R code
    + Insert a bibliography
    + Build your final document
]
]

---

# Typical flow of data

.flex[
.w-25.bg-washed-maroon.b--red.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.gbox[Source data ➡️ ]]

* Experimental data
* External data sets
* Manually collected data and metadata
]
.w-25.bg-washed-green.b--lawngreen.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.gbox[Intermediate ➡️ ]]

* Derived data
* Computation
* Manual curation
* Tidy data

.float-img.center[
]
]
.w-25.bg-aqua.b--blue.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[Analysis ➡️ ]]

* Exploratory analysis
* Statistical models
* Hypothesis testing

.float-img.center[
]
]
.w-25.bg-washed-green.b--blue.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[Manuscript ➡️ ]]

* Can you reproduce your work?
    + All numbers
    + Summaries
    + Images

.float-img.center[
]
]]

--

.bg-washed-green.b--blue.ba.bw2.br3.shadow-5.ph3.mt3[
.large[.bbox[One workflow ]]

* No editing of data at any step
* All code needed to reproduce the work, from data ingestion to manuscript, is scripted and repeatable ☀
]

---

# Rmarkdown

.center[]

.footnote[Credit: Artwork by [Allison Horst](https://github.com/allisonhorst)]

---

# Rmarkdown

.flex[
.w-80.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.pb4.mt2.mb2.ml5[
.large[.gbox[Why use .bold[Rmarkdown]?]

.float-img[]

- Write detailed reports
- Ensure reproducibility
- Keep track of your analyses
- Comment/describe each step of your analysis
- Export a single (Rmd) document to various formats (PDF, HTML...)
- Text file that can be managed by a version control system (like [git](https://git-scm.com/))
]]
]

.w-60.bg-white.b--green.ba.bw1.br3.shadow-5.ph3.mt3.ml6[
.center[
.huge[Rmarkdown]:  .huge[+]  .huge[+]
]]

---

# Markdown

**Markdown** is used to **format text**.

.flex[
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt1.mr1[
.large[.gbox[Markup language]]

- Such as `XML`, `HTML`
- A coding system used to structure text
- Uses markup tags (_e.g._ `<h1></h1>` in `HTML`)
]
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt1.mr1[
.large[.bbox[HTML]]

```
<!DOCTYPE html>
<html>
<body>
<h1>This is a heading</h1>
<p>This is some text in a paragraph.</p>
</body>
</html>
```
]]

--

.flex[
.w-70.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mr1.mt1[
.large[.gbox[**Lightweight** markup language]]

- Easy to read and write as it uses simple tags (_e.g._ `#`)
]
.w-70.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mr1.mt1[
.large[.bbox[MD example]]

```
# This is a heading
This is some text in a paragraph
```
]]

---
class: hide_logo

## Common text formatting **tags**

.flex[
.w-70.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.gbox[Headers]]

- **6** levels are defined using `#`, `##`, `###` ...
- From .Large[BIG] to .small[small]
]
.w-70.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.gbox[Text style]]

- **bold** (`**`This will be bold`**`)
- *italic* (`*`This will be italic`*`)
]]

--

.flex[
.w-70.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mr1.mt2[
.large[.gbox[Links and images]]

- `http://example.com` is auto-linked
- `[description](http://example.com)`
- ``
- `` for alternative description
]
.w-70.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mr1.mt2[
.large[.gbox[Verbatim code]]

- `code` (`` `coding stuff` ``)
- Triple backticks delimit code blocks:

`````
```
This is *verbatim* code
# Even headers are not interpreted
```
`````

rendered as:

```
This is *verbatim* code
# Even headers are not interpreted
```
]]

---
class: hide_logo

# Including code for Rmarkdown

.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mr1[
.center[]
]

.flex[
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt1.mr1[
.bbox[Rmarkdown]

- Extends markdown
- Place **R code** in **chunks**
- **Chunks** will be **evaluated**
- Can also handle bash, python, css, ...

.right[
]]
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt1.mr1[
.bbox[Knitr]

- Extracts R chunks
- Interprets them
- Formats results as markdown
- Reintegrates them into the main document (md)

.right[]
]
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt1.mr1[
.bbox[Pandoc]

- [pandoc](http://pandoc.org/) converts markdown to the desired document (PDF, HTML, ...)
]]

---

# Rmd creation: step 1

.center[  ]

---

# Rmd creation: step 2

.center[  ]

---

# Rmd creation: step 3

### Generate your first HTML file

Use the **knit button** in RStudio

---
class: hide_logo

# Rmarkdown document: Structure

<style type="text/css">
.span-box {
  border-radius: 5px;
  background: rgba(255, 255, 255, 0.7);
  padding-left: 10px;
  padding-right: 10px;
}
</style>

.pull-left[  ]

.pull-right[
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.rbox[YAML header]

- To define document-wide options
- _title_, _name_, ...
]
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.bbox[Markdown]

- Markdown syntax to write your descriptions, remarks
- Literate programming
]
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.gbox[Chunks]

- Code to be interpreted by _R_
]]

---

# R code chunks

.pull-left[
### Insert a chunk

shortcut: <kbd>CTRL</kbd> + <kbd>Alt</kbd> + <kbd>I</kbd>:
]

--

.pull-right[
### Basic Markup

- Delimited by **triple-backtick tags** (` ``` `)
- Options in **curly brackets**
    + engine evaluating the code (`r`, but also python, bash, ...)
    + ` ```{r} ` is the minimum to define a starting chunk
    + name of chunk (optional)
    + show or hide the source code (`echo = TRUE`)
    + evaluate it or not (`eval = FALSE`)
    + figure size in inches (`fig.width`, `fig.height`), ...
]

---

# Navigation through chunk names

#### Chunk names allow you to quickly navigate code, automatically name figures, and troubleshoot errors.

- Chunk names must be **unique**. By default, unnamed chunks are assigned a numbered name.

.center[  ]

---

## Inline R code

.flex[
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[Integrate small pieces of _R_ code]]

Use backticks (`` ` ``) followed by the keyword `r`: `` `r <your R code>` ``
]
.w-70.bg-washed-blue.b--blue.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[Example]]

Type in ``1 + 1 = `r 1 + 1` `` to render .bold.blue[1 + 1 = 2].
]]

---

## Popular output formats

.flex[
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.gbox[HTML]]

- Fast rendering
- No extra installation needed
- By default embeds binaries (pictures, libraries etc.)
]
.w-70.bg-washed-yellow.b--red.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.rbox[PDF]]

- Single file
- Requires $\LaTeX$; the **[TinyTeX](https://yihui.name/tinytex/)** package provides a minimal installation
]
.w-70.bg-washed-blue.b--purple.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[Word]]

- Widely used
- Easily editable
- **Collaborate** with people not using Rmarkdown
- Prepare **scientific manuscripts** suitable for submission
]]

---

# Styling your tables

.pull-left[
To display data with additional formatting, use the `knitr::kable` function.
#### Add a table caption

```r
library(dplyr) # provides as_tibble(), %>%, select() and slice_head()

# First five provinces of the built-in swiss data set, selected columns only
sw_db <- as_tibble(swiss, rownames = "Province") %>%
  select(Province:Catholic, -starts_with("E")) %>%
  slice_head(n = 5)
knitr::kable(sw_db, caption = "Swiss Fertility and Socioeconomic Indicators (1888)")
```

Table: Swiss Fertility and Socioeconomic Indicators (1888)

|Province     | Fertility| Agriculture| Catholic|
|:------------|---------:|-----------:|--------:|
|Courtelary   |      80.2|        17.0|     9.96|
|Delemont     |      83.1|        45.1|    84.84|
|Franches-Mnt |      92.5|        39.7|    93.40|
|Moutier      |      85.8|        36.5|    33.77|
|Neuveville   |      76.9|        43.5|     5.16|
]

.pull-right[
#### Specify column alignment

Set the column alignment with the `align` argument: either a character vector of `l` (left), `c` (center), and `r` (right), or a single multi-character string such as `"lccc"`.

```r
knitr::kable(sw_db, align = "lccc", caption = "Swiss Fertility and Socioeconomic Indicators (1888)")
```

Table: Swiss Fertility and Socioeconomic Indicators (1888)

|Province     | Fertility | Agriculture | Catholic |
|:------------|:---------:|:-----------:|:--------:|
|Courtelary   |   80.2    |    17.0     |   9.96   |
|Delemont     |   83.1    |    45.1     |  84.84   |
|Franches-Mnt |   92.5    |    39.7     |  93.40   |
|Moutier      |   85.8    |    36.5     |  33.77   |
|Neuveville   |   76.9    |    43.5     |   5.16   |
]

---

# Tables

.pull-left[
### Format numeric columns

Set the maximum number of decimal places via the `digits` argument, which is passed to the `round()` function.

```r
knitr::kable(sw_db, digits = 1)
```

|Province     | Fertility| Agriculture| Catholic|
|:------------|---------:|-----------:|--------:|
|Courtelary   |      80.2|        17.0|     10.0|
|Delemont     |      83.1|        45.1|     84.8|
|Franches-Mnt |      92.5|        39.7|     93.4|
|Moutier      |      85.8|        36.5|     33.8|
|Neuveville   |      76.9|        43.5|      5.2|
]

.pull-right[
### Beautiful tables with R

Several packages are under active development; this is a rapidly moving field.
* [huxtable](https://hughjonesd.github.io/huxtable/huxtable.html)
* [flextable](https://davidgohel.github.io/flextable/articles/overview.html)
* [xtable](https://cran.r-project.org/web/packages/xtable)
* [stargazer](https://cran.r-project.org/web/packages/stargazer/vignettes)
* [pander](https://www.r-project.org/nosvn/pandoc/pander.html)
* [tables](https://cran.r-project.org/web/packages/tables/index.html)
* [gt](https://gt.rstudio.com/)
* etc.
]

---

# Scientific publishing with Markdown

.pull-left[
### Equations with MathJax

* Enclose in `$` for inline equations, e.g. `$a^2+b^2=c^2$` renders as $a^2+b^2=c^2$.
* Double (`$$`) for separate equations.

```
$$G_{\mu \nu}=8 \pi G (T_{\mu \nu} + \rho _\Lambda \ g_{\mu \nu}) $$
```

yields

$$G_{\mu \nu}=8 \pi G (T_{\mu \nu} + \rho _\Lambda \ g_{\mu \nu}) $$
]

.pull-right[
### How does it work

* No need to interact directly with $\LaTeX$; `pandoc` takes care of it.
* Numbering requires bookdown and PDF output.
* Complex $\LaTeX$ arrangements can be used as alternatives to built-in tables.

```
$$\begin{array}{ccc}
x_{11} & x_{12} & x_{13}\\
x_{21} & x_{22} & x_{23}
\end{array}$$
```

$$\begin{array}{ccc}
x_{11} & x_{12} & x_{13}\\
x_{21} & x_{22} & x_{23}
\end{array}$$
]

---

# Lessons learned

.flex[
.w-70.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.gbox[Experience]]

Two manuscripts published, computed and rendered using Rmarkdown

- Initially, we kept text and analysis code together
- Hard to organize
- Abstract already contains conclusions
- Eventually, all code was one big chunk in the Rmarkdown doc
]
.w-70.bg-washed-yellow.b--red.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.rbox[Our `standard` setup]]

Code follows the data life cycle, e.g. using scripts to

1. `Import`
2. `Transform`
3. `Model`

Controlled by another script (e.g.
using Make or a runner script)
]
.w-70.bg-washed-blue.b--purple.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[Proposed organization]]

* Explore your dataset in the context of an Rmd document
* Move production code to a script, e.g. in a directory called `R`

Better: use the `targets` package, which extends the concept of reproducible workflows.
]]

---

# Bibliography

.flex[
.w-50.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[Supported formats]]

- Use it with your EndNote or Zotero database:
- **BibLaTeX**, **BibTeX**, **EndNote**, **EndNote XML**, MEDLINE, ISI, MODS, RIS, Copac, JSON citeproc
]
.w-50.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[Styles]]

- uses Citation Style Language (`csl`) files
- have a look at:
    + <https://www.zotero.org/styles>
    + <https://github.com/citation-style-language/styles>
]]

---

# Bibliography

.flex[
.w-50.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[How to ]]

- set up in the YAML header
- insert citations using the [pandoc](https://pandoc.org/MANUAL.html#citations) syntax:\
`[@citation-key]`

```
---
title: "Sample Document"
output: html_document
bibliography: bibliography.bib
csl: nature.csl
---

Insert your reference [@my-reference] like I did.
```
]
.w-50.bg-washed-yellow.b--gold.ba.bw2.br3.shadow-5.ph3.mt3.mr2[
.large[.bbox[Tips for Zotero 💡]]

- install the [Better Bib(La)TeX](https://github.com/retorquere/zotero-better-bibtex) plugin
- **adjust the preferences** (for better integration)
- export your database as **BibTeX**
- **drag and drop** pandoc keys to your Rmarkdown document
]]

---

# Before we stop

.flex[
.w-50.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt2.ml2[
.large[.gbox[You learned:]

- What `Rmarkdown` (`Rmd`) is
- The basic syntax of `Markdown`
- How to `knit` your `Rmd` to different output formats
- Table styling
- Bibliography integration
]]
.w-50.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt2.ml2[
.large[.bbox[Acknowledgments 🙏 👏]

* Eric Koncina (initial content)
* Veronica Codoni (major overhaul)
* Yihui Xie
* Alison Hill
* Hadley Wickham
* Artwork by [Allison Horst](https://twitter.com/allison_horst)
* Jenny Bryan]
]]

.flex[
.w-50.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt1.ml1[
.large[.ybox[Further reading 📚]

- [Rmarkdown, the definitive book](https://bookdown.org/yihui/rmarkdown/)
- [Rmarkdown website](https://rmarkdown.rstudio.com/)]
]
.w-50.pv2.ph3.mt1.ml1[
.huge[.bbox[Thank you for your attention!]]
]
]
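---

# Appendix: knitting from the R console

The knit button runs knitr on the chunks and then hands the resulting markdown to pandoc. The sketch below is one possible way to run the same pipeline from the console, not the workshop's own setup. It assumes the `rmarkdown` and `knitr` packages are installed and that pandoc is available (RStudio bundles it). The file name `minimal.Rmd` and the chunk name `first-rows` are made up for illustration.

```r
# Hypothetical file name; the document is written to a temporary directory.
rmd_file <- file.path(tempdir(), "minimal.Rmd")

# The three building blocks: YAML header, markdown text, and one R chunk.
writeLines(c(
  "---",
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  "# A heading",
  "",
  "Some **markdown** text describing the table below.",
  "",
  "```{r first-rows, echo = FALSE}",
  "knitr::kable(head(swiss[, 1:3], 5), digits = 1)",
  "```"
), rmd_file)

# knitr evaluates the chunk, pandoc converts the markdown to HTML.
# render() returns the path of the output file.
html_file <- rmarkdown::render(
  rmd_file,
  output_format = "html_document",
  quiet = TRUE
)

file.exists(html_file)
```

Changing `output_format` to `"word_document"` or `"pdf_document"` should target the other formats discussed earlier; PDF output additionally needs a LaTeX installation.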