class: title-slide
# Reproducible pipelines

## using `targets` and `renv`

.center[<img src="https://docs.ropensci.org/targets/reference/figures/logo.png" width="80px"/> <img src="https://raw.githubusercontent.com/rstudio/renv/master/man/figures/logo.svg" width="100px"/>]

### A. Ginolhac | rworkshop | 2021-09-10

---
class: middle, center, inverse

# targets

---

# targets and companion package tarchetypes
.flex.items-center[
.w-10[
]
.w-70.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt3.mr1[
.large[.bbox[A workflow manager for R]]

.large[
- Saving you time and stress
- Understand how it is implemented in `targets`
    + Define your `targets`
    + Connect `targets` to create the **dependencies**
    + Check **dependencies** with `tar_visnetwork()`
    + Embrace **dynamic** branching
    + Run **only** what needs to be executed
    + Bundle **dependencies** in an Rmarkdown document with [`tar_render()`](https://wlandau.github.io/tarchetypes/reference/tar_render.html)
    + Increase reproducibility with the package manager [`renv`](https://rstudio.github.io/renv/articles/renv.html)
- Example with RNA-seq data from .bold[Wendkouni Nadège MINOUNGOU]
]]]

---

# Folder structure
.left-column[
```r
├── .git/
├── run.R
*├── _targets.R
├── _targets/
├── Repro.Rproj
├── R
*│   ├── functions.R
*│   └── utils.R
├── renv/
├── renv.lock
└── report.Rmd
```
]

.right-column[
.large[
- With [`renv`](https://rstudio.github.io/renv/), snapshot your package environment (and restore! 😌)
- `_targets.R` is the only mandatory file
- Use an `R` sub-folder for functions, which gets you closer to a package
- An `Rmarkdown` file gathers results in a report
- In an RStudio project
- Version tracked with `git`
]
]

---

# Example with RNA-seq data across 3 cell lines

### PCA shows that differences between cells .bold.red[`>>`] biological effect (Roman numerals)

--

### Solution: Split counts and metadata for each cell

.Large[Do we copy code 3 times?]

---

# Define targets = explicit dependencies
.pull-left[
### `_targets.R`, define 4 targets

The last target depends on the **3** upstream targets

```r
library(targets)
library(tarchetypes) # provides tar_qs() and tar_fst_tbl()
source("R/functions.R")
source("R/plotting.R")
list(
*  tar_target(cells, c("HepG2", "HuH7", "Hep3B")),
*  tar_qs(dds, read_rds(here::here("data", "all.rds")),
          packages = "DESeq2"),
*  tar_fst_tbl(annotation, gtf_to_tbl(here::here("data", "gencode.v36.annotation.gtf")),
               packages = c("tibble", "rtracklayer")),
*  tar_qs(sub_dds, subset_dds(dds, filter(annotation, type == "gene"), .cell = cells),
          pattern = map(cells), # dynamic branching
          packages = c("DESeq2", "tidyverse"))
  [...]
)
```
]

--

.pull-right[
.bold[Dynamic branching] makes dependencies easier to read.

.large[
> .orange.bold[Of course, someone has to write for loops, it doesn't have to be you]

.tr[
— _.bold[Jenny Bryan]_]]

]

.footnote[Figure from `tar_visnetwork()`]

---

# Running targets
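The log on this slide is what `tar_make()` prints when the pipeline is run. As a minimal sketch, assuming the `targets` package is installed, the toy pipeline below reproduces the branching over `cells` in a throwaway directory. The `cell_label` target and its `toupper()` command are hypothetical stand-ins for the real `sub_dds` step.

```r
library(targets)

# Work in a throwaway directory so the real project is left untouched
demo_dir <- file.path(tempdir(), "targets_run_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)

# Write a toy _targets.R (hypothetical targets, not the RNA-seq pipeline)
tar_script({
  list(
    tar_target(cells, c("HepG2", "HuH7", "Hep3B")),
    # one branch per cell line, thanks to pattern = map(cells)
    tar_target(cell_label, toupper(cells), pattern = map(cells))
  )
}, ask = FALSE)

tar_make()                      # builds cells, then one cell_label branch per cell line
labels <- tar_read(cell_label)  # branches are combined back into one vector
print(unname(labels))

setwd(old_wd)
```

Reading a branched target with `tar_read()` returns all branches at once, which is why downstream code rarely needs an explicit loop.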
.pull-left[
```
● run target annotation
● run target cells
● run target dds
● run branch sub_dds_3078b1e0
         condition time_h
HepG2_I1   control      0
HepG2_I2      HIL6      2
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates: 2 workers
mean-dispersion relationship
final dispersion estimates, fitting model and testing: 2 workers
● run branch sub_dds_d05c5da7
        condition time_h
HuH7_I1   control      0
HuH7_I2      HIL6      2
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates: 2 workers
mean-dispersion relationship
final dispersion estimates, fitting model and testing: 2 workers
● run branch sub_dds_c60d7096
         condition time_h
Hep3B_I1   control      0
Hep3B_I2      HIL6      2
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates: 2 workers
mean-dispersion relationship
final dispersion estimates, fitting model and testing: 2 workers
● end pipeline
```
]

--

.pull-right[
Options to display time and object sizes

]

---

# Re-running
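Before re-running, `tar_outdated()` lists the targets that would be rebuilt. The sketch below, assuming the `targets` package is installed, shows that editing only the body of a function invalidates the targets that call it. The `scale_counts()` helper and both targets are hypothetical.

```r
library(targets)

demo_dir <- file.path(tempdir(), "targets_rerun_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)

# Toy pipeline with a helper function (hypothetical names)
tar_script({
  scale_counts <- function(x) x * 2
  list(
    tar_target(counts, c(10, 20, 30)),
    tar_target(scaled, scale_counts(counts))
  )
}, ask = FALSE)

tar_make()                             # first run builds both targets
outdated_after_run <- tar_outdated()   # nothing left to do
print(outdated_after_run)

# Change only the body of the helper function
tar_script({
  scale_counts <- function(x) x * 3
  list(
    tar_target(counts, c(10, 20, 30)),
    tar_target(scaled, scale_counts(counts))
  )
}, ask = FALSE)

outdated_after_edit <- tar_outdated()  # only the target calling scale_counts()
print(outdated_after_edit)

setwd(old_wd)
```

Here `counts` stays up to date because neither its command nor its inputs changed, so a subsequent `tar_make()` would skip it.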
.pull-left[
```
✓ skip target annotation
✓ skip target cells
✓ skip target dds
✓ skip branch sub_dds_3078b1e0
✓ skip branch sub_dds_d05c5da7
✓ skip branch sub_dds_c60d7096
✓ skip pipeline
```

.Large[All good, nothing to be done ✔️.

`targets` actually tracks all objects, functions included.

A more complete dependency graph shows .bold[functions]
]
]

--

.pull-right[

.Large[Let's add the PCA per cell type now]
]

---
class: hide_logo

# PCA, add 4 targets
.pull-left[
### Smaller targets avoid re-running steps unnecessarily

```r
[...]
tar_target(rcounts, vst(sub_dds, blind = TRUE),
           pattern = map(sub_dds),
           packages = c("DESeq2")),
tar_target(pca_df, pca_meta(rcounts),
           pattern = map(rcounts),
           packages = c("DESeq2", "tidyr", "dplyr")),
tar_target(pca_cell, tibble(cell = cells, pca = list(plot_pca_meta(pca_df))),
           pattern = map(cells, pca_df),
           packages = c("ggplot2", "tibble"))
[...]
```

.large[
.bold[Translate into]:

- For each cell line's data, compute regularized counts (`vst`: variance stabilization)
- For each set of regularized counts, compute the PCA (`df`: data.frame, _i.e._ a table)
- For each pair of cell name and PCA table, store the PCA plot in a table for easier labeling
]
]

--

.pull-right[

]

---

# PCA results

.flex[
.w-33.b--green.ba.bw2.br3.shadow-5.ph4.mt2.mr2[
### Running

<video width="310" height="340">
<source src="https://biostat2.uni.lu/practicals/data/targets_pca.mp4" type="video/mp4">
</video>
]
.w-33.b--green.ba.bw2.br3.shadow-5.ph4.mt2.mr2[
### Awesome feature: load results in an Rmarkdown document

.green.bold[Separate] `code` from content

]
.w-33.b--green.ba.bw2.br3.shadow-5.ph4.mt2[
### How to display a plot

]]

---

# The full picture

#### Adding the desired analyses step by step

#### The whole analysis takes 24 minutes and 4.54 seconds

.large[
> .orange.bold[Of course, someone has to<br/>remember the dependencies, <br/>it doesn't have to be you]

.tl[
— _could be William Landau via .bold[Jenny Bryan]_]]
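
A total run time like the one quoted above can be recovered from the metadata that `targets` stores for every target. This is a minimal sketch, assuming the `targets` package is installed. The two targets and the `Sys.sleep()` call are hypothetical and only serve to produce a measurable duration.

```r
library(targets)

demo_dir <- file.path(tempdir(), "targets_timing_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)

# Toy pipeline (hypothetical targets): one slow step, one fast step
tar_script({
  list(
    tar_target(slow_step, {
      Sys.sleep(0.2)
      1
    }),
    tar_target(fast_step, slow_step + 1)
  )
}, ask = FALSE)

tar_make()

# tar_meta() returns one row per target, including its run time in seconds
meta <- tar_meta(targets_only = TRUE)
timings <- meta[, c("name", "seconds")]
total_seconds <- sum(timings$seconds, na.rm = TRUE)
print(timings)
print(total_seconds)

setwd(old_wd)
```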
---

# Is it worth the effort?

--

.left-column[
.huge.bold.green[Yes]

.Large[
#### For you

- independence
- autonomy
- skills
- _free_ time
- confidence over results
- reproducibility
- fun 🥳
]
]

--

.right-column[
### Reproducibility .bold[`targets`] via `git`

```r
> renv::history()
commit    author_date         committer_date      subject
1e8dd2278 2021-02-23 15:29:57 2021-02-23 15:29:57 reformat creating config files
24c1222db 2021-02-15 17:07:01 2021-02-15 17:07:01 highlight gene type in the DEG patterns
326c8a726 2021-02-04 16:16:38 2021-02-04 16:16:38 cluster LRT genes by they dynamic patterns
4c6791796 2021-01-26 13:08:15 2021-01-26 13:08:15 gene types in upset plots for lengths
5865ee70b 2021-01-21 16:36:48 2021-01-21 16:37:08 add upset plots
6c06b496e 2021-01-20 17:03:29 2021-01-20 17:03:29 add upset protoype
[...]
```
]

---

# `renv` features
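The `renv.lock` file shown on this slide is plain JSON, so the pinned versions can be inspected from R. This is a minimal sketch, assuming the `jsonlite` package is installed. The lockfile content is the fragment from the slide, trimmed to two packages and written to a temporary file. In a real project the path would simply be `renv.lock`.

```r
library(jsonlite)

# Fragment of the lockfile from the slide, trimmed to two packages
lock_json <- '{
  "R": {
    "Version": "4.0.3",
    "Repositories": [{"Name": "CRAN", "URL": "https://cloud.r-project.org"}]
  },
  "Bioconductor": {"Version": "3.12"},
  "Packages": {
    "AnnotationDbi": {
      "Package": "AnnotationDbi", "Version": "1.52.0",
      "Source": "Bioconductor", "Hash": "ca5106b296b3aa6af713ce197be547c1"
    },
    "BH": {
      "Package": "BH", "Version": "1.75.0-0",
      "Source": "Repository", "Repository": "CRAN",
      "Hash": "e4c04affc2cac20c8fec18385cd14691"
    }
  }
}'
lock_path <- tempfile(fileext = ".lock")  # stand-in for "renv.lock"
writeLines(lock_json, lock_path)

# Keep the nested structure as lists, then tabulate one row per package
lock <- fromJSON(lock_path, simplifyVector = FALSE)
pinned <- data.frame(
  package = vapply(lock$Packages, function(p) p$Package, character(1), USE.NAMES = FALSE),
  version = vapply(lock$Packages, function(p) p$Version, character(1), USE.NAMES = FALSE),
  source  = vapply(lock$Packages, function(p) p$Source, character(1), USE.NAMES = FALSE)
)
print(pinned)
```

Each entry records a version, an origin and a hash, which is what lets `restore` rebuild the same library later.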
.pull-left[
- `renv` parses your code and finds library calls
- `install` a package including its dependencies
- `snapshot` registers changes, hashes and origin

```r
> renv::snapshot()
The following package(s) will be updated in the lockfile:

# CRAN ===============================
- RcppParallel   [5.0.2 -> 5.0.3]
- cli            [2.3.0 -> 2.3.1]
- pkgload        [1.1.0 -> 1.2.0]
- tint           [0.1.3 -> *]

# GitHub =============================
- targets        [ropensci/targets@main: 598d7a23 -> bdc1b29c]

Do you want to proceed? [y/N]:
```

- `restore` to a certain point in time
]

--

.pull-right[
`renv.lock` file after a `snapshot`

```
"R": {
  "Version": "4.0.3",
  "Repositories": [
    {
      "Name": "CRAN",
      "URL": "https://cloud.r-project.org"
    }
  ]
},
"Bioconductor": {
  "Version": "3.12"
},
"Packages": {
  "AnnotationDbi": {
    "Package": "AnnotationDbi",
    "Version": "1.52.0",
    "Source": "Bioconductor",
    "Hash": "ca5106b296b3aa6af713ce197be547c1"
  },
  "BH": {
    "Package": "BH",
    "Version": "1.75.0-0",
    "Source": "Repository",
    "Repository": "CRAN",
    "Hash": "e4c04affc2cac20c8fec18385cd14691"
  },
  "targets": {
    "Package": "targets",
    "Version": "0.1.0.9000",
    "Source": "GitHub",
    "RemoteType": "github",
    "RemoteUsername": "ropensci",
    "RemoteRepo": "targets",
    "RemoteRef": "main",
    "RemoteSha": "598d7a23661d4c760209c7991bf10584eadcf7c8",
    "RemoteHost": "api.github.com",
    "Hash": "ee66061fd5c757ec600071965d457818"
  },
[...]
```
]

---

# Reports as Rmarkdown documents
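A report becomes part of the pipeline through `tar_render()` from `tarchetypes`, which scans the Rmd file for `tar_read()` and `tar_load()` calls and registers those targets as upstream dependencies. The sketch below assumes `targets`, `tarchetypes` and `rmarkdown` are installed. The report content and the `pca_summary` target are hypothetical, and nothing is rendered: only the dependency graph is inspected.

```r
library(targets)

demo_dir <- file.path(tempdir(), "targets_report_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)

# Minimal report reading one target; the chunk fence is built with strrep()
fence <- strrep("`", 3)
writeLines(c(
  "---",
  "title: Toy report",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "library(targets)",
  "tar_read(pca_summary)",
  fence
), "report.Rmd")

# Toy pipeline (hypothetical targets): one result and the report that uses it
tar_script({
  library(tarchetypes)
  list(
    tar_target(pca_summary, summary(prcomp(USArrests, scale. = TRUE))$importance),
    tar_render(report, "report.Rmd")
  )
}, ask = FALSE)

# The edge pca_summary -> report is inferred from the Rmd content
edges <- tar_network(targets_only = TRUE)$edges
print(edges)

setwd(old_wd)
```

Running `tar_make()` in that directory would then knit the report, and re-knit it only when `pca_summary` or the Rmd file changes.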
.left-column[

`targets`, written by [William Landau](https://wlandau.github.io/) (pictured), is flexible and robust, and still allows for a customized report.

All computing is done only when needed, and code is kept separate from the written content.

Once `knitted`, the report can be sent to the inquirer.
]

--

### Target Markdown

New in `targets` > **0.6**. Instructions in [William's bookdown](https://books.ropensci.org/targets/markdown.html)

Try it with the Rmd template (see also the excellent [video](https://www.youtube.com/watch?v=FODSavXGjYg) from the R Lille meetup by **Landau**):

.center[]

---

# Bonus: watch the pipeline running live 🍿
.left-column[
- `targets` events watched live 🎞
- Here, after changing a threshold in the LRT step
- `branches` can be monitored too
- 2 videos joined as I fixed an .red.bold[error] at 1'42"
- Option to display functions (unset here)
]

.right-column[
### `tar_watch()` shiny app from `targets`

<video>
<source src="https://biostat2.uni.lu/practicals/data/tar_watch.mp4" type="video/mp4">
</video>
]

---

# Before we stop (for good)

.flex[
.w-50.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt2.mr1[
.large[.gbox[Highlights]

* `targets`, a dependency manager that re-runs only what's needed
* `renv`, which bundles your packages and their versions inside a project
]]
.w-50.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt2.ml1[
.large[.bbox[Acknowledgments 🙏 👏]

* **Eric Koncina** early adopter of `targets`
* **Wendkouni N. Minoungou** for the RNA-seq data
* [**William Landau**](https://github.com/wlandau) main developer of `targets`
]
]]

.flex[
.w-50.bg-washed-green.b--green.ba.bw2.br3.shadow-5.ph3.mt2.mr1[
.large[.ybox[Further reading]]

- [Main website](https://docs.ropensci.org/targets/)
- [Targetopia](https://wlandau.github.io/targetopia/), **Landau**'s universe of `targets`-derived packages
- [Video](https://www.youtube.com/watch?v=FODSavXGjYg) from the R Lille meetup by **William Landau**. June 2021 45''
- [Documentation](https://books.ropensci.org/targets/) as a bookdown by **Landau**
]
.w-50.pv2.ph3.mt2.ml1[
.huge[.bbox[Thank you for your attention! Hope it was useful!]]
]]