setwd("C:/Users/veronica.codoni/Projects/Survival_Analysis")
Development tools, version control, packages
Rworkshop
Tuesday, 6 February 2024
We want to work in an environment with minimal distractions.
We want to write as little code as possible and use ideas that others have implemented.
How can we share code when it’s constantly changing?
Integrated development environment
Project environment
git
renv for environment management
RStudio is an Integrated Development Environment. It makes working with R much easier.
Install R first
RStudio and R are independent.
Features
Scripting
Console
Rmarkdown output logs
Terminal tab
Jobs tab
Environment
str()
git integration
learnr
Files / Plots / Help
rm(list = ls()) is not recommended
library() calls remain
== <- !=
Rmarkdown files solve it
targets allows you to be fast and fully reproducible
Warning
Please save all environments except for the R session
Consider at your own risk
Creates many warnings if all options are selected, particularly with tidyverse code using non-standard evaluation.
.Rproj extension.
.Rproj file in an already existing folder
New project
.Rproj extension file is generated.
Good practice
Use git and renv when starting projects from scratch.
Avoid using setwd() and absolute file paths
What’s wrong with setwd()? Jenny Bryan’s great blog post on project based workflows
This approach is neither self-contained nor portable!
RStudio projects as an alternative
Start a new research project/data analysis by creating a new RStudio Project
Creating a Project in RStudio ensures that the project directory is stand-alone and portable.
Always name projects properly
Keep names concise, with no spaces
Once the project has been created, a .Rproj extension file is generated. This allows for automatic working directory set-up.
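A minimal sketch of how project-relative paths replace setwd() and absolute paths. The folder name data, the file name measurements.csv and the helper project_path() are hypothetical and only serve the illustration; a temporary directory stands in for the project root that opening the .Rproj file would normally provide.

```r
# Stand-in for the project root. In an RStudio project, getwd() already
# points here after opening the .Rproj file, so no setwd() is needed.
project_root <- file.path(tempdir(), "dummy-project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Hypothetical helper: build paths relative to the project root
# instead of hard-coding "C:/Users/..." style locations.
project_path <- function(...) file.path(project_root, ...)

# Hypothetical example data, written and read back through the helper.
csv_file <- project_path("data", "measurements.csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)), csv_file, row.names = FALSE)

measurements <- read.csv(csv_file)
nrow(measurements)
```

Because the code never mentions a user-specific location, the same script runs unchanged after the project folder is moved or cloned to another machine.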
Exercise
dummy-project
git
renv
data and R
R/ project for code, similar to src and bin directories.
Tip
We will not use this directory in this workshop any more, hence the name.
├── analysis.R
├── analusis_2.R
├── analusis4.R
├── parse_data.sh
├── tools #<< environment
│ ├── FACS.exe
│ ├── plink1.09
│ ├── plink1.7
│ ├── plink2.0
│ └── R2.9
├── mydata #<< see next lecture
│ ├── facs.xlsx
│ └── single-cells.txt
├── old_scripts/
│ ├── analysis_final.R
│ ├── analysis_final_2.R
│ ├── analysis_tmp.R
│ └── analysis_final_nature.R
├── Analysis-international_frontiers_biomedical-Machine_learning.R # IF 3.011!
├── Manuscript_2015-oct.docx
├── Manuscript_15-oct2.docx
├── Manuscript_15-oct2-PI.docx
└── Manuscript_Sept-2015.docx
Keeping things tidy
Version control
Code versioning systems support you in all of these use cases.
Capacities in code versioning
(add) files in a project called a repository
(commit) resolutions.
shared repository
gitlab, local platforms maintained and supported by institutes, e.g. gitlab.lcsb.uni.lu
Using RStudio git panel:
The equivalent command line is git add coude.R
After a change, the file is tagged with M (modified).
Click in the Staged column to move the modification to the staging area.
With git you can move between commits.
The command line equivalent is:
git commit -m "Added Chip-seq input analysis.R"
Optionally use rebase (rewrite for linear history to avoid merge commits) using the little black arrow menu next to Pull
git pull
git pull --rebase
Push the local state to the remote repository
Command line equivalent:
git push
Data types to be aware of
Possible solutions
data directory
(*.html)
(.RData)
Standard .gitignore
The course practical contains a standard .gitignore file. Remember to take a look!
Double stars escape while highlightLines is true. Let’s take a look at the .gitignore file in the practical
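As an illustration only, the three kinds of files named above (the data directory, *.html and .RData) could be listed in a .gitignore as follows. This is an assumption about what such a file might contain, not the file shipped with the practical; the sketch writes into a temporary folder so that no real repository is touched.

```r
# Patterns taken from the list above; the course .gitignore may differ.
ignore_patterns <- c(
  "data/",    # raw or large data kept out of the repository
  "*.html",   # rendered output that can be regenerated
  ".RData"    # saved workspace images
)

# Write to a temporary folder instead of a real repository.
repo_dir <- file.path(tempdir(), "gitignore-demo")
dir.create(repo_dir, showWarnings = FALSE)
gitignore_file <- file.path(repo_dir, ".gitignore")
writeLines(ignore_patterns, gitignore_file)

readLines(gitignore_file)
```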
1. Commit and pull frequently
2. Best learned from an experienced person
Reliable: packages are checked during the submission process
Of note: Microsoft stopped MRAN on July 1, 2023
Dedicated to biology research {limma} example
Requires the dedicated package BiocManager for installation.
CRAN install from RStudio (autocompletion)
CRAN install from RStudio’s console
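A hedged sketch of the console route, using only packages named on these slides (renv and yaml from CRAN, limma from Bioconductor). The helper is_missing() and the interactive() guard are additions for this sketch: the guard keeps the script from downloading anything when it is run non-interactively.

```r
cran_pkgs <- c("renv", "yaml")   # installed with install.packages()
bioc_pkgs <- c("limma")          # installed with BiocManager::install()

# Hypothetical helper: TRUE for packages not available in the current library.
is_missing <- function(pkgs) {
  !vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

if (interactive()) {
  missing_cran <- cran_pkgs[is_missing(cran_pkgs)]
  if (length(missing_cran) > 0) install.packages(missing_cran)

  # BiocManager itself is a CRAN package.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  missing_bioc <- bioc_pkgs[is_missing(bioc_pkgs)]
  if (length(missing_bioc) > 0) BiocManager::install(missing_bioc)
}
```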
Documentation
* Package documentation becomes complex fast and is tedious to control manually
* Projects require different versions of packages
* Collaborators have different versions installed on their machines!
What does not work
conda for scientific programming
Python virtual environments
renv
R - environments
renv features
hydrate() searches source code files for library calls
install(pkgs) installs package pkgs including its dependencies
snapshot() registers changes, hashes and origin
restore() to a certain point in time
```{r}
> renv::snapshot()
The following package(s) will be updated in the lockfile:
# CRAN ===============================
- RcppParallel [5.0.2 -> 5.0.3]
- cli [2.3.0 -> 2.3.1]
- pkgload [1.1.0 -> 1.2.0]
- tint [0.1.3 -> *]
# GitHub =============================
- targets [ropensci/targets@main: 598d7a23 -> bdc1b29c]
Do you want to proceed? [y/N]:
```
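The renv functions listed before the snapshot output could be chained roughly as follows. This is a sketch rather than a prescribed workflow: the wrapper update_project_library() and its argument are hypothetical names introduced here, and nothing runs until the function is called inside a project where renv is installed.

```r
# Hypothetical wrapper; only the renv:: calls come from the slides.
update_project_library <- function(pkgs = character()) {
  # Find the packages used in the project code and fill the project library.
  renv::hydrate()

  # Optionally add further packages, including their dependencies.
  if (length(pkgs) > 0) renv::install(pkgs)

  # Record versions, hashes and origin in renv.lock (asks for confirmation).
  renv::snapshot()
}

# On another machine, or after pulling a changed renv.lock:
# renv::restore()
```

Committing renv.lock to git after each snapshot is what lets collaborators call renv::restore() and end up with the same package versions.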
renv.lock file after a snapshot
"R": {
"Version": "4.0.3",
"Repositories": [
{
"Name": "CRAN",
"URL": "https://cloud.r-project.org"
}
]
},
"Bioconductor": {
"Version": "3.12"
},
"Packages": {
"AnnotationDbi": {
"Package": "AnnotationDbi",
"Version": "1.52.0",
"Source": "Bioconductor",
"Hash": "ca5106b296b3aa6af713ce197be547c1"
},
"BH": {
"Package": "BH",
"Version": "1.75.0-0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e4c04affc2cac20c8fec18385cd14691"
},
"targets": {
"Package": "targets",
"Version": "0.1.0.9000",
"Source": "GitHub",
"RemoteType": "github",
"RemoteUsername": "ropensci",
"RemoteRepo": "targets",
"RemoteRef": "main",
"RemoteSha": "598d7a23661d4c760209c7991bf10584eadcf7c8",
"RemoteHost": "api.github.com",
"Hash": "ee66061fd5c757ec600071965d457818"
},
[...]
1. Check the github-classroom
Follow this link and authorize it
2. Accept the assignment
You should see this invite:
3. Go to the assignment
Reload the browser page
The assignment is created in your personal github repository.
Click on the link with the blue background
4. Check your own repo
It should look like this:
5. Copy the Code SSH URL
Make sure to use the Git Clone with SSH!
6. Insert the URL for a new Git project
In Repository URL
7. Install renv and yaml by running install.packages(c("renv", "yaml"))
8. Activate renv by running renv::activate().
9. Run renv::hydrate() to install the necessary packages. You should see:
Discovering package dependencies ... Done!
Copying packages into the cache ... Done!
Should be fast since you already have most packages in your renv cache.
10. Create your first package snapshot with renv::snapshot(). The output should be something like this:
The following package(s) will be updated in the lockfile:
# CRAN ===============================
- R6 [* -> 2.5.1]
- base64enc [* -> 0.1-3]
- bslib [* -> 0.3.1]
[...]
- tinytex [* -> 0.38]
- xfun [* -> 0.30]
- yaml [* -> 2.3.5]
Do you want to proceed? [y/N]:
Say y to Do you want to proceed? [y/N]:. The renv.lock is created.
The first tutorial (datasauRus) will be guided and demonstrate capacities in the tidyverse which we will explore in the workshop.
You learned about:
Acknowledgments
renv
)