R is a powerful language for data science in many research disciplines, but it has a steep learning curve. The tidyverse group of packages provides a dialect that greatly simplifies everyday data work, including chaining operations with the pipe operator (%>%).
The course adopts the philosophy of Hadley Wickham, Chief Scientist at RStudio: take each step of data science and replace many intricacies of R with clear, consistent and easy-to-learn syntax. RStudio will be the software to use, since it eases package management, scripting, plotting and data handling.
The four-day course provides a complete introduction to data science in R with the tidyverse. The course will not go deep into statistics; it focuses instead on getting data ready, some exploratory analysis, visualization and handling models.
Preparing data takes up to 80% of the time spent in analysis — speeding this up is the mission of this course.
The tidyverse is an official CRAN package with a published manifesto. Hadley Wickham proposed the workflow described in his must-read book R for Data Science.
[Figure: data science workflow. H. Wickham, R for Data Science, licence CC]
In terms of R packages, the workflow is nicely depicted in this picture by David Robinson.
Participants should have basic experience with a programming environment such as Matlab or Octave, or with another programming language. Those without such experience should complete a simple free online course beforehand.
There will be a special session on updates of the tidyverse for participants of previous iterations of the course.
Each student must bring their own laptop with recent versions of R and RStudio installed. Please follow the install tutorial to set them up prior to the course.
Each day, the workshop will be a mixture of lectures and practicals, as scheduled in the table below.
Please join us for the welcome coffee to test your connection and R set-up, and to ask questions.
| Date | Time | Session | Teacher | Notes |
|---|---|---|---|---|
| 2021-05-06 | 09:30 | Introduction | AG | |
| | 11:00 | Tidy data | RK | |
| | 13:30 | String manipulation | RK | stringr |
| | 14:30 | RMarkdown | VC | knitr |
| | 16:00 | Import | VC | readr |
| 2021-05-07 | 09:30 | Data transformation | RK | dplyr, tidyr |
| | 13:30 | Visualization | AG | ggplot2 |
| 2021-05-10 | 09:30 | Functional programming | AG | purrr |
| | 13:30 | Tidy models | AG | broom |
| | 15:30 | Advanced Programming | AG | tidyeval |
| 2021-05-11 | 09:30 | New developments in 2021 | AG | |
| | 10:15 | Coffee break | | |
| | 10:45 | Practical | All | |
The course starts with R and loading data via the readr package, as well as R Markdown. It continues with the tidyr and dplyr packages, as well as ggplot2 for visualisation, and then moves on to the purrr package, which greatly simplifies repeating operations. Many statistical packages have complicated and idiosyncratic data structures; the broom package helps to convert them to consistent data structures.
The course will be held online using Webex. Only the lectures will be recorded, and the recordings will not be available while the course is running.
The course is limited to 30 participants. Registration is closed.
PhD students who enrolled through the doctoral school of the University of Luxembourg will receive 2 ECTS in category 1, which requires handing in a short project and all practicals as Rmd files.
Please note that ECTS can only be received for the course once. If you have already received ECTS for the last version of the course, no ECTS can be awarded for this year’s edition. If you are not a student at the University of Luxembourg, first contact your doctoral school or another authority relevant to your course of study.
This event is supported by ELIXIR-Luxembourg