Introduction

R is a powerful language for data science across many research disciplines, but it has a steep learning curve. The tidyverse group of packages provides a dialect that greatly simplifies:

  • data importing
  • cleaning
  • processing
  • visualization

It also supports reproducible workflows using pipelines (%>%).

The course adopts the philosophy of Hadley Wickham, Chief Scientist at RStudio: take each step of data science and replace many intricacies of R with clear, consistent and easy-to-learn syntax. RStudio will be the software used, since it eases package management, scripting, plotting and data handling.

The four-day course provides a complete introduction to data science in R with the tidyverse. It will not go deep into statistics but will focus on getting data ready, some exploratory analysis, visualization and handling models.

Preparing data can take up to 80% of the time spent on an analysis; speeding this up is the mission of this course.

Tidyverse

The tidyverse is an official CRAN package with its own manifesto. Hadley Wickham proposed the following workflow, described in his must-read book R for Data Science:

H. Wickham - R for data science, licence CC

In terms of R packages, the workflow is nicely depicted in this picture by David Robinson.


Requirements

Prior knowledge

Participants should have basic experience with a programming environment such as Matlab or Octave, or with another programming language. Those without such experience should complete a simple free online course beforehand.

There is a special session on updates of the tidyverse on 11 May 2020 for participants of previous iterations of the course. No registration is required to attend.

Material

Each student must bring their own laptop with recent versions of R and RStudio installed. Please follow the install tutorial to set them up prior to the course.

Schedule

This course will take place either offline or online depending on the requirements.

Dates and time

7 & 8 and 11 & 12 May 2020.

Each day, the workshop will be a mixture of lectures and practicals, with the following schedule:

  • 9:00 - 9:30 welcome coffee
  • 9:30 - 12:30 course work with a coffee break
  • 12:30 - 13:30 lunch break
  • 13:30 - 17:30 course work with a coffee break

Coffee will be served outside the course room in the lounge area.

Program

Date        Time   Session                   Teacher  Notes
2020-05-07  09:30  Introduction              AG
            11:00  Tidy data                 RK
            13:30  String manipulation       RK       stringr
            14:30  RMarkdown                 AG       knitr
            16:00  Import                    AG       readr
            18:00  Dinner with participants  RK
2020-05-08  09:30  Data transformation       RK       dplyr, tidyr
            13:30  Visualization             AG       ggplot
2020-05-11  09:30  Functional programming    AG       purrr
            13:30  Tidy models               AG       broom
            15:30  Advanced Programming      AG       tidyeval
2020-05-12  09:30  New developments in 2020  AG
            10:15  Coffee break
            10:45  Practical                 All

Location

The course will be held online or (if the Covid-19 situation is cleared by then) at:

Maison du Nombre (MNO), Room 1.1010
University of Luxembourg, Belval Campus
6, avenue de la Fonte
L-4364 Esch-sur-Alzette
Luxembourg

MNO room 1.1010, 1st floor

Registration

The course is limited to 30 participants. Register through this form

ECTS

PhD students who enrolled through the doctoral school will receive 2 ECTS in category 1. This requires handing in a short project and, as Rmd files, the practicals of any sessions they missed. For example, if you are absent on the first day, you must complete and send back:

  • practical introduction with datasauRus
  • practical string manipulation
  • practical import data

Please note that ECTS can only be received for the course once. If you have already received ECTS for a previous edition of the course, no ECTS will be awarded for this year’s edition.

Elixir

This event is supported by ELIXIR-Luxembourg.