readr 2.0

In this practical, you’ll learn how to import flat files using the readr package.

For reproducible research, it is good practice to store files in a standardized location. For example, you can take advantage of RStudio projects and store data files in a sub-folder called data.
If you have not created an RStudio project yet, create one now.
Check that the project is active: the name you chose should appear in the top-right corner.
Create a folder named data within your project’s folder. Use the Files pane in the lower-right RStudio panel or your favorite file browser.
Download the file blood_fat.csv and place it in the data sub-folder you just created.
Create a new R Markdown file and save it at the project root with a relevant name.
Add a code chunk with the following lines to load the libraries. You don’t need to install the packages if these lines run without error.
library(dplyr)
library(readr)
When you use the knit button, the chunks are evaluated in a new, fresh environment.
Use readr to load your first file
Load the blood_fat file.
The relative path can safely be written as "data/blood_fat.csv" if you followed the preliminary steps above, i.e. downloaded the CSV into a sub-folder data of an RStudio project.
For example, your folder structure could look like this (depending on the names you picked). Here, the project is named Rworkshop:
.
├── data
│ └── blood_fat.csv
├── practical03_import.Rmd
└── Rworkshop.Rproj
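One way to attempt the load from an Rmd saved at the project root is sketched below. The path assumes the folder layout above; the file.exists() guard and the object name blood_fat are illustrative choices, not part of the practical.

```r
library(readr)

# Relative path from the project root (assumes the layout shown above)
path <- file.path("data", "blood_fat.csv")

if (file.exists(path)) {
  # With no delim argument, readr (>= 2.0) guesses the delimiter
  blood_fat <- read_delim(path)
  print(blood_fat)
} else {
  # Usually means the working directory is not the project root
  message("Cannot find ", path, " from ", getwd())
}
```

If the message branch is reached, checking that the RStudio project is active is a reasonable first step, since the relative path is resolved from the working directory.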
## Rows: 25 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): group
## dbl (4): id, weight, age, fat
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 25 × 5
##       id group weight   age   fat
##    <dbl> <chr>  <dbl> <dbl> <dbl>
##  1     1 A         84    46  354.
##  2     2 A         73    20  190.
##  3     3 A         65    52  406.
##  4     4 A         70    30  264.
##  5     5 A         76    57  452.
##  6     6 A         69    25  302.
##  7     7 A         63    28  288.
##  8     8 A         72    36  386.
##  9     9 A         79    57  402.
## 10    10 A         75    44  366.
## # … with 15 more rows
The read_delim() call reports the dimensions of the file, along with the guessed delimiter and the data type of each column.
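The column specification mentioned in that message can be inspected afterwards with spec(). The sketch below uses a small temporary file (three rows transcribed from the printed output, with fat rounded) rather than the real blood_fat.csv; toy_path and toy are illustrative names.

```r
library(readr)

# Stand-in file: three rows copied from the printed output, fat rounded
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,group,weight,age,fat",
    "1,A,84,46,354",
    "2,A,73,20,190",
    "3,A,65,52,406"),
  toy_path
)

toy <- read_delim(toy_path)  # delimiter and column types are guessed
dim(toy)                     # number of rows and columns
spec(toy)                    # full column specification
```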
If we are happy with the guessed delimiter and the column names and types, we can silence this reporting.
Silence the read_delim() message.
read_delim() loads the data as a tibble. The main advantage of a tibble over a regular data frame is how it prints: only the first rows are shown, along with the type of each column.
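A possible way to quiet the message is the show_col_types argument named in the message itself. The sketch reads a small stand-in file (two rows transcribed from the printed output, with fat rounded) and also checks that the result is a tibble; toy_path and toy are illustrative names.

```r
library(readr)

# Stand-in file, not the real blood_fat.csv
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,group,weight,age,fat",
    "1,A,84,46,354",
    "2,A,73,20,190"),
  toy_path
)

# show_col_types = FALSE suppresses the column specification message
toy <- read_delim(toy_path, show_col_types = FALSE)

inherits(toy, "tbl_df")      # TRUE for a tibble
inherits(toy, "data.frame")  # a tibble is still a data frame
toy                          # prints dimensions and column types
```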
What is the data type of the age column?
It was guessed as a double (dbl). Actually, both age and id are integers and should be read as such.
Load blood_fat.csv specifying the data types of age and id as integers.
In col_types = cols(....), you can use the bare column names and either the long form of a data type, such as col_integer(), or its shortcut, "i".
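As a sketch of that tip, the call below sets id with the long form and age with the shortcut, leaving the other columns to be guessed. It reads a small stand-in file (rows transcribed from the printed output, with fat rounded), not the real blood_fat.csv; toy_path and toy are illustrative names.

```r
library(readr)

# Stand-in file, not the real blood_fat.csv
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,group,weight,age,fat",
    "1,A,84,46,354",
    "2,A,73,20,190",
    "3,A,65,52,406"),
  toy_path
)

toy <- read_delim(
  toy_path,
  delim = ",",
  col_types = cols(
    id  = col_integer(),  # long form
    age = "i"             # shortcut for integer
  )
)
toy  # id and age now print as <int>
```

Columns not named in cols() keep their guessed type, so weight and fat stay doubles here.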
Load blood_fat.csv specifying the data types of age and id as integers, skipping weight.
Summarise age and weight per group.
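Both steps can be sketched on toy data. The stand-in file below is not blood_fat.csv: the group A rows are transcribed from the printed output (fat rounded) and the group B rows are invented so that there are two groups. Using the mean as the per-group summary is an assumption, and no_weight, toy and per_group are illustrative names.

```r
library(readr)
library(dplyr)

# Toy stand-in: group B rows are invented for illustration
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,group,weight,age,fat",
    "1,A,84,46,354",
    "2,A,73,20,190",
    "3,B,60,30,250",
    "4,B,70,40,300"),
  toy_path
)

# Integers for id and age; weight is dropped at import with col_skip()
no_weight <- read_delim(
  toy_path,
  delim = ",",
  col_types = cols(id = "i", age = "i", weight = col_skip())
)
names(no_weight)

# Keep weight this time, then average age and weight within each group
toy <- read_delim(toy_path, delim = ",", col_types = cols(id = "i", age = "i"))
per_group <- toy %>%
  group_by(group) %>%
  summarise(mean_age = mean(age), mean_weight = mean(weight))
per_group
```

Skipping a column at import means it never enters the session, so the summary has to start from a version of the data that still contains weight.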