This guided practical demonstrates that the tidyverse allows you to compute summary statistics and visualize datasets efficiently. The dataset used here is already stored in a tidy tibble; cleaning steps will come in future practicals.
datasauRus package
Check whether the datasauRus package is installed by loading it: library(datasauRus)
If the error there is no package called ‘datasauRus’ appears, the package needs to be installed. Use this: install.packages("datasauRus")
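You can combine both steps so that the package is installed only when it is missing. The sketch below relies on base R's requireNamespace(). It is a common convention rather than a step the practical requires, and install.packages() needs a configured CRAN mirror.

```r
# Install datasauRus only if it cannot be found, then attach it.
if (!requireNamespace("datasauRus", quietly = TRUE)) {
  install.packages("datasauRus")
}
library(datasauRus)
```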
Since we are dealing with a tibble, we can just type
datasaurus_dozen
Only the first 10 rows are displayed.
| dataset | x | y |
|---|---|---|
| dino | 55.3846 | 97.1795 |
| dino | 51.5385 | 96.0256 |
| dino | 46.1538 | 94.4872 |
| dino | 42.8205 | 91.4103 |
| dino | 40.7692 | 88.3333 |
| dino | 38.7179 | 84.8718 |
| dino | 35.6410 | 79.8718 |
| dino | 33.0769 | 77.5641 |
| dino | 28.9744 | 74.4872 |
| dino | 26.1538 | 71.4103 |
Find the dimensions of this dataset (rows and columns):
base version, using either dim(), or ncol() and nrow()
tidyverse version
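The sketch below shows the base version and one possible tidyverse option. The practical does not name the tidyverse function, so glimpse() (re-exported by dplyr) is an assumption. It prints the number of rows and columns, followed by each column and its type.

```r
library(datasauRus)
library(dplyr)

# Base R: dim() returns c(rows, columns); nrow() and ncol() return each part.
dims <- dim(datasaurus_dozen)
n_rows <- nrow(datasaurus_dozen)
n_cols <- ncol(datasaurus_dozen)
print(dims)

# Tidyverse option (assumed): glimpse() reports "Rows" and "Columns"
# before listing the columns.
glimpse(datasaurus_dozen)
```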
Assign datasaurus_dozen to the name ds_dozen. This populates the Global Environment.
How many datasets are present?
# n_distinct counts the unique elements in a given vector.
# we use summarise to return only the desired column named n here.
summarise(ds_dozen, n = n_distinct(dataset))
## # A tibble: 1 x 1
## n
## <int>
## 1 13
Compute the number of rows per dataset. count() in dplyr performs a group_by() on the specified column followed by summarise(n = n()), which returns the number of observations per group.
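The equivalence described above can be checked directly: the shortcut and the explicit group_by() + summarise() pipeline should return the same datasets and counts. This is a minimal sketch using the magrittr pipe %>%, which dplyr re-exports.

```r
library(datasauRus)
library(dplyr)

ds_dozen <- datasaurus_dozen

# Shortcut: count() groups by `dataset` and counts rows per group.
by_count <- count(ds_dozen, dataset)

# Explicit version: group_by() followed by summarise(n = n()).
by_summarise <- ds_dozen %>%
  group_by(dataset) %>%
  summarise(n = n())

print(by_count)
```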
Compute summary statistics, such as the mean, of the x and y columns for each dataset. For this, group_by() the appropriate column and then summarise(). Within summarise() you can define as many new columns as you wish, so there is no need to call it once per variable.
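Here is a sketch of both styles. First, a single summarise() call creates several columns. Then across() (dplyr 1.0.0 or later) applies several functions to several columns. The column names mean_x and mean_y are illustrative choices. With a named list of functions, across() names its outputs {column}_{function}, for example x_mean and x_sd.

```r
library(datasauRus)
library(dplyr)

ds_dozen <- datasaurus_dozen

# Several new columns in one summarise() call.
means <- ds_dozen %>%
  group_by(dataset) %>%
  summarise(mean_x = mean(x), mean_y = mean(y))

# across() applies mean() and sd() to both x and y in one go.
stats <- ds_dozen %>%
  group_by(dataset) %>%
  summarise(across(c(x, y), list(mean = mean, sd = sd)))

print(stats)
```

The means turn out to be nearly identical across all datasets, which is why the plots that follow matter.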
To apply the same functions to several columns in one call, use across().
Plot ds_dozen with ggplot so that the aesthetics are aes(x = x, y = y), with the geometry geom_point().
The ggplot() and geom_point() functions must be linked with a + sign.
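Here is a minimal version of this plot. ggplot() declares the data and the aesthetics, and geom_point() is added as a layer with +.

```r
library(datasauRus)
library(ggplot2)

ds_dozen <- datasaurus_dozen

# Data and aesthetics first, then the point geometry added with `+`.
p <- ggplot(ds_dozen, aes(x = x, y = y)) +
  geom_point()

print(p)
```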
Reuse the previous plot and color the points by the dataset column. Too many datasets are displayed at once.
To keep only some datasets, filter the rows first. Use %in% to test whether each element of the left operand matches an element of the right one (most probably a vector).
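The sketch below covers these plotting steps: coloring by dataset, keeping a few datasets with filter() and %in%, and then showing one dataset per facet with theme_void() and no legend. The two dataset names kept here ("dino" and "star") are only examples.

```r
library(datasauRus)
library(dplyr)
library(ggplot2)

ds_dozen <- datasaurus_dozen

# %in% is TRUE for rows whose dataset is in the vector; filter() keeps them.
subset_ds <- filter(ds_dozen, dataset %in% c("dino", "star"))

p_sub <- ggplot(subset_ds, aes(x = x, y = y, colour = dataset)) +
  geom_point()

# All datasets, one per panel, with a bare theme and the legend removed.
p_facets <- ggplot(ds_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  facet_wrap(vars(dataset)) +
  theme_void() +
  theme(legend.position = "none")

print(p_facets)
```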
Display one dataset per facet.
Use theme_void and remove the legend.
The gifski package can be installed on your machine to make GIF creation faster. gifski is written in Rust, and building it requires cargo, Rust's build tool. See this article to get it installed on your machine. Install Rust first, then install the R package gifski. Please note that the animate() step still takes ~3-5 minutes depending on your machine.
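Here is a sketch of the animation, assuming gganimate is installed and gifski is used for rendering. transition_states(dataset) makes the plot move from one dataset to the next. The slow animate() call is commented out so that building the animation object stays quick.

```r
library(datasauRus)
library(ggplot2)
library(gganimate)

ds_dozen <- datasaurus_dozen

# Each value of `dataset` becomes one state of the animation.
anim <- ggplot(ds_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_void() +
  transition_states(dataset)

# Rendering is the slow step; gifski_renderer() needs the gifski package.
# animate(anim, renderer = gifski_renderer())
```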
When you install gganimate, its dependencies will be installed automatically.
Provide the dataset variable to the transition_states() layer.
"Never trust summary statistics alone; always visualize your data." (Alberto Cairo)
Authors
from this post