This guided practical demonstrates how the tidyverse lets you compute summary statistics and visualize datasets efficiently. The dataset is already stored in a tidy tibble; cleaning steps will come in future practicals.
datasauRus package
Check that the datasauRus package is installed:
library(datasauRus)
If the error there is no package called ‘datasauRus’ appears, it means that the package needs to be installed. Use this:
install.packages("datasauRus")
Since we are dealing with a tibble, we can just type
datasaurus_dozen
Only the first 10 rows are displayed.
dataset | x | y |
---|---|---|
dino | 55.3846 | 97.1795 |
dino | 51.5385 | 96.0256 |
dino | 46.1538 | 94.4872 |
dino | 42.8205 | 91.4103 |
dino | 40.7692 | 88.3333 |
dino | 38.7179 | 84.8718 |
dino | 35.6410 | 79.8718 |
dino | 33.0769 | 77.5641 |
dino | 28.9744 | 74.4872 |
dino | 26.1538 | 71.4103 |
base version, using either dim(), or ncol() and nrow()
tidyverse version
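As a minimal sketch of both versions, assuming the datasauRus and dplyr packages are installed. The text does not name the tidyverse function, so glimpse() is our choice for illustration only.

```r
library(datasauRus)
library(dplyr)

# base version: dim() returns the number of rows and columns at once,
# nrow() and ncol() return them separately
dim(datasaurus_dozen)
nrow(datasaurus_dozen)
ncol(datasaurus_dozen)

# tidyverse version (assumed choice): glimpse() prints the dimensions
# followed by a short preview of every column
glimpse(datasaurus_dozen)
```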
Assign datasaurus_dozen to the ds_dozen name. This aims at populating the Global Environment.
ds_dozen <- datasaurus_dozen
# n_distinct counts the unique elements in a given vector.
# we use summarise to return only the desired column named n here.
summarise(ds_dozen, n = n_distinct(dataset))
## # A tibble: 1 x 1
##       n
##   <int>
## 1    13
count() in dplyr does the group_by() by the specified column (here dataset) + summarise(n = n()), which returns the number of observations per defined group.
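The equivalence described above can be sketched as follows, assuming datasauRus and dplyr are installed. The object names n_short and n_long are ours, chosen for illustration.

```r
library(datasauRus)
library(dplyr)

ds_dozen <- datasaurus_dozen

# Shortcut: count() groups by `dataset` and tallies the rows of each group.
n_short <- count(ds_dozen, dataset)

# Long form: the same table spelled out with group_by() and summarise().
n_long <- ds_dozen %>%
  group_by(dataset) %>%
  summarise(n = n())

n_short
```

Both calls should return one row per dataset with the number of observations in column n.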
Compute summary statistics for the x & y columns. For this, you need to group_by() the appropriate column and then summarise().
In summarise() you can define as many new columns as you wish. No need to call it for every single variable.
across() lets you apply the same summary to several columns at once.
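Here is a hedged sketch of the grouped summary, first with every column written out and then with across(). The choice of mean() and sd() as the statistics is our assumption, as are the object names. It assumes dplyr 1.0 or later, where across() is available.

```r
library(datasauRus)
library(dplyr)

ds_dozen <- datasaurus_dozen

# One summarise() call defines several new columns per group.
stats_manual <- ds_dozen %>%
  group_by(dataset) %>%
  summarise(
    x_mean = mean(x),
    x_sd   = sd(x),
    y_mean = mean(y),
    y_sd   = sd(y)
  )

# Same summaries with across(): each function in the list is applied to
# each selected column, and results are named <column>_<function>.
stats_across <- ds_dozen %>%
  group_by(dataset) %>%
  summarise(across(c(x, y), list(mean = mean, sd = sd)))

stats_across
```

The across() form scales better when more columns or more statistics are needed, since only the column selection or the function list changes.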
Plot ds_dozen with ggplot such that the aesthetics are aes(x = x, y = y), with the geometry geom_point().
The ggplot() and geom_point() functions must be linked with a + sign.
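A minimal sketch of that plot, assuming datasauRus and ggplot2 are installed. The object name p is ours.

```r
library(datasauRus)
library(ggplot2)

ds_dozen <- datasaurus_dozen

# ggplot() sets the data and the aesthetics; `+` adds the point layer.
p <- ggplot(ds_dozen, aes(x = x, y = y)) +
  geom_point()

p
```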
Colour the points by the dataset column. Too many datasets are displayed.
Use %in% to test if there is a match of the left operand in the right one (most probably a vector).
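To see how the dataset column and %in% fit together, here is a sketch that keeps a few datasets before plotting. The three names in keep are a hypothetical selection for illustration; any values from the dataset column would work. It assumes datasauRus, dplyr and ggplot2 are installed.

```r
library(datasauRus)
library(dplyr)
library(ggplot2)

ds_dozen <- datasaurus_dozen

# Hypothetical selection, chosen only for illustration.
keep <- c("dino", "star", "bullseye")

# %in% returns TRUE for each row whose dataset value appears in `keep`.
ds_subset <- filter(ds_dozen, dataset %in% keep)

# Map the dataset column to the colour aesthetic.
p_subset <- ggplot(ds_subset, aes(x = x, y = y, colour = dataset)) +
  geom_point()

p_subset
```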
Plot one dataset per facet, apply theme_void and remove the legend.
gifski, if it can be installed on your machine, makes the GIF creation faster. gifski is internally written in Rust, and this language relies on cargo to build it. See this article to get it installed on your machine. First install Rust before installing the R package gifski. Please note that the animate() step still takes about 3 to 5 minutes depending on your machine.
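Coming back to the faceted plot (one dataset per facet, theme_void, no legend), a possible sketch is shown here. Using facet_wrap() rather than facet_grid() is our assumption, as is the object name p_facets. It assumes datasauRus and ggplot2 are installed.

```r
library(datasauRus)
library(ggplot2)

ds_dozen <- datasaurus_dozen

p_facets <- ggplot(ds_dozen, aes(x = x, y = y, colour = dataset)) +
  geom_point() +
  facet_wrap(vars(dataset)) +       # one panel per dataset
  theme_void() +                    # drop axes, grid lines and background
  theme(legend.position = "none")   # remove the colour legend

p_facets
```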
Install gganimate; its dependencies will be automatically installed.
Pass the dataset variable as the argument of the transition_states() layer.
Never trust summary statistics alone; always visualize your data | Alberto Cairo
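The animation step with transition_states() can be sketched as follows, assuming datasauRus, ggplot2 and gganimate are installed. The transition_length and state_length values and the object name anim are illustrative assumptions. Rendering is left commented out because it is the slow step and needs the gifski package for gifski_renderer().

```r
library(datasauRus)
library(ggplot2)
library(gganimate)

ds_dozen <- datasaurus_dozen

anim <- ggplot(ds_dozen, aes(x = x, y = y)) +
  geom_point() +
  # one animation state per value of `dataset`
  transition_states(dataset, transition_length = 2, state_length = 1)

# Uncomment to render the GIF (slow; requires the gifski package).
# animate(anim, renderer = gifski_renderer())
```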
Authors
from this post