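# install.packages("gapminder")
library(tidyverse)  # provides ggplot2 and dplyr
library(gapminder)
# One line per country: without `group`, all points would be joined together
gapminder |>
  ggplot(aes(x = year,
             y = lifeExp,
             group = country)) +
  geom_line()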
Gapminder tutorial
Guided practical
Getting started
- Load the `gapminder` and `tidyverse` packages by running the `setup` chunk
- Use the pipe `|>` to pass `gapminder` to `ggplot()`
- Plot the life expectancy (`lifeExp` in `y`) ~ `year` (`x`)
- Use `geom_line()`
Mind the grouping!
Add linear models
- Using `by_country`
- Add a new column `model` with linear regressions of `lifeExp` on `year1950`
- Save as `by_country_lm`
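The steps above can be sketched as follows. This is only one possible approach: `by_country` is not defined in this section, so the sketch assumes it is a row-wise nested tibble with one row per country, and that `year1950` is the number of years elapsed since 1950.

```r
library(tidyverse)
library(gapminder)

# Assumption: `by_country` holds one row per country, with the yearly
# observations nested in a `data` list column, processed row by row.
by_country <- gapminder |>
  mutate(year1950 = year - 1950) |>
  nest(data = -c(country, continent)) |>
  rowwise()

# Fit one linear model per row; list() keeps each lm object in a list column.
by_country_lm <- by_country |>
  mutate(model = list(lm(lifeExp ~ year1950, data = data)))

by_country_lm
```

Shifting `year` by 1950 makes the intercept the expected life expectancy in 1950 rather than at year 0.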
Explore a list column
- Count the number of rows per country in the `data` column.
- Does any country have less data than others?
- Try writing a single statement that answers this question!
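A possible single pipeline for this question, assuming the same hypothetical row-wise `by_country` as elsewhere in this practical. The helper column `n_rows` is introduced here for illustration only.

```r
library(tidyverse)
library(gapminder)

# Assumption: `by_country` is a row-wise nested tibble, one row per country.
by_country <- gapminder |>
  mutate(year1950 = year - 1950) |>
  nest(data = -c(country, continent)) |>
  rowwise()

# nrow() is evaluated once per row on the nested tibble, then the
# distinct row counts are tallied across countries.
row_counts <- by_country |>
  mutate(n_rows = nrow(data)) |>
  ungroup() |>
  count(n_rows)

row_counts
```

If `row_counts` contains a single line, every country has the same number of observations.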
Explore a list column by plotting
- Plot `lifeExp` ~ `year1950` for Bulgaria by unnesting `data`
- `filter()` for the desired country
- `unnest()` the raw `data`
- Pipe to `ggplot()`
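A minimal sketch of these steps, again assuming a row-wise nested `by_country` built as below (its construction is not shown in this section). The choice of `geom_line()` is also an assumption; points would work as well.

```r
library(tidyverse)
library(gapminder)

# Assumption: `by_country` is a row-wise nested tibble, one row per country.
by_country <- gapminder |>
  mutate(year1950 = year - 1950) |>
  nest(data = -c(country, continent)) |>
  rowwise()

# Keep one country, drop the row-wise grouping, then expand the nested
# observations back into regular rows before plotting.
p <- by_country |>
  filter(country == "Bulgaria") |>
  ungroup() |>
  unnest(data) |>
  ggplot(aes(x = year1950, y = lifeExp)) +
  geom_line()

p
```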
Display the summary for the linear model of Rwanda
- How do you interpret the \(R^2\) for this particular model?
- `filter()` for the desired country
- Use `list()` to run `summary()` on the linear model
- To extract the named `"r.squared"` element, use the `purrr` syntax `pluck(summary, "r.squared")`
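One way to follow these hints is sketched below. The column names `model_summary` and `rsq` are hypothetical, and `by_country_lm` is rebuilt under the assumption that it is a row-wise nested tibble with one `lm` per country.

```r
library(tidyverse)
library(gapminder)

# Assumption: `by_country_lm` is row-wise, one model per country.
by_country_lm <- gapminder |>
  mutate(year1950 = year - 1950) |>
  nest(data = -c(country, continent)) |>
  rowwise() |>
  mutate(model = list(lm(lifeExp ~ year1950, data = data)))

rwanda <- by_country_lm |>
  filter(country == "Rwanda") |>
  # summary() returns a list-like object, so it must be wrapped in list()
  mutate(model_summary = list(summary(model))) |>
  # pluck() extracts the named element from the summary
  mutate(rsq = pluck(model_summary, "r.squared"))

rwanda$model_summary[[1]]  # full summary of the linear model
rwanda$rsq                 # proportion of variance explained by the model
```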
Cleanup using broom
- Check that `broom` is loaded
- Using `by_country_lm`, add 4 new columns:
  - `glance`, using the broom function on the `model` column
  - `tidy`, using the broom function on the `model` column
  - `augment`, using the broom function on the `model` column
  - `rsq` from the `glance` column
- Save as `models`
- Why is extracting the \(R^2\) into the main tibble useful?
- Use `list()` when dealing with a list column in a `rowwise` grouped tibble
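The four columns could be added along these lines. This is a sketch under the assumption that `by_country_lm` is a row-wise nested tibble, rebuilt here so the example runs on its own.

```r
library(tidyverse)
library(broom)
library(gapminder)

# Assumption: `by_country_lm` is row-wise, one model per country.
by_country_lm <- gapminder |>
  mutate(year1950 = year - 1950) |>
  nest(data = -c(country, continent)) |>
  rowwise() |>
  mutate(model = list(lm(lifeExp ~ year1950, data = data)))

models <- by_country_lm |>
  # each broom function returns a tibble, hence the list() wrapper
  mutate(glance  = list(glance(model)),
         tidy    = list(tidy(model)),
         augment = list(augment(model))) |>
  # row-wise, `glance` is a one-row tibble, so `$` yields a single number
  mutate(rsq = glance$r.squared)

models
```

Having `rsq` as a plain numeric column means it can be used directly for sorting, filtering and plotting, without unnesting anything.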
Plotting \(R^2\) for countries
- Plot `country` ~ `rsq`
- Colour points per continent
- Reorder country levels by \(R^2\) (`rsq`): snake plot
- Which continent shows most of the low \(R^2\) values?
Hints
- To reorder the discrete values of `country`:
  - Use the `forcats` package
  - `fct_reorder(country, rsq)` reorders based on the `rsq` continuous variable
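A possible version of the snake plot is sketched below. To keep the example short, `models` is rebuilt in a reduced form containing only the `rsq` column; the way it is constructed is an assumption.

```r
library(tidyverse)
library(broom)
library(gapminder)

# Assumption: reduced stand-in for `models`, one row and one R^2 per country.
models <- gapminder |>
  mutate(year1950 = year - 1950) |>
  nest(data = -c(country, continent)) |>
  rowwise() |>
  mutate(model = list(lm(lifeExp ~ year1950, data = data)),
         rsq = glance(model)$r.squared) |>
  ungroup()

# fct_reorder() sorts the country levels by rsq, producing the "snake" shape.
p <- models |>
  ggplot(aes(x = rsq,
             y = fct_reorder(country, rsq),
             colour = continent)) +
  geom_point() +
  labs(y = "country")

p
```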
Display the real data for countries with a low \(R^2\)
- Focus on non-linear trends
- Filter the 20 countries with the lowest \(R^2\)
- `unnest()` the `data` column
- Plot `lifeExp` ~ `year` with lines
- Colour per continent
- Facet per country
Hints
- You must `ungroup()` as we currently work by row.
- `slice_min(col, n = 5)` returns the rows with the 5 smallest values of `col`
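These steps might look like the sketch below, which rebuilds a reduced, hypothetical `models` tibble (only `data`, `model` and `rsq`) so that it runs on its own.

```r
library(tidyverse)
library(broom)
library(gapminder)

# Assumption: reduced stand-in for `models`, still row-wise at this point.
models <- gapminder |>
  mutate(year1950 = year - 1950) |>
  nest(data = -c(country, continent)) |>
  rowwise() |>
  mutate(model = list(lm(lifeExp ~ year1950, data = data)),
         rsq = glance(model)$r.squared)

# ungroup() first: slice_min() must rank across all countries, not per row.
low_rsq <- models |>
  ungroup() |>
  slice_min(rsq, n = 20) |>
  unnest(data)

p <- low_rsq |>
  ggplot(aes(x = year, y = lifeExp, colour = continent)) +
  geom_line() +
  facet_wrap(vars(country))

p
```

Replacing `slice_min()` with `slice_max()` gives the countries where the linear model fits best.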
Same questions for the top 20 \(R^2\)
Summarise on one plot
- Unnest coefficients (`tidy` column)
  - Mind to keep the `continent`, `country` and `rsq` columns
- Put intercept and slope in their own columns
  - In wide format, only one value can be used.
  - Discard unused columns.
- Plot `slope ~ intercept` (watch out for the `(Intercept)` name, which needs to be called between backticks!)
- Colour per continent
- Size per \(R^2\) (use `scale_size_area()` for readability)
- Add tendency with `geom_smooth(method = "loess")`
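The whole summary plot could be assembled as sketched here. The construction of `models` is an assumption (a reduced version with only the `tidy` and `rsq` columns), and `estimate` is chosen as the single value kept in wide format.

```r
library(tidyverse)
library(broom)
library(gapminder)

# Assumption: reduced stand-in for `models` with the `tidy` and `rsq` columns.
models <- gapminder |>
  mutate(year1950 = year - 1950) |>
  nest(data = -c(country, continent)) |>
  rowwise() |>
  mutate(model = list(lm(lifeExp ~ year1950, data = data)),
         tidy = list(tidy(model)),
         rsq = glance(model)$r.squared) |>
  ungroup()

# One row per country, with the two coefficient estimates side by side.
coefs <- models |>
  unnest(tidy) |>
  select(continent, country, rsq, term, estimate) |>
  pivot_wider(names_from = term, values_from = estimate)

# The slope column is named after the predictor, `year1950`;
# `(Intercept)` is not a syntactic name, hence the backticks.
p <- coefs |>
  ggplot(aes(x = `(Intercept)`, y = year1950)) +
  geom_point(aes(colour = continent, size = rsq)) +
  geom_smooth(method = "loess") +
  scale_size_area() +
  labs(x = "intercept (life expectancy in 1950)", y = "slope (years gained per year)")

p
```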