Climate change

Author

Affiliation

Aurélien Ginolhac

R Workshop

Published

February 13, 2025

Note

This practical connects the lectures and practicals covered in the course, showing how they work together in a typical data analysis: data import, transformation, summarising and plotting.

Atmospheric carbon dioxide

Carbon dioxide (CO₂) is, as its name says, an oxide. This means that once in the atmosphere it is extremely stable and will remain there for thousands of years. Two main carbon sinks exist: forests, mainly trees that incorporate carbon as they grow, and oceans. The latter have absorbed approximately half of what humans have produced by burning oil, not without consequences. Oceans are getting warmer, which weakens CO₂ solubility, while the absorbed CO₂ lowers their pH. This acidification has already killed half of the animals building coral reefs (91% as of 2022) and calcifying organisms. CO₂, like methane, is a greenhouse gas: it absorbs and radiates infrared thermal energy, which traps heat close to the ground. It is worth noting that the first scientist to discover the link between CO₂ and heat trapping was a woman, Eunice Newton Foote, as early as 1856. The French version of the Wikipedia page is more in line with the Smithsonian article: she was not allowed to present her work because of her gender.

Find out how long carbon dioxide can last in our atmosphere and why it matters to look at cumulative emissions

Cumulative carbon dioxide emissions

Because CO₂ remains in the atmosphere for an extremely long time, looking at yearly emissions is of little interest, especially since this metric is used by rich countries that got rid of most of their industry to justify making little effort. What matters is cumulative emissions. For this, we will look at data from the World Bank. Unfortunately, they don't provide the 1960-2020 range, only 1990-2021, so please use my local copy linked below.

Read the CSV `API_EN.ATM.CO2E.PC_DS2_en_csv_v2_3731558.csv`, assign the name `cum_co2`

Tip

If you look at the file, the first 4 lines are not of interest and should be skipped.
Column names contain spaces, leading digits, etc. Using name_repair = "unique" helps.
The columns from "Country Code" to "Indicator Code" can then be discarded.

Solution

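library(tidyverse)  # attaches readr, dplyr, tidyr, ggplot2 and stringr, used below

# Skip the 4 metadata lines, repair the empty column name, drop the code columns
read_csv("data/API_EN.ATM.CO2E.PC_DS2_en_csv_v2_3731558.csv", 
         skip = 4L,
         show_col_types = FALSE, name_repair = "unique") |> 
  select(-c("Country Code":"Indicator Code")) -> cum_co2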

New names:
• `` -> `...66`

cum_co2

# A tibble: 266 × 63
   `Country Name`    `1960`  `1961`  `1962`  `1963`  `1964` `1965` `1966` `1967`
   <chr>              <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
 1 Aruba            NA      NA      NA      NA      NA      NA     NA     NA    
 2 Africa Eastern …  0.906   0.922   0.931   0.941   0.996   1.05   1.03   1.05 
 3 Afghanistan       0.0461  0.0536  0.0737  0.0742  0.0862  0.101  0.107  0.123
 4 Africa Western …  0.0909  0.0953  0.0966  0.112   0.133   0.185  0.194  0.189
 5 Angola            0.101   0.0822  0.211   0.203   0.214   0.206  0.269  0.172
 6 Albania           1.26    1.37    1.44    1.18    1.11    1.17   1.33   1.36 
 7 Andorra          NA      NA      NA      NA      NA      NA     NA     NA    
 8 Arab World        0.609   0.663   0.727   0.853   0.972   1.14   1.25   1.32 
 9 United Arab Emi…  0.119   0.109   0.164   0.176   0.133   0.147  0.160  5.40 
10 Argentina         2.38    2.46    2.54    2.33    2.55    2.66   2.81   2.87 
# ℹ 256 more rows
# ℹ 54 more variables: `1968` <dbl>, `1969` <dbl>, `1970` <dbl>, `1971` <dbl>,
#   `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>,
#   `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>,
#   `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>,
#   `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>,
#   `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, …
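The `New names` message most likely comes from a trailing comma at the end of each line of the file, which readr reads as one extra column without a name. The sketch below reproduces this on a hypothetical miniature of the file: the `toy_csv` text, its values and the 2 metadata lines (4 in the real file) are invented for illustration.

```r
library(readr)

# Hypothetical miniature of the file: 2 metadata lines, then a header and
# data rows that all end with a trailing comma
toy_csv <- I("Data Source,World Development Indicators,
Last Updated Date,2022-02-15,
Country Name,Country Code,1960,1961,
Aruba,ABW,1.5,2.5,
Angola,AGO,0.1,0.2,
")

toy <- read_csv(toy_csv, skip = 2L, show_col_types = FALSE,
                name_repair = "unique")

# The empty fifth header is repaired to `...5`, like `...66` in the real data
names(toy)
```

The repaired column contains only missing values, which is why it can be dropped safely later on.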

This dataset is not tidy, why?

Tip

The year is a variable and should be stored in one column, with the values in a column named co2_emissions_mt_per_cap.
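As an illustration of the tidy layout, here is a minimal sketch on a hypothetical two-country table (`wide`, `long` and the values are made up). Each year column becomes a value of a single `year` variable, so 2 rows by 2 year columns turn into 4 rows.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per year, as in cum_co2
wide <- tribble(
  ~`Country Name`, ~`1960`, ~`1961`,
  "A",                 1.0,     1.5,
  "B",                 0.2,     0.3
)

# One row per country and year; the year names become integer values
long <- pivot_longer(wide,
                     cols = -"Country Name",
                     names_to = "year",
                     values_to = "co2_emissions_mt_per_cap",
                     names_transform = as.integer)
long
```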

Pivot accordingly and assign `cum_co2_long`

Tip

Pivot all columns except the identifier of interest: "Country Name".
It makes sense to set names_to to year.
Transform the names column to integers.

Solution

cum_co2 |> 
  pivot_longer(cols = -"Country Name",
               names_to = "year",
               values_to = "co2_emissions_mt_per_cap",
               names_transform = as.integer) -> cum_co2_long

Warning in f(names[[col]]): NAs introduced by coercion
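This warning is expected: the repaired name `...66` cannot be converted to an integer, so it becomes `NA`. A minimal base R sketch, using a hypothetical `column_names` vector:

```r
# Column names as they reach names_transform; `...66` is the repaired empty header
column_names <- c("1960", "2020", "...66")

# suppressWarnings() only hides the same "NAs introduced by coercion" warning
years <- suppressWarnings(as.integer(column_names))
years
```

The last element is `NA`, while the genuine years are converted as intended.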

cum_co2_long

# A tibble: 16,492 × 3
   `Country Name`  year co2_emissions_mt_per_cap
   <chr>          <int>                    <dbl>
 1 Aruba           1960                       NA
 2 Aruba           1961                       NA
 3 Aruba           1962                       NA
 4 Aruba           1963                       NA
 5 Aruba           1964                       NA
 6 Aruba           1965                       NA
 7 Aruba           1966                       NA
 8 Aruba           1967                       NA
 9 Aruba           1968                       NA
10 Aruba           1969                       NA
# ℹ 16,482 more rows

Clean up by removing the missing values and the spurious year from column `...66`

Tip

The first year is 1960, which suggests a coherent filter on the year column. To remove the missing CO2 values, drop_na() is a good pick.
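The two tools behave differently with missing values, which the following sketch shows on a hypothetical three-row table (`toy` and its values are invented). `drop_na()` only looks at the columns it is given, while `filter()` drops rows for which the condition evaluates to `NA`.

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  year = c(1960L, 1961L, NA),
  co2_emissions_mt_per_cap = c(NA, 0.5, 0.7)
)

# Removes only the row with a missing emission value (the 1960 row here)
no_missing_co2 <- drop_na(toy, co2_emissions_mt_per_cap)

# A lower bound on year also removes the row whose year is NA
valid_years <- filter(toy, year >= 1960L)
```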

Solution

cum_co2_long |> 
  drop_na(co2_emissions_mt_per_cap) |> 
  filter(year > 1960) -> cum_co2_long

Plotting Worldwide yearly emissions

Tip

Summarise the emissions per year to obtain worldwide values.
Compute the cumulative sum of emissions (check out the cumsum() function).
Plot this cumulative sum.
Add vertical lines for the years 1973, 1979 and 2008, three major crises, and comment.
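The first two steps can be sketched on a hypothetical toy table (`toy` and its values are invented). Note that `cumsum()` accumulates in row order, so the sketch sorts by year first; the `.by` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

toy <- tibble(
  year = c(1960L, 1960L, 1961L, 1961L, 1962L),
  co2  = c(1, 2, 3, 4, 5)
)

yearly <- toy |>
  summarise(co2_sum = sum(co2), .by = year) |>  # one row per year
  arrange(year) |>                              # cumsum() follows row order
  mutate(cum = cumsum(co2_sum))                 # running total over the years
yearly
```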

Solution

cum_co2_long |> 
  summarise(co2_sum = sum(co2_emissions_mt_per_cap),
            .by = year) |> 
  mutate(cum = cumsum(co2_sum)) |> 
  ggplot(aes(x = year)) +
  geom_line(aes(y = co2_sum), color = "purple") +
  geom_vline(xintercept = c(1973, 1979, 2008), linetype = "dashed") +
  scale_x_continuous(breaks = seq(1960, 2020, 10)) +
  labs(caption = "Source: World Bank",
       x = NULL,
       y = "Worldwide Carbon dioxide emissions")

Burning fossil fuels is what keeps the economy running. A crisis usually follows a peak in energy prices. Even the 2008 crisis was partly due to such a price increase. The US reacted by ramping up shale oil production from 2010 onwards.

Using the Our World in Data website, load into R their economic growth dataset as a CSV: the World GDP over the last millennia

GDP is the main economic indicator and has been the only target for most countries since WWII, even though it is a poor indicator (accidents and disasters increase GDP).

Solution

#world_gdp <- read_csv("data/world-gdp-over-the-last-two-millennia.csv") |> 
#world_gdp <- read_csv("data/global-gdp-over-the-long-run.csv") |> 
world_gdp <- read_delim("https://ourworldindata.org/grapher/global-gdp-over-the-long-run.csv?v=1&csvType=filtered&useColumnShortNames=true&time=1900..2015", col_select = c("Year", "gdp"), col_types = cols(Year = col_integer()))   |> 
 rename(GDP = gdp)

Join `cum_co2_long` and the World GDP per year. Plot this relationship and comment

Solution

cum_co2_long |> 
  summarise(co2_sum = sum(co2_emissions_mt_per_cap),
            .by = year) |> 
  mutate(co2_cum = cumsum(co2_sum)) |> 
  left_join(world_gdp, by = c("year" = "Year")) |> 
  ggplot(aes(x = co2_cum, y = GDP)) +
  geom_text(data = \(x) filter(x, year %% 10 == 0), 
            aes(label = year), nudge_x = -4e3) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels = scales::label_dollar(scale = 1e-12)) +
  scale_x_continuous(labels = scales::label_comma()) +
  labs(x = "Cumulative CO2 emissions (metric tons per capita)",
       y = "GDP (trillion $)")

Warning: Removed 30 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning: Removed 16 rows containing missing values or values outside the scale range
(`geom_line()`).
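These warnings are what one would expect when some years have no matching GDP value: `left_join()` keeps every row of the left table and fills the unmatched ones with `NA`, which ggplot2 then drops. A minimal sketch with hypothetical tables (`co2`, `gdp` and their values are invented):

```r
library(dplyr)

co2 <- tibble(year = 1960:1963, co2_cum = c(1, 3, 6, 10))
gdp <- tibble(Year = c(1960L, 1962L), GDP = c(10, 12))

# The key columns have different names, hence the named vector in `by`
joined <- left_join(co2, gdp, by = c("year" = "Year"))

# Years without a GDP value are kept, with GDP set to NA
n_missing <- sum(is.na(joined$GDP))
n_missing
```

Using `inner_join()` instead would keep only the years present in both tables, at the cost of silently losing the others.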

Find the year by which we had emitted 50% of all human-caused carbon dioxide.

Solution

cum_co2_long |> 
  summarise(co2_sum = sum(co2_emissions_mt_per_cap),
            .by = year) |> 
  mutate(cum = cumsum(co2_sum)) |> 
  filter(cum > (max(cum) / 2))

# A tibble: 28 × 3
    year co2_sum    cum
   <int>   <dbl>  <dbl>
 1  1991    986. 27183.
 2  1992    953. 28136.
 3  1993    937. 29073.
 4  1994    928. 30001.
 5  1995    947. 30948.
 6  1996    961. 31909.
 7  1997    966. 32875.
 8  1998    967. 33842.
 9  1999    962. 34804.
10  2000    964. 35768.
# ℹ 18 more rows

Answer: 1991!
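Rather than reading the first row of the filtered table, the year can also be extracted directly. This is one possible sketch on hypothetical yearly totals (`toy` and its values are invented), not necessarily the intended solution:

```r
library(dplyr)

toy <- tibble(year = 2001:2005, co2_sum = c(1, 2, 3, 4, 5))

half_year <- toy |>
  mutate(cum = cumsum(co2_sum)) |>   # 1, 3, 6, 10, 15
  filter(cum > max(cum) / 2) |>      # years above half of the total (7.5)
  slice_min(year, n = 1) |>          # earliest of those years
  pull(year)
half_year
```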

However, one can argue that the emissions data start only in 1960 and are in per-capita units.
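The per-capita issue can be made concrete with a short arithmetic sketch. The two countries and all figures below are hypothetical: summing per-capita values gives every country the same weight, whereas total emissions weight each country by its population.

```r
# Hypothetical figures for two countries (not real data)
per_capita <- c(small_rich = 10, large_poor = 1)       # tonnes CO2 per person
population <- c(small_rich = 1e6, large_poor = 100e6)  # inhabitants

# Sum of per-capita values: both countries count equally
sum(per_capita)

# Total emissions in tonnes, weighted by population
total <- sum(per_capita * population)

# Per-capita value for both countries taken together
total / sum(population)
```

Here the sum of per-capita values is 11, while the combined per-capita value is about 1.09, because most people live in the low-emitting country.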

Let’s explore with another dataset from the famous Our World in Data organisation

Download and open the CSV CO2 file, assign name owid_co2

Tip

This file already comes in a format you can use straight away. However, some columns contain missing values, so remember to use the argument na.rm = TRUE in your sums.
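A short base R sketch of why this argument matters, using a hypothetical `emissions` vector: a single missing value makes the whole sum missing unless it is removed first.

```r
emissions <- c(1, NA, 3)

# Without na.rm, the result is NA
sum(emissions)

# With na.rm = TRUE, missing values are ignored
total <- sum(emissions, na.rm = TRUE)
total
```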

Solution

owid_co2 <- read_csv("https://nyc3.digitaloceanspaces.com/owid-public/data/co2/owid-co2-data.csv",
                     show_col_types = FALSE)

Using the `co2` column, compute the cumulative emissions of carbon dioxide per year and find out when 50% of emissions had been produced

Tip

Select “World” directly as the country; the cumulative CO2 emissions are also already computed in cumulative_co2.

Solution

owid_co2 |> 
  filter(country == "World") |> 
  select(country, year, cumulative_co2) |> 
  filter(cumulative_co2 > (max(cumulative_co2) / 2))

# A tibble: 29 × 3
   country  year cumulative_co2
   <chr>   <dbl>          <dbl>
 1 World    1995        923887.
 2 World    1996        948146.
 3 World    1997        972548.
 4 World    1998        996851.
 5 World    1999       1021705 
 6 World    2000       1047216.
 7 World    2001       1072922.
 8 World    2002       1099199.
 9 World    2003       1126868.
10 World    2004       1155501.
# ℹ 19 more rows

We find 1995 this time. Emissions before 1960 were tiny by comparison: the worldwide population has almost tripled between 1960 and today, and energy usage in Western countries has exploded.

Display the top 10 carbon dioxide emitting countries of all time and comment

Tip

Our World in Data added the calculations for continents and the world inside the country column. To keep only individual countries, select only the rows whose iso_code is 3 characters long.
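A minimal sketch of this filter on a hypothetical table (the rows and codes in `toy` are illustrative, not taken from the real file): aggregates with a missing or longer code are dropped, since `str_length(NA)` is `NA` and `filter()` discards such rows.

```r
library(dplyr)
library(stringr)

toy <- tibble(
  country  = c("France", "World", "Some aggregate"),
  iso_code = c("FRA", NA, "OWID_XYZ")  # hypothetical codes
)

# Keep only rows with a 3-character code
countries_only <- filter(toy, str_length(iso_code) == 3L)
countries_only
```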

Solution

owid_co2 |> 
  filter(str_length(iso_code) == 3L) |>
  summarise(co2_sum = sum(co2, na.rm = TRUE),
            .by = country) |> 
  slice_max(co2_sum, n = 10) |> 
  mutate(rank = row_number(), .before = 1L)

# A tibble: 10 × 3
    rank country        co2_sum
   <int> <chr>            <dbl>
 1     1 United States  431853.
 2     2 China          272532.
 3     3 Russia         121267.
 4     4 Germany         94582.
 5     5 United Kingdom  79778.
 6     6 Japan           68765.
 7     7 India           62870.
 8     8 France          39685.
 9     9 Canada          35156.
10    10 Ukraine         31079.

Note

France and Germany, our next-door neighbours, which argue that they produce less than 1% of worldwide annual CO₂ emissions, forget their historical emissions.

Acknowledgements

Appendix

There is too much fossil energy left to keep our climate livable. The carbon dioxide concentration has never been as high as today in the last 800,000 years. Thus, the whole of human history has unfolded under lower concentrations, and the new era is unknown territory. However, as seen in Delannoy et al. 2021, in Applied Energy, there are not enough fossil fuels to continue our current lifestyle, or for lower-income countries to reach it.
