This practical ties together most of the lectures and practicals covered in the course, as they would be combined in a typical data analysis: data import, transformation, summarising and plotting.

Carbon dioxide plant intake

The dataset comes from the paper by Potvin, C., Lechowicz, M. J. and Tardif, S. (1990). The statistical analysis of ecophysiological response curves obtained from experiments involving repeated measures. Ecology, 71, 1389–1400. You can find it at this URL

Load the data

Load the dataset and assign it to the name plant_CO2

Tip

Take care to use meaningful names. plant_CO2_url is a good name for the URL of this dataset; plant_CO2_file is appropriate for the tsv file if you download it.
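
A minimal sketch of the import step. Since the URL is not reproduced here, the example assumes that R's built-in `CO2` data frame holds the same observations and writes it to a temporary tsv file as a stand-in for the download. `readr::read_tsv()` accepts a URL in the same way as a file path, so `plant_CO2_url` could be passed instead.

```r
library(readr)

# Assumption: the built-in CO2 data frame stands in for the course file.
# It is exported to a temporary tsv so that the import step can be shown.
plant_CO2_file <- tempfile(fileext = ".tsv")
write_tsv(as.data.frame(datasets::CO2), plant_CO2_file)

# Import the tab-separated file as a tibble
plant_CO2 <- read_tsv(plant_CO2_file, show_col_types = FALSE)
plant_CO2
```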

Explore the data

How many rows and columns are present, and what are the data types of the columns?
Three columns are categorical. Taken together, how many observations does each combination of their values have?

Tip

Using count() with the relevant columns is the easiest approach. Sorting by the highest count is often a nice way to discover the data; this option is available in count().
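
As a sketch of this tip, assuming the three categorical columns are named `Plant`, `Type` and `Treatment` as in R's built-in `CO2` data frame (used here as a stand-in for the imported file):

```r
library(dplyr)

# Assumption: the built-in CO2 data frame stands in for the imported tibble
plant_CO2 <- as_tibble(datasets::CO2)

# One row per combination of the three columns; sort = TRUE puts the largest n first
obs_per_group <- plant_CO2 |>
  count(Type, Treatment, Plant, sort = TRUE)
obs_per_group
```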
How many categories do you find in each of the character columns?

Tip

One option is to find the number of unique values using n_distinct(). This would have to be done three times, once for each character column. Alternatively, n_distinct() can be applied to all character columns at once using across() and a relevant where() condition.

Warning

n_distinct() and across() are functions meant to be used inside mutate() or summarise().
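
A possible sketch of the across() approach, again assuming the built-in `CO2` data frame as a stand-in. Its categorical columns are factors, so they are first converted to character to mimic what importing a tsv file produces.

```r
library(dplyr)

# Assumption: built-in CO2 as stand-in; factors become character as after a tsv import
plant_CO2 <- as_tibble(datasets::CO2) |>
  mutate(across(where(is.factor), as.character))

# Apply n_distinct() to every character column in a single summarise() call
n_categories <- plant_CO2 |>
  summarise(across(where(is.character), n_distinct))
n_categories
```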
Rename the columns to make them all lower case and continue with the resulting tibble for the next steps.
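
One way to do the renaming is `rename_with()`, which applies a function to the column names. A minimal sketch, assuming the built-in `CO2` data frame as a stand-in for the imported tibble:

```r
library(dplyr)

# Assumption: built-in CO2 as stand-in for the imported tibble
plant_CO2 <- as_tibble(datasets::CO2) |>
  mutate(across(where(is.factor), as.character))

# tolower() is applied to every column name
plant_CO2 <- plant_CO2 |>
  rename_with(tolower)
names(plant_CO2)
```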

CO2 data

The numeric columns are defined as:

  • conc is the ambient carbon dioxide concentration (\(mL/L\)).
  • uptake is the carbon dioxide uptake rate (\(\mu mol/m^2\) sec).
Compute the mean of conc and uptake depending on the location (Quebec or Mississippi)

Tip

Ignore the treatment (chilled or nonchilled) for now.
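
A sketch of the grouped summary, assuming the lower-case column names from the renaming step and the built-in `CO2` data frame as a stand-in. The names `mean_conc` and `mean_uptake` are choices made for this example.

```r
library(dplyr)

# Assumption: built-in CO2 as stand-in, with lower-case column names
plant_CO2 <- as_tibble(datasets::CO2) |>
  mutate(across(where(is.factor), as.character)) |>
  rename_with(tolower)

# One row per location (type column), treatment ignored
location_means <- plant_CO2 |>
  group_by(type) |>
  summarise(mean_conc = mean(conc),
            mean_uptake = mean(uptake))
location_means
```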
Do the CO2 concentrations appear different between the two locations?
Compute the mean of conc and uptake depending on the plant location (type column), plant and treatment. Add the number of observations per group.

Tip

The number of observations should be 7 everywhere, as you saw before.
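
A minimal sketch of this summary, under the same assumptions as before (built-in `CO2` as a stand-in, lower-case column names). `n()` counts the rows in each group, and `.groups = "drop"` returns an ungrouped tibble.

```r
library(dplyr)

# Assumption: built-in CO2 as stand-in, with lower-case column names
plant_CO2 <- as_tibble(datasets::CO2) |>
  mutate(across(where(is.factor), as.character)) |>
  rename_with(tolower)

# Means and number of observations per location, plant and treatment
plant_summary <- plant_CO2 |>
  group_by(type, plant, treatment) |>
  summarise(mean_conc = mean(conc),
            mean_uptake = mean(uptake),
            n = n(),
            .groups = "drop")
plant_summary
```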
From the summary above, sort the table to show which plant / location / treatment combination takes up the most carbon dioxide

Tip

A descending order is possible with desc(mean_uptake) inside the arrange() function.
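
The sorting step could look like the sketch below. It rebuilds the per-plant summary so that it runs on its own, and it assumes the built-in `CO2` data frame as a stand-in and `mean_uptake` as the name of the summary column.

```r
library(dplyr)

# Assumption: built-in CO2 as stand-in, with lower-case column names
plant_CO2 <- as_tibble(datasets::CO2) |>
  rename_with(tolower)

# Summarise per group, then sort with the highest mean uptake first
uptake_sorted <- plant_CO2 |>
  group_by(type, plant, treatment) |>
  summarise(mean_uptake = mean(uptake), n = n(), .groups = "drop") |>
  arrange(desc(mean_uptake))
uptake_sorted
```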

It is hard to see whether all plants have both treatments; you can pivot the table to find out. Right now, the summarised table is in the long format. Pivoting means converting it to the wide format, using the unique values of treatment (chilled and nonchilled) as new columns, filled with the values taken from the mean_uptake column.

Pivot the table using plant and type as ids, names from treatment and values from the mean of carbon dioxide uptake

Tip

Look up the different arguments of the function tidyr::pivot_wider().
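
A sketch of the pivoting step with the `id_cols`, `names_from` and `values_from` arguments of `tidyr::pivot_wider()`. As before, the built-in `CO2` data frame and the column name `mean_uptake` are assumptions of this example.

```r
library(dplyr)
library(tidyr)

# Assumption: built-in CO2 as stand-in, with lower-case column names
plant_CO2 <- as_tibble(datasets::CO2) |>
  mutate(across(where(is.factor), as.character)) |>
  rename_with(tolower)

# Long summary: one row per location, plant and treatment
plant_summary <- plant_CO2 |>
  group_by(type, plant, treatment) |>
  summarise(mean_uptake = mean(uptake), .groups = "drop")

# Wide table: one row per plant, one column per treatment value
uptake_wide <- plant_summary |>
  pivot_wider(id_cols = c(plant, type),
              names_from = treatment,
              values_from = mean_uptake)
uptake_wide
```

Combinations that do not occur in the long table are filled with NA in the wide one, which is what makes missing treatments easy to spot.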
What do you conclude: has the same plant been assessed for both treatments?

Visualize the location and CO2 uptake

Numbers are nice but plotting the data might better reveal how location and treatment relate.

Plot the CO2 uptake as a function of plant location, using violin plots filled by treatment

As a motivation, here is one version of the plot:
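
A minimal sketch of such a violin plot with ggplot2. It is not necessarily the version intended by the course; the built-in `CO2` data frame is assumed as a stand-in, and the axis and legend labels are choices made for this example.

```r
library(dplyr)
library(ggplot2)

# Assumption: built-in CO2 as stand-in, with lower-case column names
plant_CO2 <- as_tibble(datasets::CO2) |>
  rename_with(tolower)

# Location on the x axis, uptake on the y axis, one violin per treatment
uptake_plot <- ggplot(plant_CO2, aes(x = type, y = uptake, fill = treatment)) +
  geom_violin() +
  labs(x = "Plant location", y = "CO2 uptake", fill = "Treatment")
uptake_plot
```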