This practical ties together most of the lectures and practicals covered in the course, showing how they combine in a typical data analysis: data import, transformation, summarizing and plotting.
The dataset is from the paper by Potvin, C., Lechowicz, M. J. and Tardif, S. (1990). The statistical analysis of ecophysiological response curves obtained from experiments involving repeated measures. Ecology, 71, 1389–1400. You can find it at this URL
Assign the imported data to the name plant_CO2.
plant_CO2_url is a good name for the URL of this dataset. plant_CO2_file is appropriate for the TSV file if you download it.
count() with the relevant columns is the easiest approach. Sorting by the highest count is often a nice way to explore the data; count() offers this through its sort = TRUE argument.
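As a minimal sketch of this tip, the block below counts observations per combination of type and treatment. It assumes the built-in datasets::CO2 data frame as a stand-in for the downloaded file, with column names lower-cased and factors converted to character to match the columns described in this practical; the choice of type and treatment as the "relevant columns" is also an assumption.

```r
library(dplyr)

# Assumption: datasets::CO2 stands in for the file imported in the practical.
# Lower-case the column names and turn factors into character columns.
plant_CO2 <- as_tibble(as.data.frame(datasets::CO2)) %>%
  rename_with(tolower) %>%
  mutate(across(where(is.factor), as.character))

# Count observations per type/treatment combination, largest groups first
type_treatment_counts <- plant_CO2 %>%
  count(type, treatment, sort = TRUE)

type_treatment_counts
```

The counts land in a column named n, which is the default name used by count().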
Use n_distinct(). Called column by column, this would have to be done three times, once for each character column. Alternatively, n_distinct() can be applied to all character columns at once using across() with a suitable where() condition, such as where(is.character).
n_distinct() and across() are functions that work inside mutate() or summarise().
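The across() approach could look like the sketch below. As an assumption, the built-in datasets::CO2 data frame replaces the downloaded file, with lower-cased names and factors converted to character so that where(is.character) selects plant, type and treatment.

```r
library(dplyr)

# Assumption: datasets::CO2 stands in for the file imported in the practical.
plant_CO2 <- as_tibble(as.data.frame(datasets::CO2)) %>%
  rename_with(tolower) %>%
  mutate(across(where(is.factor), as.character))

# Number of distinct values in every character column, in a single call
distinct_counts <- plant_CO2 %>%
  summarise(across(where(is.character), n_distinct))

distinct_counts
```

The result is a one-row tibble with one column per character column, which avoids writing three separate n_distinct() calls.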
The numeric columns are defined as:
conc is ambient carbon dioxide concentrations (\(mL/L\)).
uptake is carbon dioxide uptake rates (\(umol/m^2\) sec).
Summarise conc and uptake depending on the location (Quebec or Mississippi).
Summarise conc and uptake depending on the plant location (type column), plant and treatment. Add the number of observations per group.
To sort by decreasing mean uptake, use desc(mean_uptake) in the arrange() function.
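The two grouped summaries can be sketched as follows. Two assumptions are made here: the built-in datasets::CO2 data frame stands in for the downloaded file (names lower-cased, factors converted to character), and the mean is taken as the summary statistic, as the mean_uptake name suggests. The names by_type and by_plant are illustrative only.

```r
library(dplyr)

# Assumption: datasets::CO2 stands in for the file imported in the practical.
plant_CO2 <- as_tibble(as.data.frame(datasets::CO2)) %>%
  rename_with(tolower) %>%
  mutate(across(where(is.factor), as.character))

# Mean concentration and uptake per location (type column)
by_type <- plant_CO2 %>%
  group_by(type) %>%
  summarise(mean_conc = mean(conc), mean_uptake = mean(uptake))

# Same means per type, plant and treatment, plus the group size,
# sorted from the highest to the lowest mean uptake
by_plant <- plant_CO2 %>%
  group_by(type, plant, treatment) %>%
  summarise(
    mean_conc   = mean(conc),
    mean_uptake = mean(uptake),
    n           = n(),
    .groups     = "drop"
  ) %>%
  arrange(desc(mean_uptake))

by_type
by_plant
```

Setting .groups = "drop" returns an ungrouped tibble, so arrange() sorts across all groups rather than within the remaining grouping.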
It is hard to see whether all plants have both treatments; you can pivot the table to find out. Right now, the table after summarisation is in the long format. Pivoting means converting the table to the wide format, using the unique values of treatment (chilled and nonchilled) as new columns, filled with the values taken from the mean_uptake column.
Use plant and type as ids, names from treatment and values from the mean of carbon dioxide uptake.
The function to use is tidyr::pivot_wider().
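A minimal sketch of this pivot is shown below, again assuming the built-in datasets::CO2 data frame in place of the downloaded file and the mean as the summary statistic. The name mean_uptake_wide is illustrative only.

```r
library(dplyr)
library(tidyr)

# Assumption: datasets::CO2 stands in for the file imported in the practical.
plant_CO2 <- as_tibble(as.data.frame(datasets::CO2)) %>%
  rename_with(tolower) %>%
  mutate(across(where(is.factor), as.character))

# Long format: one row per type/plant/treatment with the mean uptake,
# then wide format: one row per plant, one column per treatment
mean_uptake_wide <- plant_CO2 %>%
  group_by(type, plant, treatment) %>%
  summarise(mean_uptake = mean(uptake), .groups = "drop") %>%
  pivot_wider(
    id_cols     = c(plant, type),
    names_from  = treatment,
    values_from = mean_uptake
  )

mean_uptake_wide
```

In the wide table, an NA in the chilled or nonchilled column marks a plant that has no measurements for that treatment.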
Numbers are nice but plotting the data might better reveal how location and treatment relate.
As a motivation, here is one version of the plot:
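One possible plot, not necessarily the one intended in this practical, can be sketched with ggplot2 as below. The aesthetic mapping, the facets and the axis labels are all assumptions, as is the use of the built-in datasets::CO2 data frame in place of the downloaded file.

```r
library(dplyr)
library(ggplot2)

# Assumption: datasets::CO2 stands in for the file imported in the practical.
plant_CO2 <- as_tibble(as.data.frame(datasets::CO2)) %>%
  rename_with(tolower) %>%
  mutate(across(where(is.factor), as.character))

# One response curve per plant, coloured by treatment, one panel per location
uptake_plot <- ggplot(
  plant_CO2,
  aes(x = conc, y = uptake, colour = treatment, group = plant)
) +
  geom_line() +
  geom_point() +
  facet_wrap(vars(type)) +
  labs(
    x = "Ambient CO2 concentration (mL/L)",
    y = "CO2 uptake rate (umol/m^2 sec)",
    colour = "Treatment"
  )

uptake_plot
```

Grouping by plant keeps each plant's measurements on its own line, while colour and facets expose the treatment and location effects side by side.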
