<- tribble(
variants ~sampleid, ~var1, ~var2, ~var3,
"S1", "A3T", "T5G", "T6G",
"S2", "A3G", "T5G", NA,
"S3", "A3T", "T6C", "G10C",
"S4", "A3T", "T6C", "G10C"
)
Data transformation with dplyr
This tutorial allows you to explore dplyr
functionality based on the individual lectures. Every question can be answered with a combination of pipes (|>
). Do not use temporary variables and any statements outside of the range of the tidyverse.
Part one - Base dplyr
Import the data from the website.
Assign to the name judgments
Use glimpse()
to identify columns and column types in judgments
.
Select all columns that refer to the STAI questionnaire.
Retrieve all subjects younger than 20 which are in the stress group.
The column for the group is condition
.
Arrange all observations by STAI_pre
so that the subject with the lowest stress level is on top.
What is the subject in question?
Abbreviate the gender column such that only the first character remains
Create a new STAI_pre_category
column.
Use case_when()
or case_match()
to categorize values in STAI_pre
as “low”, “normal” or “high”. For values < 25 in STAI_pre
assign “low”, for values > 65 assign “high”, and for all other values assign “normal”.
To easily see the new column, use relocate()
to move it to the first position of the dataframe.
Rescale values in the REI group
Divide all entries in the REI questionnaire by 5, the maximal value, so the values will be between 0 and 1.
across()
allows modification of multiple columns in one go.
Part two - Grouping and summarizing
Use judgments to compute basic statistics for all moral dilemma columns considering the conditions.
- Compute the mean, the median, the standard deviation as well as min and max values.
- Find meaningful short names for the functions such as
med
formedian()
. - Assign
judgments_condition_stats
to the results.
Sort by groups
- Find the number of subjects by age, gender and condition, e.g. how many 20 years of age females are in the
stress
group. - Sort the resulting tibble such that the condition that contains the most populous group is sorted first (i.e.
stress
orcontrol
appear together). - Ensure that the resulting tibble does not contain groups.
Part three - joins and pivots
Judgements reprise
Pivoting for readability
Take the results from above that are available as judgments_condition_stats
and bring the data into a more readable format.
- Make the table longer. After this step, your tibble should contain three columns:
- The condition
- the name of the moral dilemma and stats (in one column)
- the values of the stats by moral dilemma.
- Split the moral dilemma and stats column. You can use
separate_wider_delim
ormutate
to create a column that contains the moral dilemma (dog, trolley, etc) and the stats (median, min, etc.). - Make the table wider again and move the stats to individual columns.
The final table should look something like this.
condition | moral_dilemma | mean | sd | … |
---|---|---|---|---|
control | dog | 7.24 | 2.23 | … |
stress | wallet | 7.4 | 2.01 | … |
Genetic variants
The tribble
below contains changes of the sequence of a gene. The format in the input is the expected sequence (the reference allele), the position and the variant, commonly called alternative allele.
In T6G
, T
is the reference allele, 6
is the position (along the gene) and G
is the variant allele.
Clean the table of genetic variants such that all variants appear as a column labeled by their position.
The table should look something like this.
sampleid | 3 | 5 | 6 | … |
---|---|---|---|---|
S1 | T | G | G | … |
S2 | G | G | NA | … |
S3 | … | … | … | … |
Select relevant variants
In this table the variants are labeled according to their effect on stability of the gene product.
<- tribble(
variant_significance ~variant, ~significance,
"A3T", "unknown",
"A3G", "damaging",
"T5G", "benign",
"T6G", "damaging",
"T6C", "benign",
"G10C", "unknown"
)
Identify the subjects in the table variants
that carry variants labeled as damaging in variant_significance
table.
The final output should be vector of sample ids.
You can use several join flavours and the %in%
operator to achieve the same result.