This little tutorial aims to make you familiar with some of the functions of the stringr package and a few regular expressions.

Strings and escape sequences in R

Write a sentence with escape sequences.

Try the sentence: "It's the end of the world!" he said.\ . Assign the string to a variable and try print(), cat() and writeLines().
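One possible way to write this sentence, as a minimal sketch: inside a double-quoted R string, the inner double quotes and the backslash each need a backslash escape. The variable name `sentence` is only illustrative, and treating the trailing backslash as part of the sentence is an assumption.

```r
# Inner double quotes and the trailing backslash are each escaped with a backslash.
# (Assumption: the trailing backslash belongs to the sentence.)
sentence <- "\"It's the end of the world!\" he said.\\"

print(sentence)       # shows the escaped representation, wrapped in quotes
cat(sentence, "\n")   # shows the raw characters; the newline is added explicitly
writeLines(sentence)  # shows the raw characters and appends a newline itself
```

Comparing the three outputs shows the difference between how R stores a string and how it displays it: print() shows the escapes, while cat() and writeLines() show the characters themselves.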

stringr functions

We will be using the words data set that is built into stringr. The data set is available to you once you load the package.

length(words)
## [1] 980
Select words that
  1. Contain a y

  2. Start with y

  3. Contain a y within the word
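The three selections above could be sketched with str_subset(), which keeps the elements of a vector that match a pattern. The result names are illustrative, and reading "within the word" as "with at least one character before and after the y" is an assumption.

```r
library(stringr)

# 1. Words containing a y anywhere
with_y <- str_subset(words, "y")

# 2. Words starting with y (^ anchors the match at the start of the string)
starts_with_y <- str_subset(words, "^y")

# 3. Words with a y inside: "." requires one character before and one after
#    (assumed interpretation of "within the word")
inner_y <- str_subset(words, ".y.")
```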

Extract the y and the previous character.

Note: Use the function unique() around the results to avoid printing many empty matches.
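A minimal sketch of this extraction, assuming str_extract() is the intended function: the pattern ".y" matches a y together with the single character before it. The name `pairs` is illustrative.

```r
library(stringr)

# "." matches any single character, so ".y" is a y plus the character before it.
# Words without such a match give NA; unique() collapses those to one entry.
pairs <- unique(str_extract(words, ".y"))
pairs
```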

Get the lengths of the first ten words

You can use head(words, 10) as a convenient way to access the elements of the words vector.
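As a sketch, the lengths can be obtained with str_length(), which counts the characters of each string in a vector. The variable names are illustrative.

```r
library(stringr)

first_ten <- head(words, 10)

# str_length() is vectorised: it returns one length per word
word_lengths <- str_length(first_ten)
word_lengths
```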

Viral research

Read the genome sequence of the Hepatitis D virus: hepd.fasta. For now, just execute the following:

hepd <- readr::read_lines("https://biostat2.uni.lu/practicals/data/hepd.fasta")
What is the length of the genome sequence?
What is the sequence composition? How often does each character occur?
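The two questions above can be approached as in the following sketch. It uses a small made-up vector instead of the real file, and it assumes that the lines read from a FASTA file consist of a header line starting with ">" followed by one or more sequence lines. The names `toy_fasta`, `toy_seq` and `composition` are hypothetical.

```r
library(stringr)

# Hypothetical stand-in for the lines returned by read_lines():
# one FASTA header line followed by sequence lines (assumed layout).
toy_fasta <- c(">toy_sequence", "ATGCATGA", "TTGACC")

# Drop header lines and join the remaining lines into a single string
toy_seq <- str_c(toy_fasta[!str_detect(toy_fasta, "^>")], collapse = "")

# Length of the sequence in characters
str_length(toy_seq)

# Split into single characters and count how often each one occurs
composition <- table(str_split(toy_seq, "")[[1]])
composition
```

The same steps should carry over to `hepd` if the file follows that layout; checking the first few lines with head(hepd) is a sensible first step.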
Find motifs in the sequence using str_locate().

Find all matches of the motif ATG in the sequence.
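A short sketch of the difference between str_locate() and str_locate_all(), shown on a made-up toy sequence rather than the real genome. The names `toy_seq`, `first_hit` and `all_hits` are hypothetical.

```r
library(stringr)

# Hypothetical toy sequence, not the real genome
toy_seq <- "ATGCATGATTGACC"

# str_locate() reports only the first match, as a start/end matrix
first_hit <- str_locate(toy_seq, "ATG")

# str_locate_all() returns a list with one matrix per input string,
# holding the start and end positions of every match
all_hits <- str_locate_all(toy_seq, "ATG")[[1]]
all_hits
```

The number of rows of `all_hits` gives the number of occurrences of the motif; str_count(toy_seq, "ATG") would return the same count directly.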