Evaluation in programming

tidyeval

Roland Krause

Rworkshop

Friday, 9 February 2024

Learning objectives

You will learn to:

  • Understand how quoted functions work
  • Understand why we need tidyeval
  • Apply it in your own programming
  • Embrace the curly-curly operator and forget the above points

Environments and promises

Functions enclose their variables

x <- 1
plus_one <- function(x) {
  x <- x + 1
  x
}
plus_one(x)
[1] 2
plus_one(x)
[1] 2

The x object in the global environment was not modified: the function worked on its own local copy of x.

Separate assignment and evaluation

msg <- "old"
delayedAssign("promesse", msg)
msg <- "new!"
promesse # new!
[1] "new!"

The promise was created when msg was "old", but it is only evaluated when promesse is first accessed, by which time msg is "new!".

R lets users create variables (promises) that are evaluated only later.
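A promise is evaluated only once, the first time it is accessed; after that its value is fixed. Function arguments are promises too, which is why an argument that is never used is never evaluated. A minimal sketch (ignore_arg() is a hypothetical name used for illustration):

```r
msg <- "old"
delayedAssign("promesse", msg)
msg <- "new!"
first <- promesse   # forces the promise now: "new!"
msg <- "newer"
second <- promesse  # still "new!": a promise is evaluated only once

# Function arguments are promises: `x` is never used, so it is never evaluated
ignore_arg <- function(x) 10
ignore_arg(stop("never evaluated"))  # returns 10, no error is raised
```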

Standard vs Non-standard evaluation

base R: arguments must refer to known objects

  • But quoting (capturing the expression without evaluating it) is used for the axis labels
plot(swiss$Education, swiss$Examination)
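The default plot() method builds its axis labels along these lines: it captures the unevaluated argument with substitute() and turns it into text with deparse(). A minimal sketch of the same trick, with a hypothetical helper axis_label():

```r
# Hypothetical helper: capture the argument's expression, then turn it into text
axis_label <- function(x) deparse(substitute(x))

axis_label(swiss$Education)           # "swiss$Education"
axis_label(log(swiss$Education + 1))  # "log(swiss$Education + 1)"
```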

ggplot2, and the tidyverse more widely

  • Columns are evaluated in the data context.
library(ggplot2)
ggplot(swiss, aes(x = Education, y = Examination)) +
  geom_point()

Quoting: delaying evaluation

Does not mean adding quotes!

It means capturing an expression

quote(Education)
Education
quote(swiss$Education)
swiss$Education

Without quoting, evaluation fails

eval(Education, envir = swiss)
Error in eval(expr, envir, enclos): object 'Education' not found

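envir tells eval() where to find the vector Education, but here the argument Education is evaluated in the global environment before eval() even receives it.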

Quoting lets us evaluate when needed

eval(quote(Education), envir = swiss)
 [1] 12  9  5  7 15  7  7  8  7 13  6 12  7 12  5  2  8 28 20  9 10  3 12  6  1
[26]  8  3 10 19  8  2  6  2  6  3  9  3 13 12 11 13 32  7  7 53 29 29

Also in higher-level functions

Such as with() or subset()

with(swiss, Education)
 [1] 12  9  5  7 15  7  7  8  7 13  6 12  7 12  5  2  8 28 20  9 10  3 12  6  1
[26]  8  3 10 19  8  2  6  2  6  3  9  3 13 12 11 13 32  7  7 53 29 29
with(swiss, Education > 10)
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE
[13] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE
[25] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
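with() combines both ideas: it captures the unevaluated expression and evaluates it with the data frame as the environment, falling back to the caller's environment for other names. A simplified sketch (my_with() is a hypothetical name; base R's with.default() follows the same pattern):

```r
my_with <- function(data, expr) {
  # substitute() captures the caller's expression unevaluated;
  # eval() looks names up in `data` first, then in the caller's environment
  eval(substitute(expr), envir = data, enclos = parent.frame())
}

my_with(swiss, mean(Education))
limit <- 10
my_with(swiss, sum(Education > limit))  # `limit` is found in the global environment
```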

Non-standard evaluation (NSE)

Unquoted variable names

Quotation also works for whole expressions

Already exists in base R

subset(swiss, (Education + Examination) > 60)[, 2:4]
             Agriculture Examination Education
Neuchatel           17.6          35        32
V. De Geneve         1.2          37        53

dplyr

library(dplyr)
filter(swiss, (Education + Examination) > 60) |>
  select(Agriculture:Education)
             Agriculture Examination Education
Neuchatel           17.6          35        32
V. De Geneve         1.2          37        53

Evaluating expression

Standard evaluation of an expression

To evaluate an expression, R searches the environments for the value bound to each name and performs the evaluation immediately.

Non-standard evaluation

It means you may

  • Modify the expression or
  • Modify the chain of searched environments before evaluation.
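Both kinds of modification can be sketched in base R: a quoted expression is a call object whose pieces can be replaced, and eval() accepts a data frame as the place where names are looked up.

```r
e <- quote(Education > 10)

# 1. Modify the expression: e[[2]] is the left-hand side of `>`
e2 <- e
e2[[2]] <- as.name("Examination")
e2                            # Examination > 10

# 2. Modify where names are looked up: the columns of swiss
sum(eval(e, envir = swiss))   # number of districts with Education > 10
sum(eval(e2, envir = swiss))  # same question, for Examination
```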

Why is swiss found even though it is absent from the global environment?

Values in environments are searched in a precise order

# callr runs the function in a fresh, clean R session
callr::r(function() rlang::search_envs())
 [[1]] $ <env: global>
 [[2]] $ <env: package:stats>
 [[3]] $ <env: package:graphics>
 [[4]] $ <env: package:grDevices>
 [[5]] $ <env: package:utils>
 [[6]] $ <env: package:datasets>
 [[7]] $ <env: package:methods>
 [[8]] $ <env: Autoloads>
 [[9]] $ <env: tools:callr>
[[10]] $ <env: package:base>

swiss lives in package:datasets, so it is found after 5 failed lookups
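Base R can answer the same question without rlang: search() prints the search path and find() (from utils) reports where a name is found. In a session where you have not created your own swiss object:

```r
search()          # the search path, global environment first
find("swiss")     # "package:datasets"
exists("swiss", envir = globalenv(), inherits = FALSE)  # FALSE: not in the global env
```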

Loading a package

callr::r(function() {
  library(forcats) 
  rlang::search_envs()
})
 [[1]] $ <env: global>
 [[2]] $ <env: package:forcats>
 [[3]] $ <env: package:stats>
 [[4]] $ <env: package:graphics>
 [[5]] $ <env: package:grDevices>
 [[6]] $ <env: package:utils>
 [[7]] $ <env: package:datasets>
 [[8]] $ <env: package:methods>
 [[9]] $ <env: Autoloads>
[[10]] $ <env: tools:callr>
[[11]] $ <env: package:base>

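The most recently attached package is placed right after the global environment, so it masks objects with the same name further down the search path

This is why dplyr::filter masks stats::filter when dplyr is attached, e.g. with library(tidyverse).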
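Masking can be reproduced without any package: an object defined in the global environment is found before the base one, while the :: operator bypasses the search path. A minimal sketch (the masking mean() is deliberately silly):

```r
mean <- function(x) "masked!"   # the global env is searched before package:base
masked <- mean(1:3)             # "masked!"
original <- base::mean(1:3)     # 2: `::` goes straight to the base namespace
rm(mean)                        # remove the mask; mean() is base::mean again
```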

Quoting prevents evaluation, nice, but how does it work?

Education is unknown in the Global Environment

Education
Error in eval(expr, envir, enclos): object 'Education' not found

Quoting prevents evaluation

quote(Education)
Education

But how do we force evaluation then?

Use eval() with envir to evaluate in the data context

Education <- "tic"
# the Education object in the global environment does not clash
eval(expr = quote(Education), envir = swiss)
 [1] 12  9  5  7 15  7  7  8  7 13  6 12  7 12  5  2  8 28 20  9 10  3 12  6  1
[26]  8  3 10 19  8  2  6  2  6  3  9  3 13 12 11 13 32  7  7 53 29 29
# Works also with `with`
with(swiss, Education)
 [1] 12  9  5  7 15  7  7  8  7 13  6 12  7 12  5  2  8 28 20  9 10  3 12  6  1
[26]  8  3 10 19  8  2  6  2  6  3  9  3 13 12 11 13 32  7  7 53 29 29

Evaluation in the tidyverse

Education is looked up in the swiss data first

Education <- 20
filter(swiss, Education > 40)
             Fertility Agriculture Examination Education Catholic
V. De Geneve        35         1.2          37        53    42.34
             Infant.Mortality
V. De Geneve               18

Works as expected!

Data columns take precedence, even when an object with the same name exists in the global environment!

But you had better avoid name collisions, for your own sanity!

Pronouns for disambiguation

The .data and .env pronouns let you state explicitly where each name is looked up: .data for data columns, .env for the environment

filter(swiss, .data$Education > .env$Education)
             Fertility Agriculture Examination Education Catholic
Lausanne          55.7        19.4          26        28    12.11
Neuchatel         64.4        17.6          35        32    16.92
V. De Geneve      35.0         1.2          37        53    42.34
Rive Droite       44.7        46.6          16        29    50.43
Rive Gauche       42.8        27.7          22        29    58.33
             Infant.Mortality
Lausanne                 20.2
Neuchatel                23.0
V. De Geneve             18.0
Rive Droite              18.2
Rive Gauche              19.3
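The .data pronoun also accepts a string with [[, which is the tidy way to handle column names stored in character vectors. A sketch assuming dplyr is installed; above_threshold() is a hypothetical helper:

```r
library(dplyr)

above_threshold <- function(data, var, threshold) {
  # .data[[var]] is the column named by the string `var`;
  # .env$threshold is the function argument, never a data column
  filter(data, .data[[var]] > .env$threshold)
}

above_threshold(swiss, "Education", 40)
```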

Quotations of expressions

Doing it by hand

expr <- quote((Education + Examination) > 60)
expr
(Education + Examination) > 60
swiss[eval(expr, envir = swiss), 1:2]
             Fertility Agriculture
Neuchatel         64.4        17.6
V. De Geneve      35.0         1.2

Fails in a function

quoted_filter <- function(data, expr) {
  q_expr <- quote(expr)
  data[eval(q_expr, envir = data), 1:2]
}
quoted_filter(swiss, (Education + Examination) > 60)
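Error in eval(expr, envir, enclos): object 'Examination' not found

quote(expr) captures the symbol expr itself, not the caller's expression. eval() then finds the argument expr and forces it in the global environment, where Examination does not exist (Education does, since we assigned it earlier).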

Expressions in functions

substitute() is needed

quoted_filter_sub <- function(data, expr) {
  q_expr <- substitute(expr)
  # return both the subset and the captured expression
  list(data[eval(q_expr, envir = data), 1:2],
       q_expr
  )
}
quoted_filter_sub(swiss, (Education + Examination) > 60)
[[1]]
             Fertility Agriculture
Neuchatel         64.4        17.6
V. De Geneve      35.0         1.2

[[2]]
(Education + Examination) > 60

From the "Substituting and Quoting Expressions" help page (?substitute):

substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env. quote simply returns its argument. The argument is not evaluated and can be any R expression.

  • Complex but works, so why do we need tidyeval?

Why do we need tidyeval, i.e. quasiquotation?

Add one variable in the global environment: it still works, since threshold is not a column

threshold <- 60
quoted_filter_sub(swiss, 
                  (Education + 
                     Examination) > threshold)
[[1]]
             Fertility Agriculture
Neuchatel         64.4        17.6
V. De Geneve      35.0         1.2

[[2]]
(Education + Examination) > threshold

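Silent error: name clash

Fertility is also a data column, so the column wins over the global Fertility <- 60: each row is compared to its own Fertility value, not to 60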

Fertility <- 60
quoted_filter_sub(swiss, 
                  (Education + 
                     Examination) > Fertility)
[[1]]
             Fertility Agriculture
Neuchatel         64.4        17.6
V. De Geneve      35.0         1.2
Rive Droite       44.7        46.6
Rive Gauche       42.8        27.7

[[2]]
(Education + Examination) > Fertility
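Base R has its own quasiquotation tool, bquote(), where .() evaluates one piece of the expression right away and inlines its value. It is a base alternative, not what tidyeval uses, but it already fixes the clash above:

```r
Fertility <- 60
# .(Fertility) is replaced by its value (60) before the expression is stored
bq_expr <- bquote((Education + Examination) > .(Fertility))
bq_expr                                  # (Education + Examination) > 60
swiss[eval(bq_expr, envir = swiss), 1:2]
```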

Quasiquotation, bang bang ‼️ operator

  • When we need to unquote part of the expression: !!

  • rlang::qq_show() helper to check

Demonstration

quo((Education + Examination) > Fertility)
<quosure>
expr: ^(Education + Examination) > Fertility
env:  global
rlang::qq_show(quo((Education + Examination) > !! Fertility))
quo((Education + Examination) > 60)
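The same unquoting works with rlang's expr(), and eval_tidy() evaluates the result with the data frame as a data mask. A sketch assuming rlang is installed:

```r
library(rlang)
Fertility <- 60
# !! inlines the global value of Fertility; the rest stays quoted
tidy_e <- expr((Education + Examination) > !!Fertility)
tidy_e                                      # (Education + Examination) > 60
swiss[eval_tidy(tidy_e, data = swiss), 1:2]
```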

The bang bang operator !!

In dplyr, unquote only Fertility

  • Keep the rest of the expression as before
filter(swiss, (Education + Examination) > !! Fertility)
             Fertility Agriculture Examination Education Catholic
Neuchatel         64.4        17.6          35        32    16.92
V. De Geneve      35.0         1.2          37        53    42.34
             Infant.Mortality
Neuchatel                  23
V. De Geneve               18

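New: the curly-curly operator {{ }} (rlang 0.4.0)

Here it gives the same result as !!, but {{ }} is designed for function arguments.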

filter(swiss, (Education + Examination) > {{Fertility}})
             Fertility Agriculture Examination Education Catholic
Neuchatel         64.4        17.6          35        32    16.92
V. De Geneve      35.0         1.2          37        53    42.34
             Infant.Mortality
Neuchatel                  23
V. De Geneve               18

Tip

rlang provides the toolkit

  • Really advanced
  • Most of the time, you don’t need it directly
  • Its operators are already exposed in the main tidyverse packages
  • Animation made with rayshader by Brodie Gaslam

Use unquoted names in your own functions

  • Fertility is passed as a promise, i.e. it does not need to exist in the global environment
  • enquo() captures the argument and delays its evaluation
  • !! unquotes it where the data is available
  • {{ }} combines both steps
select_head <- function(.data, column, n = 5) {
  .data |>
    select({{column}}) |> 
    slice_head(n = n)
}
select_head(swiss, column = Fertility, n = 2)
           Fertility
Courtelary      80.2
Delemont        83.1
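Before {{ }} existed, the same function was written with both steps spelled out: enquo() to capture the argument and !! to unquote it. A sketch of that long form, assuming dplyr and rlang are installed (select_head_enquo() is a hypothetical name):

```r
library(dplyr)

select_head_enquo <- function(.data, column, n = 5) {
  column <- rlang::enquo(column)   # capture the argument as a quosure, unevaluated
  .data |>
    select(!!column) |>            # unquote it where select() evaluates columns
    slice_head(n = n)
}

select_head_enquo(swiss, Fertility, n = 2)
```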

What to remember about tidyeval

select_head(swiss, column = Fertility, n = 2)
           Fertility
Courtelary      80.2
Delemont        83.1
select_head(swiss, column = c(Fertility, Catholic), n = 2)
           Fertility Catholic
Courtelary      80.2     9.96
Delemont        83.1    84.84
select_head(swiss, column = Fertility:Examination, n = 3)
             Fertility Agriculture Examination
Courtelary        80.2        17.0          15
Delemont          83.1        45.1           6
Franches-Mnt      92.5        39.7           5

All you need to remember

Unquoted arguments in functions: use {{arg}}

New column name, the operator :=

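  • To use an argument as the name of a new column
  • Requires the special assignment := (the "walrus" operator, borrowed from data.table), because R does not accept an expression such as {{out_name}} as an argument name on the left of =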
my_summarise <- function(.data, out_name, expr) {

  summarise(.data, 
    {{out_name}} := mean({{expr}})
  )
}
my_summarise(swiss, m_fer, Fertility)
     m_fer
1 70.14255
my_summarise(swiss, mean_exam, Examination)
  mean_exam
1  16.48936
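The left-hand side of := also accepts a glue-like string, so a new name can be built from the argument without pasting strings yourself. A sketch assuming a recent version of dplyr (the glue syntax for names comes from rlang); mean_of() is a hypothetical name:

```r
library(dplyr)

mean_of <- function(.data, var) {
  # "mean_{{ var }}" becomes "mean_" followed by the column name passed as `var`
  summarise(.data, "mean_{{ var }}" := mean({{ var }}))
}

mean_of(swiss, Fertility)   # one column named mean_Fertility
```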

Before we stop

You learned to:

  • Grasp non-standard evaluation
  • Use it with tidyeval
  • Pass arguments as promises, not strings

I can categorically say if you’re pasting strings to program with dplyr, there is always a better way.

Hadley Wickham

Acknowledgments

  • Lionel Henry
  • Jenny Bryan
  • Brodie Gaslam
  • Hadley Wickham

Further reading

Thank you for your attention!