What would standard evaluation pipes look like?

2019-09-20

Background: Symbol pushing in R

R’s use of symbols/names as computational objects

It is prevalent

$ and named element parameters
lm() and glm() subset parameter
formulas

We have nice utilities for dealing with symbols and expressions

substitute, quote, enquote, bquote
codetools package
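A minimal sketch of these utilities in base R. The helper `show_arg()` and the variable names are made up for illustration and are not from the talk.

```r
# quote() captures an expression without evaluating it.
e <- quote(a + b)
class(e)                      # "call"

# bquote() is quote() with holes: .() splices in an evaluated value.
n <- 3
e2 <- bquote(head(x, n = .(n)))

# substitute() captures the expression supplied as a function argument.
show_arg <- function(arg) substitute(arg)   # hypothetical helper
e3 <- show_arg(mean(y) + 1)

# Captured expressions are ordinary objects and can be evaluated later.
result <- eval(e, list(a = 1, b = 2))
```

The point is only that symbols and calls are regular R objects that can be stored, inspected, and evaluated on demand.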

R as a Language for Creating DSDs

A dialect takes a subset of all R expressions

taken as a function argument (won’t work otherwise)
modifies the expression
then executes the new expression
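The three steps above can be sketched with a toy dialect. Everything here (`plus_to_times()`, `times_dialect()`) is hypothetical and only illustrates the capture, modify, execute pattern; no existing package is assumed to work this way.

```r
# Step 2 helper: walk an expression and rewrite every `+` call into `*`.
plus_to_times <- function(e) {
  if (!is.call(e)) return(e)
  e <- as.call(lapply(as.list(e), plus_to_times))
  if (identical(e[[1]], quote(`+`))) e[[1]] <- quote(`*`)
  e
}

times_dialect <- function(expr) {
  e <- substitute(expr)        # 1. take the expression as a function argument
  e2 <- plus_to_times(e)       # 2. modify the expression
  eval(e2, parent.frame())     # 3. execute the new expression
}

a <- 2
times_dialect(a + 5)           # evaluated as a * 5
```

The expression has to arrive as a function argument because that is the only place `substitute()` can see it before it is evaluated.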

This could be taken a lot further with a parsing package

any surjective syntax could be implemented
we probably don’t want to do this

It is already being done

DSD Input Functions

formula handling in lme4 by Doug Bates
tidyverse %>% by Stefan Milton Bache and Hadley Wickham
data.table [ and [<- infix operators by Matt Dowle
armacmp by Dirk Schumacher
formula handling in survival (multi-state models) by Terry Therneau
… and others

The context/brevity tradeoff

iris[order(iris[, "Sepal.Width"])[1:3], "Sepal.Width", drop = FALSE]

##    Sepal.Width
## 61         2.0
## 63         2.2
## 69         2.2

library(dplyr)

iris %>% select(Sepal.Width) %>% arrange(Sepal.Width) %>% head(3)

##   Sepal.Width
## 1         2.0
## 2         2.2
## 3         2.2

A pipe that uses standard evaluation

What is `magrittr` doing?

foo <- . %>% head %>% tail(n=5)
foo

## Functional sequence with the following components:
## 
##  1. head(.)
##  2. tail(., n = 5)
## 
## Use 'functions' to extract the individual functions.

unclass(foo)

## function (value) 
## freduce(value, `_function_list`)
## <environment: 0x7fd9025b3d18>

ls(environment(foo))

## [1] "_fseq"          "_function_list" "freduce"

environment(foo)$`_function_list`

## [[1]]
## function (.) 
## head(.)
## 
## [[2]]
## function (.) 
## tail(., n = 5)

environment(foo)$`freduce`

## function (value, function_list) 
## {
##     k <- length(function_list)
##     if (k > 1) {
##         for (i in 1:(k - 1L)) {
##             value <- function_list[[i]](value)
##         }
##     }
##     value <- withVisible(function_list[[k]](value))
##     if (value[["visible"]]) 
##         value[["value"]]
##     else invisible(value[["value"]])
## }
## <bytecode: 0x7fd90556eb30>
## <environment: namespace:magrittr>
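The printed internals suggest that a functional sequence is a list of one-argument functions applied in order. A rough base R equivalent, assuming that reading, could look like this; `make_fseq()` is a hypothetical name, and the sketch skips the visibility handling that `freduce()` performs.

```r
# Hypothetical helper: wrap a list of one-argument functions in a single
# function that applies them left to right.
make_fseq <- function(function_list) {
  function(value) {
    for (f in function_list) value <- f(value)
    value
  }
}

foo2 <- make_fseq(list(
  function(.) head(.),
  function(.) tail(., n = 5)
))

foo2(1:10)   # head() keeps 1:6, tail(n = 5) then keeps 2:6
```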

Pipes do 2.5 things

1. Partial function evaluation

2. Function composition

2.5. Generalized function composition
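One way to read the three items, written out in plain base R. The helper names (`tail5`, `compose2`, `shout`) are invented for this sketch, and treating item 2.5 as "the intermediate value may land in any argument" is an assumption about what the slide means.

```r
# 1. Partial function evaluation: fix some arguments, leave one open.
tail5 <- function(x) tail(x, n = 5)

# 2. Function composition: feed the result of one function to the next.
compose2 <- function(f, g) function(x) g(f(x))   # hypothetical helper
head_then_tail <- compose2(head, tail5)

# 2.5. Generalized composition: the intermediate result does not have to
# be the first argument of the next call. Here it fills gsub()'s `x`.
shout <- function(s) gsub("O", "0", x = toupper(s))

head_then_tail(1:10)
shout("foo")
```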

Back to our example

foo <- . %>% head() %>% tail(n = 5)
 
# is equivalent to...

foo <- function(x) {
  tail(head(x), n=5)
}

Why might we prefer the latter?

We get a regular, readable, stack-traceable R function

It’s easier for the bytecode compiler to optimize

f(g(x)) as a single expression, rather than evaluating g(x) and then calling f() on the returned value

The `fc` package and function

library(fc) # devtools::install_github("swang87/fc")

fc(tail, x = head(x), n = 5)

## function (x) 
## {
##     tail(x = head(x), n = 5)
## }
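To give a feel for how a function like the one printed above could be assembled from a call, here is a small base R sketch. `fc_sketch()` is a hypothetical stand-in written for this example. It is not the fc package's implementation, and it assumes the open argument is always named `x`.

```r
# Hypothetical stand-in, NOT the fc package's code: build a regular
# function of `x` whose body is the call f(...) exactly as written.
fc_sketch <- function(f, ...) {
  f_name <- substitute(f)                       # e.g. the symbol `tail`
  args <- as.list(substitute(list(...)))[-1]    # unevaluated arguments
  body_call <- as.call(c(list(f_name), args))   # tail(x = head(x), n = 5)
  fun <- function(x) NULL
  body(fun) <- call("{", body_call)
  environment(fun) <- parent.frame()
  fun
}

g <- fc_sketch(tail, x = head(x), n = 5)
g(1:10)
```

The result is an ordinary closure, so printing `g` shows the composed call directly instead of a list of wrapped functions.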

A pipe is (almost) a special case of `fc()` functions

We can’t do the following:

iris %>% head() %>% tail(n=5)

but we can do

( head() %>% fc(tail, n=5) )(iris)

A (slightly) more complex example

# magrittr
. %>% head(n=50) %>% summary()

# fc
fc(summary, object=fc(head, n = 50)(object))

## function (x) 
## {
##     summary(object = internal_anon_func(x))
## }
## <environment: 0x7fd908c90618>

Is the tidyverse suffering from domain creep?

My use of the tidyverse

  # Create the longitudinal data set.
  lupus_longitudinal = lupus_clean %>%
    mutate(outcome = future_map(qs, make_outcome)) %>%
    mutate(visits = future_map_int(outcome, nrow)) %>%
    filter(visits > 2) %>%
    mutate(last_visit = 
             future_map_dbl(outcome, ~ .x$qsdy[nrow(.x)])) %>%
    mutate(sri_response = 
             future_map_lgl(outcome, make_sri_response)) %>%
    mutate(bilag_ss = 
             future_map(outcome, bilag_score_summary)) %>%
    select(usubjid, outcome, bilag_ss) %>%
    unnest() %>%
    select(usubjid, sledai, qsdy, pga, starts_with("bilag")) %>%
    mutate(sledai = as.numeric(as.character(sledai)),
           pga = as.numeric(as.character(pga)))

Why does it work?

magrittr, dplyr, and friends work best for structured data where there is a sequence of prespecified operations.

\[ \text{standardized} \rlap{\ \ \ \not}\iff \text{tidy} \]

\[ \text{properly abstracted} \rlap{\ \ \ \not}\iff \text{tidy} \]

Where won’t it work?

update_beta <- function(X, y, lambda, alpha, b, W) {
  WX <- W * X
  WX2 <- W * X^2
  Xb <- X %*% b
  for (i in seq_along(b)) {
    Xb <- Xb - X[, i] * b[i]
    b[i] <- soft_thresh(sum(WX[,i, drop=FALSE] * (y - Xb)),
                        lambda*alpha)
    b[i] <- b[i] / (sum(WX2[, i]) + lambda * (1 - alpha))
    Xb <- Xb + X[, i] * b[i]
  }
  b
}
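`update_beta()` calls a helper, `soft_thresh()`, that the slide does not define. A minimal sketch, assuming it is the usual soft-thresholding operator from lasso and elastic net coordinate descent:

```r
# Assumed definition (not shown in the talk): shrink x toward zero by t,
# and set it to exactly zero when |x| <= t.
soft_thresh <- function(x, t) {
  sign(x) * pmax(abs(x) - t, 0)
}

soft_thresh(c(-3, -0.5, 0.5, 3), 1)
```

The loop in `update_beta()` reads and rewrites `Xb` and `b[i]` at every step, so each iteration depends on the state left by the previous one. That is the kind of code that does not decompose into a fixed sequence of piped steps.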

Can we do better than a discriminative characterization of a domain?

Thanks!

References

Github repository https://github.com/swang87/fc
