2019-09-20

Background: Symbol pushing in R

R’s use of symbols/names as computational objects

It is prevalent

  • $ and named element parameters
  • lm() and glm() subset parameter
  • formulas
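
A minimal base-R illustration of the three cases listed above, using the built-in iris data. The variable names sw, fit, and f are chosen here for illustration only.

```r
# `$` takes a bare name rather than a character string.
sw <- iris$Sepal.Width

# The subset argument is an expression evaluated inside `data`,
# so Species is looked up as a column of iris.
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris,
          subset = Species == "setosa")

# A formula is an unevaluated object that can be inspected.
f <- Sepal.Length ~ Sepal.Width
all.vars(f)
```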

We have nice utilities for dealing with symbols and expressions

  • substitute, quote, enquote, bquote
  • codetools package
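
A short sketch of three of these utilities in base R. The helper arg_name() is hypothetical and exists only to show substitute() at work.

```r
# quote() captures an expression without evaluating it.
e <- quote(x + y)
class(e)

# bquote() quotes an expression but evaluates anything wrapped in .().
n <- 3
b <- bquote(x + .(n))

# substitute() recovers the unevaluated argument inside a function.
arg_name <- function(arg) deparse(substitute(arg))
arg_name(Sepal.Width)
```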

R as a Language for Creating DSDs

A dialect function handles a subset of all R expressions. It

  • takes the expression as a function argument (it won’t work otherwise)
  • modifies the expression
  • then executes the new expression
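
The three steps can be sketched in base R. The function concat_dialect() below is a hypothetical toy dialect, not taken from any package: inside its argument, `+` is rewritten to paste0(). It is a sketch that ignores edge cases such as empty arguments in calls like x[1, ].

```r
# Hypothetical toy dialect: within `expr`, `+` means string concatenation.
concat_dialect <- function(expr) {
  # 1. Take the expression as a function argument, unevaluated.
  expr <- substitute(expr)

  # 2. Modify the expression: walk the call tree and swap `+` for paste0.
  rewrite <- function(e) {
    if (!is.call(e)) return(e)
    if (identical(e[[1]], as.name("+"))) e[[1]] <- as.name("paste0")
    as.call(lapply(as.list(e), rewrite))
  }

  # 3. Execute the new expression in the caller's environment.
  eval(rewrite(expr), parent.frame())
}

concat_dialect("a" + "b" + "c")
```

Called outside the dialect, the same expression is an error, which is why the expression has to arrive as a function argument.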

This could be taken a lot further with a parsing package

  • any surjective syntax could be implemented
  • we probably don’t want to do this

It is already being done

DSD Input Functions

  • formula handling in lme4 by Doug Bates
  • tidyverse %>% by Stefan Milton Bache and Hadley Wickham
  • data.table [ and [<- infix operators by Matt Dowle
  • armacmp by Dirk Schumacher
  • formula handling in survival (multi-state models) by Terry Therneau
  • … and others

The context/brevity tradeoff

iris[order(iris[, "Sepal.Width"])[1:3], "Sepal.Width", drop = FALSE]
##    Sepal.Width
## 61         2.0
## 63         2.2
## 69         2.2
library(dplyr)

iris %>% select(Sepal.Width) %>% arrange(Sepal.Width) %>% head(3)
##   Sepal.Width
## 1         2.0
## 2         2.2
## 3         2.2

A pipe that uses standard evaluation

What is magrittr doing?

foo <- . %>% head %>% tail(n=5)
foo
## Functional sequence with the following components:
## 
##  1. head(.)
##  2. tail(., n = 5)
## 
## Use 'functions' to extract the individual functions.
unclass(foo)
## function (value) 
## freduce(value, `_function_list`)
## <environment: 0x7fd9025b3d18>

What is magrittr doing?

ls(environment(foo))
## [1] "_fseq"          "_function_list" "freduce"

What is magrittr doing?

environment(foo)$`_function_list`
## [[1]]
## function (.) 
## head(.)
## 
## [[2]]
## function (.) 
## tail(., n = 5)

What is magrittr doing?

environment(foo)$`freduce`
## function (value, function_list) 
## {
##     k <- length(function_list)
##     if (k > 1) {
##         for (i in 1:(k - 1L)) {
##             value <- function_list[[i]](value)
##         }
##     }
##     value <- withVisible(function_list[[k]](value))
##     if (value[["visible"]]) 
##         value[["value"]]
##     else invisible(value[["value"]])
## }
## <bytecode: 0x7fd90556eb30>
## <environment: namespace:magrittr>
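
The mechanism shown above can be approximated in base R without magrittr. This is a simplified sketch: apply_sequence() is a hypothetical name, and the withVisible() handling of invisible results in freduce() is left out.

```r
# A list of single-argument functions, mirroring `_function_list`.
function_list <- list(
  function(.) head(.),
  function(.) tail(., n = 5)
)

# Feed the value through each function in turn, as freduce() does.
apply_sequence <- function(value, fns) {
  for (f in fns) value <- f(value)
  value
}

res <- apply_sequence(iris, function_list)

# The same reduction written with base R's Reduce().
res2 <- Reduce(function(v, f) f(v), function_list, iris)
```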

Pipes do 2.5 things

  1. Partial function evaluation
  2. Function composition

  2.5. Generalized function composition
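
A base-R sketch of these ideas. The names tail5, compose2, head_then_tail, and dims are hypothetical. The last example assumes that "generalized" means feeding several inner results into one outer function, which is one reading of the slide rather than a definition from it.

```r
# 1. Partial function evaluation: fix n = 5, leave the data argument open.
tail5 <- function(x) tail(x, n = 5)

# 2. Function composition: apply f first, then g.
compose2 <- function(f, g) function(x) g(f(x))
head_then_tail <- compose2(head, tail5)

# 2.5. Generalized composition (assumed reading): one outer function
# receives the results of more than one inner function.
dims <- function(x) paste(nrow(x), ncol(x), sep = " x ")
```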

Back to our example

foo <- . %>% head() %>% tail(n = 5)
 
# is equivalent to...

foo <- function(x) {
  tail(head(x), n=5)
}

Why might we prefer the latter?

  • We get a regular, readable, stack-traceable R function
  • It’s easier for the bytecode compiler to optimize
  • The call is f(g(x)), rather than g(x) followed by f() applied to the returned value

The fc package and function

library(fc) # devtools::install_github("swang87/fc")

fc(tail, x = head(x), n = 5)
## function (x) 
## {
##     tail(x = head(x), n = 5)
## }
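
To show how a function like this can be assembled with standard evaluation only, here is a base-R sketch. make_composed() is a hypothetical helper written for this illustration. It is not the fc package's implementation and does not reproduce fc()'s interface.

```r
# Hypothetical helper: build function(<arg>) <body_expr> from a quoted call.
make_composed <- function(body_expr, arg = "x", env = parent.frame()) {
  # An argument list with one required (empty-default) argument.
  args <- setNames(alist(x = ), arg)
  # as.function() treats the last list element as the function body.
  as.function(c(args, list(body_expr)), envir = env)
}

foo <- make_composed(quote(tail(x = head(x), n = 5)))
foo(iris)
```

The result is an ordinary closure, so body(foo) and formals(foo) can be inspected like those of any hand-written function.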

A pipe is (almost) a special case of fc() functions

We can’t do the following:

iris %>% head() %>% tail(n=5)

but we can do

( head() %>% fc(tail, n=5) )(iris)

A (slightly) more complex example

# magrittr
. %>% head(n=50) %>% summary()
# fc
fc(summary, object=fc(head, n = 50)(object))
## function (x) 
## {
##     summary(object = internal_anon_func(x))
## }
## <environment: 0x7fd908c90618>

Is the tidyverse suffering from domain creep?

My use of the tidyverse

  # Create the longitudinal data set.
  lupus_longitudinal = lupus_clean %>%
    mutate(outcome = future_map(qs, make_outcome)) %>%
    mutate(visits = future_map_int(outcome, nrow)) %>%
    filter(visits > 2) %>%
    mutate(last_visit = 
             future_map_dbl(outcome, ~ .x$qsdy[nrow(.x)])) %>%
    mutate(sri_response = 
             future_map_lgl(outcome, make_sri_response)) %>%
    mutate(bilag_ss = 
             future_map(outcome, bilag_score_summary)) %>%
    select(usubjid, outcome, bilag_ss) %>%
    unnest() %>%
    select(usubjid, sledai, qsdy, pga, starts_with("bilag")) %>%
    mutate(sledai = as.numeric(as.character(sledai)),
           pga = as.numeric(as.character(pga)))

Why does it work?

magrittr, dplyr, and friends work best for structured data where there is a sequence of prespecified operations.



\[ \text{standardized} \rlap{\ \ \ \not}\iff \text{tidy} \]

\[ \text{properly abstracted} \rlap{\ \ \ \not}\iff \text{tidy} \]

Where won’t it work?

update_beta <- function(X, y, lambda, alpha, b, W) {
  WX <- W * X
  WX2 <- W * X^2
  Xb <- X %*% b
  for (i in seq_along(b)) {
    Xb <- Xb - X[, i] * b[i]
    b[i] <- soft_thresh(sum(WX[,i, drop=FALSE] * (y - Xb)),
                        lambda*alpha)
    b[i] <- b[i] / (sum(WX2[, i]) + lambda * (1 - alpha))
    Xb <- Xb + X[, i] * b[i]
  }
  b
}
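
update_beta() calls soft_thresh(), which the slides do not define. The version below is an assumption: it is the conventional soft-thresholding operator used in coordinate descent for lasso-type penalties, and it may differ from the author's helper.

```r
# Assumed definition: shrink x toward zero by `threshold`,
# returning zero when |x| is at or below the threshold.
soft_thresh <- function(x, threshold) {
  sign(x) * pmax(abs(x) - threshold, 0)
}

soft_thresh(c(-3, 0.5, 3), 1)
```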

Can we do better than a discriminative characterization of a domain?

Thanks!