Interaction plot in ggplot2

Question :

I’m trying to adapt some of the standard R graphics to the ggplot2 style. One of the charts I want to convert is the interaction plot used when studying the fit of a linear model.

The following data were taken from Example 9-1 in Douglas C. Montgomery’s book Design and Analysis of Experiments, 6th Edition.

montgomery <- structure(list(Nozzle = c("A1", "A1", "A1", "A1", "A1", "A1", 
"A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", 
"A1", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", 
"A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A3", "A3", "A3", 
"A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", 
"A3", "A3", "A3", "A3"), Speed = c("B1", "B1", "B1", "B1", "B1", 
"B1", "B2", "B2", "B2", "B2", "B2", "B2", "B3", "B3", "B3", "B3", 
"B3", "B3", "B1", "B1", "B1", "B1", "B1", "B1", "B2", "B2", "B2", 
"B2", "B2", "B2", "B3", "B3", "B3", "B3", "B3", "B3", "B1", "B1", 
"B1", "B1", "B1", "B1", "B2", "B2", "B2", "B2", "B2", "B2", "B3", 
"B3", "B3", "B3", "B3", "B3"), Pressure = c("C1", "C1", "C2", 
"C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3", "C1", "C1", 
"C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3", "C1", 
"C1", "C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3", 
"C1", "C1", "C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", 
"C3", "C1", "C1", "C2", "C2", "C3", "C3"), Loss = c(-35, -25, 
110, 75, 4, 5, -45, -60, -10, 30, -40, -30, -40, 15, 80, 54, 
31, 36, 17, 24, 55, 120, -23, -5, -65, -58, -55, -44, -64, -62, 
20, 4, 110, 44, -20, -31, -39, -35, 90, 113, -30, -55, -55, -67, 
-28, -26, -62, -52, 15, -30, 110, 135, 54, 4)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -54L), .Names = c("Nozzle", 
"Speed", "Pressure", "Loss"))

The traditional way to create the chart I want is to run

interaction.plot(montgomery$Nozzle, montgomery$Speed, montgomery$Loss)



What I want now is to create a function called interaction.plot.ggplot2 that automatically reproduces the chart above. The problem is that I do not know how to refer to the columns inside the dplyr commands that prepare the data to be plotted.

interaction.plot.ggplot2 <- function(response, predictor, group, data){

    interaction <- data %>%
      select(predictor, group, response) %>%
      group_by(predictor, group) %>%
      summarise(average = mean(response))

    p <- ggplot(interaction, aes(x=predictor, y=average, colour=group, group=group)) + 
      geom_line()

    print(p)
}

interaction.plot.ggplot2(Loss, Nozzle, Speed, montgomery)

Error in eval(expr, envir, enclos) : object 'Nozzle' not found

What should I do to make my interaction.plot.ggplot2 function create the chart I want?


Answer :

Writing functions in which the column names vary can be quite awkward with dplyr and ggplot2.

Here’s a function that works for what you want:


library(dplyr)
library(ggplot2)
library(lazyeval)

interaction.plot.ggplot2 <- function(response, predictor, group, data){

  # Capture the unquoted arguments as lazy objects (expression + environment)
  l_response <- lazy(response)
  l_predictor <- lazy(predictor)
  l_group <- lazy(group)

  # Average the response within each predictor/group combination
  interaction <- data %>%
    select_(.dots = list(l_predictor, l_group, l_response)) %>%
    group_by_(.dots = list(l_predictor, l_group)) %>%
    summarise_(
      .dots = setNames(list(interp(~mean(response), response = l_response)), "average")
    )

  p <- ggplot(interaction, aes_string(x=expr_text(predictor), y="average", colour=expr_text(group), group=expr_text(group))) +
    geom_line()

  print(p)
}

interaction.plot.ggplot2(Loss, Nozzle, Speed, montgomery)
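
In current releases of dplyr and ggplot2, underscore verbs such as select_ and group_by_, as well as aes_string, are deprecated. As a rough sketch of how the same idea might look with the embrace operator {{ }} from tidy evaluation (assuming dplyr 1.0.0 or later, rlang 0.4.0 or later and a recent ggplot2), the function could be written along these lines. The name interaction_plot_embrace and the toy data frame are made up for illustration and are not part of the code above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical name, chosen so it does not clash with interaction.plot.ggplot2.
interaction_plot_embrace <- function(data, response, predictor, group) {
  # {{ }} forwards the unquoted column names to dplyr and ggplot2.
  cell_means <- data %>%
    group_by({{ predictor }}, {{ group }}) %>%
    summarise(average = mean({{ response }}), .groups = "drop")

  ggplot(cell_means,
         aes(x = {{ predictor }}, y = average,
             colour = {{ group }}, group = {{ group }})) +
    geom_line()
}

# Toy data for illustration only (not the Montgomery data).
toy <- data.frame(
  Nozzle = c("A1", "A1", "A2", "A2", "A1", "A1", "A2", "A2"),
  Speed  = c("B1", "B1", "B1", "B1", "B2", "B2", "B2", "B2"),
  Loss   = c(10, 20, 30, 50, -10, -30, 0, 20)
)

# Build the plot object; print(p) would draw it.
p <- interaction_plot_embrace(toy, Loss, Nozzle, Speed)
```

With the montgomery data frame from the question loaded, print(interaction_plot_embrace(montgomery, Loss, Nozzle, Speed)) should draw one line per Speed level across the Nozzle levels.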


To accept variable names without quotation marks, e.g. select(data, nome_var), dplyr uses what is called lazy evaluation or non-standard evaluation. It is called non-standard because a function normally works with the evaluated value of each argument rather than with the expression the user typed.

For example:

myfun <- function(x){
  x
}
myfun(x = 1 + 1)
[1] 2

Lazy evaluation delays the evaluation of an argument, which makes it possible to capture the expression that the user typed as the function’s argument.

myfun <- function(x){
  lazyeval::lazy(x)
}
myfun(x = 1 + 1)
<lazy>
  expr: 1 + 1
  env:  <environment: R_GlobalEnv>
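
The same capture can be sketched in base R, without any package, using substitute(). This is only an illustration of the idea; show_expr is a made-up helper name.

```r
# Hypothetical helper: returns the text of the expression the caller typed,
# instead of the value that the expression evaluates to.
show_expr <- function(x) {
  deparse(substitute(x))
}

typed <- show_expr(1 + 1)
typed
# [1] "1 + 1"
```

Here the function never computes 1 + 1; it only looks at what was written in the call.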

This way of programming enables non-standard scoping, which is very useful when analyzing data interactively. The trade-off is more complex code when the analysis is not interactive (e.g. your problem).
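
In non-interactive code the column names often arrive as character strings. As a rough sketch of one way to handle that case, assuming dplyr 1.0.0 or later, the averaging step could use all_of() and the .data pronoun. The helper name cell_means_by_name and the toy data frame are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: every column is identified by a character string.
cell_means_by_name <- function(data, response, predictor, group) {
  data %>%
    # all_of() selects the grouping columns named in the strings
    group_by(across(all_of(c(predictor, group)))) %>%
    # .data[[response]] looks up the response column by name
    summarise(average = mean(.data[[response]]), .groups = "drop")
}

# Toy data for illustration only (not the Montgomery data).
toy <- data.frame(
  Nozzle = c("A1", "A1", "A2", "A2", "A1", "A1", "A2", "A2"),
  Speed  = c("B1", "B1", "B1", "B1", "B2", "B2", "B2", "B2"),
  Loss   = c(10, 20, 30, 50, -10, -30, 0, 20)
)

means <- cell_means_by_name(toy, "Loss", "Nozzle", "Speed")
```

The result has one row per Nozzle/Speed combination, with the mean of Loss in the average column.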

Here is the relevant passage from the lazyeval vignette:


Non-standard scoping (NSS) is an important part of R because it makes
  it easy to write functions tailored for interactive data exploration.
  These functions require less typing, at the cost of some ambiguity and
  “magic”. This is a good trade-off for interactive data exploration
  because you want to get ideas out of your head and into the computer
  as quickly as possible. If a function does make a bad guess, you’ll
  spot it quickly because you’re working interactively.

To understand this properly, it is well worth reading this document in full.

