Question :
I’m trying to adapt some of the standard R graphics to the ggplot2
style. One of the charts I want to convert is the interaction plot used when studying a linear model fit.
The following data were taken from Example 9-1 in Douglas C. Montgomery’s book Design and Analysis of Experiments, 6th Edition.
montgomery <- structure(list(Nozzle = c("A1", "A1", "A1", "A1", "A1", "A1",
"A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1",
"A1", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2",
"A2", "A2", "A2", "A2", "A2", "A2", "A2", "A2", "A3", "A3", "A3",
"A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3", "A3",
"A3", "A3", "A3", "A3"), Speed = c("B1", "B1", "B1", "B1", "B1",
"B1", "B2", "B2", "B2", "B2", "B2", "B2", "B3", "B3", "B3", "B3",
"B3", "B3", "B1", "B1", "B1", "B1", "B1", "B1", "B2", "B2", "B2",
"B2", "B2", "B2", "B3", "B3", "B3", "B3", "B3", "B3", "B1", "B1",
"B1", "B1", "B1", "B1", "B2", "B2", "B2", "B2", "B2", "B2", "B3",
"B3", "B3", "B3", "B3", "B3"), Pressure = c("C1", "C1", "C2",
"C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3", "C1", "C1",
"C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3", "C1",
"C1", "C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3", "C3",
"C1", "C1", "C2", "C2", "C3", "C3", "C1", "C1", "C2", "C2", "C3",
"C3", "C1", "C1", "C2", "C2", "C3", "C3"), Loss = c(-35, -25,
110, 75, 4, 5, -45, -60, -10, 30, -40, -30, -40, 15, 80, 54,
31, 36, 17, 24, 55, 120, -23, -5, -65, -58, -55, -44, -64, -62,
20, 4, 110, 44, -20, -31, -39, -35, 90, 113, -30, -55, -55, -67,
-28, -26, -62, -52, 15, -30, 110, 135, 54, 4)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -54L), .Names = c("Nozzle",
"Speed", "Pressure", "Loss"))
The traditional way to create the chart I want is to run
interaction.plot(montgomery$Nozzle, montgomery$Speed, montgomery$Loss)
I can create a similar graph using ggplot2:
library(dplyr)
library(ggplot2)
interaction <- montgomery %>%
  select(Nozzle, Speed, Loss) %>%
  group_by(Nozzle, Speed) %>%
  summarise(Average = mean(Loss))
ggplot(interaction, aes(x = Nozzle, y = Average, colour = Speed, group = Speed)) +
  geom_line()
What I want now is to create a function called interaction.plot.ggplot2 that automatically produces the previous plot. The problem is that I do not know how to refer to the columns in the dplyr commands that prepare the data to be plotted.
interaction.plot.ggplot2 <- function(response, predictor, group, data){
  interaction <- data %>%
    select(predictor, group, response) %>%
    group_by(predictor, group) %>%
    summarise(average = mean(response))
  p <- ggplot(interaction, aes(x=predictor, y=average, colour=group, group=group)) +
    geom_line()
  print(p)
}
interaction.plot.ggplot2(Loss, Nozzle, Speed, montgomery)
Error in eval(expr, envir, enclos) : object 'Nozzle' not found
What should I do to make my interaction.plot.ggplot2
function create the chart I want?
Answer :
Writing functions in which the column names vary can be quite tedious with dplyr and ggplot2.
Here’s a function that works for what you want:
library(dplyr)
library(ggplot2)
library(lazyeval)
interaction.plot.ggplot2 <- function(response, predictor, group, data){
  l_response <- lazy(response)
  l_predictor <- lazy(predictor)
  l_group <- lazy(group)
  interaction <- data %>%
    select_(.dots = list(l_predictor, l_group, l_response)) %>%
    group_by_(.dots = list(l_predictor, l_group)) %>%
    summarise_(
      .dots = setNames(list(interp(~mean(response), response = l_response)), "average")
    )
  p <- ggplot(interaction, aes_string(x=expr_text(predictor), y="average",
                                      colour=expr_text(group), group=expr_text(group))) +
    geom_line()
  print(p)
}
interaction.plot.ggplot2(Loss, Nozzle, Speed, montgomery)
All of this is reasonably well described in these links:
- Non-standard evaluation
- Lazyeval
To accept variable names without quotation marks, e.g. select(data, nome_var), dplyr uses what is called lazy evaluation or non-standard evaluation. It is called non-standard because R normally evaluates a function's arguments and works with the resulting values inside the function.
For example:
myfun <- function(x){
return(x)
}
myfun(x = 1 + 1)
[1] 2
Lazy evaluation is a way to delay the evaluation of the argument, which makes it possible to capture the expression that the user typed as the function’s argument.
myfun <- function(x){
return(lazy(x))
}
myfun(x = 1 + 1)
<lazy>
expr: 1 + 1
env: <environment: R_GlobalEnv>
This way of programming enables non-standard scoping, which is very useful for writing programs that analyze data interactively. The trade-off is more complex code when the analysis is not interactive (e.g. your problem).
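As a minimal sketch of the same idea without lazyeval, base R's substitute() can also capture the expression typed for an argument instead of its value. The function name show_expr is hypothetical and only serves this illustration:

```r
# Hypothetical helper: returns the text of the expression the caller typed,
# instead of the value that expression evaluates to.
show_expr <- function(x) {
  # substitute() retrieves the unevaluated expression behind the argument x;
  # deparse() turns that expression into a character string.
  deparse(substitute(x))
}

captured <- show_expr(1 + 1)
print(captured)
# [1] "1 + 1"
```

Unlike lazy(), this sketch keeps only the expression and not the environment it came from, which is part of what lazyeval adds on top.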
Here is the relevant part of the lazyeval document:
Non-standard scoping (NSS) is an important part of R because it makes
it easy to write functions tailored for interactive data exploration.
These functions require less typing, at the cost of some ambiguity and
“magic”. This is a good trade-off for interactive data exploration
because you want to get ideas out of your head and into the computer
as quickly as possible. If a function does make a bad guess, you’ll
spot it quickly because you’re working interactively.
I stress that, to understand this well, it’s worth reading this document.
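The function in this answer relies on lazyeval and on the underscore verbs (select_, group_by_, summarise_), which newer dplyr releases have deprecated. As a hedged sketch only, assuming dplyr 1.0.0 or later and a ggplot2 version whose aes() accepts the embrace operator {{ }}, the same plot could be written as follows. The name interaction_plot_gg is hypothetical, and the built-in warpbreaks dataset stands in for the question's data so that the block runs on its own:

```r
library(dplyr)
library(ggplot2)

# Hypothetical name; a sketch under the version assumptions stated above.
interaction_plot_gg <- function(data, response, predictor, group) {
  # {{ }} forwards the unquoted column names typed by the caller
  # into the dplyr verbs and into aes().
  averages <- data %>%
    group_by({{ predictor }}, {{ group }}) %>%
    summarise(average = mean({{ response }}), .groups = "drop")

  # The plot is returned rather than printed, so the caller decides
  # whether to display or modify it.
  ggplot(averages,
         aes(x = {{ predictor }}, y = average,
             colour = {{ group }}, group = {{ group }})) +
    geom_line()
}

# warpbreaks ships with R: breaks (numeric), wool and tension (factors).
p <- interaction_plot_gg(warpbreaks, breaks, tension, wool)
```

With the data from the question, the equivalent call would be interaction_plot_gg(montgomery, Loss, Nozzle, Speed), followed by print(p) to draw the chart.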