Question :
This is my dataframe:
structure(list(Year = c(1979L, 1979L, 1979L, 1980L, 1980L, 1980L,
1981L, 1981L, 1981L, 1982L, 1982L, 1982L, 1983L, 1983L, 1983L,
1984L, 1984L, 1984L, 1985L, 1985L, 1985L, 1986L, 1986L, 1986L,
1987L, 1987L, 1987L, 1988L, 1988L, 1988L), Month = c(10L, 11L,
12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L,
10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L,
11L, 12L), Y.1 = c(8.00983263923528, 2.41267858341867, -0.701122343112104,
-3.93438481559836, 1.61989462202274, -0.0837521649979607, -1.18856075379809,
-5.79109166398385, -6.02656788564288, 3.57285443621284, 5.28086890954826,
4.61968948421691, 1.6450358083769, 2.09679639676383, 3.13330926488653,
7.03433470051535, 8.82984898471047, 6.35665464823924, -2.06916023327692,
-6.80818412035661, -2.55840141236052, 5.93892137387166, 3.73139295521127,
-2.43756307587375, -7.88332536927916, -11.1612368255376, -14.9073451470428,
-3.39210451580797, -9.45264055248482, -6.71777033430725), X.1 = c(0.308656857874223,
1.04586629806642, 0.861945545932596, 0.375970358978561, -0.347308458564966,
-0.29159098146565, 0.658969566870815, 0.777325096646653, 0.819638059706351,
0.14348380776068, 0.320980128297688, 0.422457840273038, 0.0753279027397413,
-0.00412826834750302, -0.0306969460488249, 0.202590024491522,
0.144588970489035, 0.299274727728394, 0.924086583854944, 0.903017497665926,
0.964001122879932, 1.26678884737668, 1.24568369535494, 1.17738738727233,
0.855877205956479, 0.778924677659654, 0.601219806786069, 0.967781164852632,
1.10343758488876, 1.02401236754546), Y.2 = c("NA", "NA", "NA",
"5.33565675549722", "-0.477469962261498", "0.743881752912509",
"0.946947439972276", "5.26357788348063", "6.20317011981397",
"-3.44416166730468", "-4.98209173294852", "-4.17799392953961",
"-1.60319913629998", "-2.07841411022162", "-3.07277915798255",
"-6.81314462908097", "-8.99190729955144", "-6.41231440381122",
"2.93695557772259", "7.71262044640592", "3.48797284502131", "-5.06072963216373",
"-2.74288427337241", "3.50049327959275", "8.56226731314113",
"12.0144762810381", "15.6527185635863", "4.17084966096979", "10.4311905060596",
"7.6861205071862"), X.2 = c(0.288003451, 0.873662015, 0.874190316,
0.36027826, -0.120926336, -0.276130722, 0.633675698, 0.849582846,
0.778756432, 0.20203225, 0.221280623, 0.467109312, 0.07783831,
-0.008749708, -0.023401276, 0.196393036, 0.18439037, 0.294919158,
0.908446718, 0.922729322, 0.962361556, 0.74, 0.74, 0.77, 2.36,
2.79, 1.76, 1.26, 1.48, 1.21)), class = "data.frame", row.names = c(NA,
-30L))
When I fit the following model, R reacts to the NAs by dropping the first 3 rows of Y.1 and the first 3 rows of X.1. It should instead drop the last 3 rows of X.1:
summary(volcker.ini %>% lm(Y.1~X.1,data = .))
How can I make this adjustment in the above code?
Answer :
There must be something non-standard about your R session.
As can be read in help("lm"), in the Arguments section (my emphasis):
na.action
a function which indicates what should happen when the data contain NAs.
The default is set by the na.action setting of options, and is na.fail if
that is unset. The ‘factory-fresh’ default is na.omit. Another possible
value is NULL, no action. Value na.exclude can be useful.
This means that the lm command will omit the NA values unless you modify the options()$na.action value. This value can be checked with
options()$na.action
#[1] "na.omit"
If you get something else, just run the following command.
options(na.action = "na.omit")
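A slightly more defensive version of the same check, as a minimal sketch: getOption() reads a single option, and options() returns the previous values so the change can be undone afterwards. The variable names current and old are only illustrative.

```r
# Read the current setting without printing the whole options list.
current <- getOption("na.action")
print(current)

# options() returns the previous values, so the change can be reverted.
old <- options(na.action = "na.omit")
stopifnot(identical(getOption("na.action"), "na.omit"))

# Restore whatever was set before.
options(old)
```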
On my system that is the value; I never change it. And when I ran your code, everything worked:
library(dplyr)
summary(volcker.ini %>% lm(Y.1 ~ X.1,data = .))
#
#Call:
#lm(formula = Y.1 ~ X.1, data = .)
#
#Residuals:
#     Min       1Q   Median       3Q      Max
#-14.1342  -4.0814   0.0258   4.5236  10.2769
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)    2.447      1.675   1.461   0.1552
#X.1           -5.356      2.259  -2.371   0.0249 *
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 5.613 on 28 degrees of freedom
#Multiple R-squared: 0.1672, Adjusted R-squared: 0.1375
#F-statistic: 5.621 on 1 and 28 DF, p-value: 0.02486
The quote above says that na.exclude can be useful. See its help("na.exclude") page, and if you find it really is useful, the modified code will be
summary(volcker.ini %>% lm(Y.1 ~ X.1,data = ., na.action = na.exclude))
And now, why not split this statement in two, one to assign the result of lm and another to call summary?
modelo <- volcker.ini %>% lm(Y.1 ~ X.1,data = ., na.action = na.exclude)
summary(modelo)
Later you may want coef(modelo) or other values, such as the residuals.
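To see what na.exclude changes in practice, here is a small sketch on made-up data (the toy data frame and the fit_omit / fit_exclude names are hypothetical, not the question's data). Both settings fit the model on the same complete rows; the difference only shows up in per-row values such as the residuals.

```r
# Hypothetical toy data: one missing response in row 2.
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, NA, 6.2, 7.9, 10.1))

fit_omit    <- lm(y ~ x, data = toy, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

# Same coefficients: both fits use the four complete rows.
all.equal(coef(fit_omit), coef(fit_exclude))

# na.omit drops the incomplete row from the residuals ...
length(residuals(fit_omit))     # 4
# ... while na.exclude pads them back to the original length with NA.
length(residuals(fit_exclude))  # 5
residuals(fit_exclude)
```

The padded vector lines up with the rows of the original data frame, which is convenient if you want to attach residuals or fitted values back to it as a new column.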
Finally, so that it does not happen again, check whether you have a file named .RData (this is not an extension, it is the full name of the file) and, if you do, remove it.
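One way to look for that file from within R, as a sketch. It assumes R was started in the current working directory, so a saved workspace would sit at ./.RData; the name rdata_path is illustrative. The deletion is left commented out on purpose.

```r
# Assumption: R was started in the current working directory.
rdata_path <- file.path(getwd(), ".RData")

if (file.exists(rdata_path)) {
  message("Saved workspace found: ", rdata_path)
  # Uncomment to delete it once you are sure nothing in it is needed:
  # file.remove(rdata_path)
} else {
  message("No .RData file in ", getwd())
}
```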