# How to make dates difference in R?

Posted on

#### Question :

``````dataset <- structure(list(PLACA = structure(c(5L, 5L, 5L, 4L, 1L, 2L, 3L,
7L, 6L, 8L), .Label = c("DSF9652", "EFR9618", "EQW6597", "ERB1522",
"EWM3539", "LOC1949", "LQQ5554", "OQT5917"), class = "factor"),
COD_REV = c(113195L, 113196L, 113197L, 113303L, 80719L, 80720L,
80722L, 113318L, 80788L, 113386L), DATA = structure(1:10, .Label = c("2016-01-14 12:13:00.000",
"2016-01-18 18:48:00.000", "2016-01-18 19:00:00.000", "2016-01-25 11:46:00.000",
"2016-01-25 19:20:00.000", "2016-01-25 19:28:00.000", "2016-01-25 19:33:00.000",
"2016-01-25 20:56:00.000", "2016-01-26 21:28:00.000", "2016-01-27 13:50:00.000"
), class = "factor"), KM_ATUAL = c(52100L, 52100L, 52100L,
110676L, 62300L, 31144L, 165022L, 41021L, 155646L, 55030L
), KM_MEDIA = c(0L, 42L, 40L, 20L, 17L, 18L, 120L, 100L,
10L, 38L)), .Names = c("PLACA", "COD_REV", "DATA", "KM_ATUAL",
"KM_MEDIA"), row.names = c(NA, -10L), class = "data.frame")
``````

I have the above dataset and would like to group the boards to see how many visits the same customer did. So I need to calculate the difference between the dates and km_atual of visits, to compare with the field KM_media_dia and see the difference between these values.
I can not figure out the difference between the dates. This was my attempt so far:

``````library(tidyverse)
# Carregando os datasets

dataset_revisao_km <- dataset %>%
# selecionar apenas colunas importantes
select(CPF, PLACA, COD_REV, DATA, KM_ATUAL) %>%
arrange(DATA) %>%
group_by(PLACA) %>%
mutate(ORDEM_REVISAO = row_number()) %>%
# manter apenas placas com mais de uma revisao
filter(n() > 1) %>%
mutate(DIFERENCA_KM = KM_ATUAL - lag(KM_ATUAL)) %>%
# filtrar fora a primeira revisao da placa
filter(ORDEM_REVISAO > 1)
``````

The `R` requires data with dates to be correctly specified so that it can perform calculations that may be required. One of the best ways to do this is with the `lubridate` package:

``````library(lubridate)

dataset\$DATA <- ymd_hms(dataset\$DATA)
``````

Note that I just replaced the `DATA` column with its equivalent in `ymd_hms` (YearMonthDay_HourMinuteSecond), as it was in the original dataset. From there it was only to calculate the difference in days between the equal plates, using the function `difftime` :

``````dataset %>%
group_by(PLACA) %>%
filter(n() > 1) %>%
na.omit()
# A tibble: 2 x 6
# Groups:   PLACA [1]
PLACA   COD_REV DATA                KM_ATUAL KM_MEDIA Diferenca
<fct>     <int> <dttm>                 <int>    <int> <time>
1 EWM3539  113196 2016-01-18 18:48:00    52100       42 4.27430555555556
2 EWM3539  113197 2016-01-18 19:00:00    52100       40 0.00833333333333333
``````

Note that in the data set only the EWM3539 board appears more than once. As she appears 3 times, it does not make sense to speak on her first visit, since there is no difference of days. Therefore, we remove this information through `na.omit` .

Posted in qnaTagged