Question :
I’m trying to scrape the Ministry of Labor's Mediador system. Basically, I want the list of collective agreements and conventions:
url1<-"http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo"
Once I open this page, I reach the search form. I set only two filters: validity “All” and registration UF (state) “SE”.
After submitting the search, I can see the XHR request:
url2<-"http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo/getConsultaAvancada"
Along with its body:
str(body)
List of 27
$ nrCnpj : chr ""
$ nrCei : chr ""
$ noRazaoSocial : chr ""
$ dsCategoria : chr ""
$ tpRequerimento : chr "acordo"
$ tpRequerimento : chr "acordoColetivoEspecificoPPE"
$ tpRequerimento : chr "acordoColetivoEspecificoDomingosFeriados"
$ tpRequerimento : chr "convencao"
$ tpRequerimento : chr "termoAditivoAcordo"
$ tpRequerimento : chr "termoAditivoConvecao"
$ tpRequerimento : chr "termoAditivoAcordoEspecificoPPE"
$ tpRequerimento : chr "termoAditivoAcordoEspecificoDomingoFeriado"
$ tpVigencia : chr "2"
$ sgUfDeRegistro : chr "SE"
$ dtInicioRegistro : chr ""
$ dtFimRegistro : chr ""
$ dtInicioVigenciaInstrumentoColetivo: chr ""
$ dtFimVigenciaInstrumentoColetivo : chr ""
$ tpAbrangencia : chr "Todos os tipos"
$ ufsAbrangidasTotalmente : chr "SE"
$ cdMunicipiosAbrangidos : chr ""
$ cdGrupo : chr ""
$ cdSubGrupo : chr ""
$ noTituloClausula : chr ""
$ utilizarSiracc : chr ""
$ pagina : chr "2"
$ qtdTotalRegistro : chr "1740"
Then I did the following to access the results:
library(httr)
a<-GET(url1)
b<-POST(url2,body=body,set_cookies(unlist(a$cookies)))
Unfortunately, the response does not contain the expected results.
Answer :
The question is how to perform this specific scrape in R. Notice that the form field tpRequerimento carries several values at once, which we can represent in R as a character vector.
In R, it would look like this:
body <- list(
  nrCnpj = "",
  nrCei = "",
  noRazaoSocial = "",
  dsCategoria = "",
  tpRequerimento = c("acordo",
                     "acordoColetivoEspecificoPPE",
                     "acordoColetivoEspecificoDomingosFeriados",
                     "convencao",
                     "termoAditivoAcordo",
                     "termoAditivoConvecao",
                     "termoAditivoAcordoEspecificoPPE",
                     "termoAditivoAcordoEspecificoDomingoFeriado"),
  tpVigencia = "2",
  sgUfDeRegistro = "SE",
  dtInicioRegistro = "",
  dtFimRegistro = "",
  dtInicioVigenciaInstrumentoColetivo = "",
  dtFimVigenciaInstrumentoColetivo = "",
  tpAbrangencia = "Todos os tipos",
  ufsAbrangidasTotalmente = "SE",
  cdMunicipiosAbrangidos = "",
  cdGrupo = "",
  cdSubGrupo = "",
  noTituloClausula = "",
  utilizarSiracc = "",
  pagina = "2",
  qtdTotalRegistro = "1740")
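library(httr)

# Load the search page first so the server can issue its session cookies
url1 <- "http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo"
a <- GET(url1)

# Replay the XHR request, sending the form body and the cookies from the first response
url2 <- "http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo/getConsultaAvancada"
b <- POST(url2, body = body, set_cookies(unlist(a$cookies)))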
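Depending on the httr version, a vector-valued element in body may not be expanded into one form field per value. As a fallback, here is a minimal sketch that turns each vector into repeated names, which is the shape the str(body) output in the question already has. The helpers expand_fields and post_search are hypothetical names introduced for illustration, and the sketch assumes the endpoint accepts a form-encoded POST, which I have not verified. Only a few of the fields are shown to keep it short.

```r
# Hypothetical helper: turn list(a = c("x", "y"), b = "z") into
# list(a = "x", a = "y", b = "z"), i.e. one named entry per value.
expand_fields <- function(fields) {
  values <- lapply(fields, as.character)
  stats::setNames(
    as.list(unlist(values, use.names = FALSE)),
    rep(names(fields), lengths(values))
  )
}

# Abbreviated field list; a real request would use every field from the body above.
fields <- list(
  tpRequerimento = c("acordo", "convencao"),
  tpVigencia = "2",
  sgUfDeRegistro = "SE",
  pagina = "2"
)

form_body <- expand_fields(fields)
str(form_body)

# Hypothetical wrapper around the two requests. Assumption: the server expects
# application/x-www-form-urlencoded, hence encode = "form".
post_search <- function(body,
                        url1 = "http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo",
                        url2 = paste0(url1, "/getConsultaAvancada")) {
  httr::GET(url1)  # httr keeps cookies for later requests to the same host
  resp <- httr::POST(url2, body = body, encode = "form")
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}
```

Calling post_search(form_body) should return the raw response text, or stop with an error on an HTTP failure status. Inspecting that text is a quick way to check whether the server received all tpRequerimento values.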