
Rvest XML web scraping

I'm a beginner and I'm having a problem with web scraping.

I need to check whether the VIES numbers of a few clients are active or inactive. For now, I'm trying with just one. On the website I have to fill in the form and submit it; the browser then redirects to the next page, which contains the data I'm interested in.

My code is below. Maybe someone can help.

library(rvest)
library(XML)

url <- 'http://ec.europa.eu/taxation_customs/vies/vatResponse.html?locale=pl'
session1 <- html_session(url)
form1 <-html_form(session1)
form1

date <- set_values(form1[[1]], requesterMemberStateCode = "AT-Austria", requesterNumber = "4324")
date

set <- submit_form(session = session1,form = date)

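First of all, you don't need the XML package; rvest is enough.

You had the form-submitting part almost right; you just filled in the wrong fields. The number to check goes in memberStateCode and number, not in the requester* fields, and the country is given as the bare code ("AT").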

library(rvest)
#> Loading required package: xml2

url <- 'http://ec.europa.eu/taxation_customs/vies/vatResponse.html?locale=pl'
session1 <- html_session(url)
form1 <-html_form(session1)
form1[[1]]
#> <form> 'vowRequest' (POST vatResponse.html)
#>   <select> 'memberStateCode' [0/29]
#>   <input text> '': --
#>   <input text> 'number': 
#>   <input text> 'traderName': 
#>   <select> 'traderCompanyType' [0/0]
#>   <input text> 'traderStreet': 
#>   <input text> 'traderPostalCode': 
#>   <input text> 'traderCity': 
#>   <select> 'requesterMemberStateCode' [0/30]
#>   <input text> '': 
#>   <input text> 'requesterNumber': 
#>   <input hidden> 'action': check
#>   <input submit> 'check': Weryfikuj

date <- set_values(form1[[1]], memberStateCode = "AT", number = "4324")

set <- submit_form(session = session1,form = date)
#> Submitting with 'NULL'
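
In rvest 1.0.0 and later, html_session(), set_values() and submit_form() were superseded by session(), html_form_set() and session_submit(). Below is a minimal sketch of the same steps with the newer names. It assumes a small inline form as a stand-in for the real page (only the field names are taken from the output above; the markup and the base URL are illustrative), so it runs without network access:

```r
library(rvest)

# Hypothetical stand-in for the VIES form. Only the field names come from
# the printed form above; the rest of the markup is an assumption.
page <- minimal_html('
  <form name="vowRequest" method="post" action="vatResponse.html">
    <select name="memberStateCode">
      <option value="AT">AT-Austria</option>
      <option value="PL">PL-Poland</option>
    </select>
    <input type="text" name="number">
    <input type="hidden" name="action" value="check">
    <input type="submit" name="check" value="Weryfikuj">
  </form>')

# base_url is supplied because the page was parsed from a string.
form <- html_form(page, base_url = "http://ec.europa.eu/taxation_customs/vies/")[[1]]

# List the field names the form accepts before setting anything.
field_names <- names(form$fields)
print(field_names)

# html_form_set() is the rvest >= 1.0 counterpart of set_values().
filled <- html_form_set(form, memberStateCode = "AT", number = "4324")
print(filled$fields$number$value)

# Against the live site (not run here) the request would be sent with:
# s <- session(url)
# result <- session_submit(s, filled)
```

Printing the field names first is a cheap way to catch the kind of mix-up in the question, because html_form_set() only accepts names that exist in the form.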

After that, extracting the values you are interested in is easy:

set %>% 
  read_html() %>% 
  html_table(fill = TRUE) %>% 
  purrr::pluck(1) %>% 
  dplyr::slice(4:n()) %>% 
  dplyr::select(1:2)
#> # A tibble: 6 x 2
#>   X1                      X2                 
#>   <chr>                   <chr>              
#> 1 Państwo Członkowskie    AT                 
#> 2 Numer VAT               AT 4324            
#> 3 Data zapytania          2018/05/17 14:33:10
#> 4 Nazwa                   ---                
#> 5 Adres                   ---                
#> 6 Identyfikator zapytania ""

Created on 2018-05-17 by the reprex package (v0.2.0).
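
To see what the slice(4:n()) and select(1:2) steps do, here is a small offline sketch. The table markup is a made-up stand-in for the response page (three layout rows followed by the rows of interest, which is what the row offset above implies), and base R indexing is used in place of dplyr so that rvest is the only dependency:

```r
library(rvest)

# Hypothetical response fragment: the cell contents are illustrative only.
page <- minimal_html('
  <table>
    <tr><td>Title</td><td>-</td><td>-</td></tr>
    <tr><td>Notice</td><td>-</td><td>-</td></tr>
    <tr><td>Spacer</td><td>-</td><td>-</td></tr>
    <tr><td>Member State</td><td>AT</td><td>-</td></tr>
    <tr><td>VAT number</td><td>AT 4324</td><td>-</td></tr>
  </table>')

# html_table() returns one data frame per <table>; take the first.
tbl <- html_table(page)[[1]]

# Drop the three leading layout rows and keep the label/value columns,
# the base R equivalent of slice(4:n()) followed by select(1:2).
result <- tbl[4:nrow(tbl), 1:2]
print(result)
```

The row offset and column range depend on the page layout, so they are worth re-checking against the actual response before relying on them.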
