
What am I getting wrong with the rvest package?

I'm having trouble scraping with the rvest package in R. I'm trying to collect information from the website and build a data frame containing the vectors specified inside the loop.

When I don't use the for loop, I get the correct data. Can anybody kindly tell me what's wrong with the following code?

My hunch is that I'm failing to combine the vectors correctly...

library(rvest)
library(dplyr)
library(tidyr)
library(stringr)
library(stringi)


#This is the URL from which I would like to get information.
source_url <- "https://go2senkyo.com/local/senkyo/"
senkyo <- data.frame() 

# start for loop
for (i in 50:60) { 

target_page <- paste0(source_url, i)
recall_html <- read_html(source_url, encoding = "UTF-8")


prefecture <- recall_html %>%
        html_nodes(xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "column_ttl_small", " " ))]') %>%
        html_text()

city <- recall_html %>%
    html_nodes(xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "column_ttl", " " ))]') %>%
    html_text()
city <- trimws(gsub("[\r\n]", "", city )) %>% unlist()


candidate <- recall_html %>%
    html_nodes(xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "m_senkyo_result_table", " " ))]') %>%
    html_text()
candidate  <- trimws(gsub("[\r\n\t]", "", candidate ))

all <- recall_html %>%
    html_nodes(xpath='//td') %>%
    html_text() 
all <- trimws(gsub("[\r\n\t]", "", all))

election_day <- all[1]
turnout  <- all[2]
magnitude_candidates <- all[3] 
notificationday <- all[4]
turnout_lasttime <- all[5]
others <- all[6]



senkyo2 <- cbind(prefecture, city, candidate, election_day, turnout, magnitude_candidates, notificationday,
        turnout_lasttime, others) 
senkyo  <- rbind(senkyo , senkyo2) 

}

Here seems to be your error:

recall_html <- read_html(source_url, encoding = "UTF-8")

It should use target_page instead of source_url:

recall_html <- read_html(target_page, encoding = "UTF-8")
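For context, here is a minimal sketch of the loop with only that one change applied; the selectors and the rest of the loop body are assumed to stay exactly as in the question:

library(rvest)

source_url <- "https://go2senkyo.com/local/senkyo/"
senkyo <- data.frame()

for (i in 50:60) {
  # e.g. https://go2senkyo.com/local/senkyo/50
  target_page <- paste0(source_url, i)

  # read the page for this iteration, not the index URL
  recall_html <- read_html(target_page, encoding = "UTF-8")

  # ... extract prefecture, city, candidate, etc. as in the question,
  # then cbind() them into senkyo2 and rbind() onto senkyo ...
}

With read_html(target_page, ...), each of pages 50 through 60 is actually fetched, so the extracted vectors change on every iteration instead of repeating the contents of the index page.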
