Why do I get an error in matrix() when I try to web scrape a table?
Here is a sample of my code. The problem is with the second link (for Cedar Realty Trust).
library(rvest)
library(stringr)
library(plyr)
library(dplyr)
library(lubridate)
library(readr)
library(stringi)
library(tidyverse)
library(purrr)
urls <- list(c("CEDAR FAIR L P ", "https://www.sec.gov/Archives/edgar/data/811532/000081153219000037/exhibit212018subsidiaries.htm"),
             c("CEDAR REALTY TRUST, INC. ", "https://www.sec.gov/Archives/edgar/data/761648/000156459020004590/cdr-ex211_8.htm"),
             c("Celanese Corp ", "https://www.sec.gov/Archives/edgar/data/1306830/000130683020000018/ex211-10k123119.htm"))
List.Of.Tabs <- map(urls, ~ {
  name <- .x[1]
  link <- .x[2]
  Sys.sleep(2)
  webpage <- read_html(link)
  tbls <- html_nodes(webpage, "table")
  tbls_ls <- html_table(tbls, fill = TRUE)
  pos1 <- possibly(function(tbls) bind_rows(tbls) %>%
                     filter_all(any_vars(. %in% c("Singapore", "SGP"))) %>%
                     mutate(name = name),
                   otherwise = NA)
  pos1(tbls_ls)
})
The error message I get:
Error in matrix(NA_character_, nrow = n, ncol = maxp) :
invalid 'ncol' value (too large or NA)
In addition: Warning messages:
1: In max(p) : no non-missing arguments to max; returning -Inf
2: In matrix(NA_character_, nrow = n, ncol = maxp) :
NAs introduced by coercion to integer range
How can I modify my code to fix this error?
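The messages above can be reproduced offline in base R. This sketch relies on an assumption drawn from the error log. The assumption is that html_table() takes the column count as max(p) over per-row cell counts, and that p was empty for one of the tables on the Cedar Realty Trust page (for example, a table with no rows). The names p and maxp come from the log. Nothing here calls rvest.

```r
# Offline reproduction of the error in the question (no rvest involved).
# Assumption: html_table() computes the column count as max(p), and for
# the failing table p was empty.
p <- integer(0)                    # hypothetical: no cell counts collected
maxp <- suppressWarnings(max(p))   # max() of an empty vector is -Inf

# matrix() cannot use -Inf as a column count, so it stops with the
# same message as in the question
err <- tryCatch(
  suppressWarnings(matrix(NA_character_, nrow = 1, ncol = maxp)),
  error = function(e) conditionMessage(e)
)
print(maxp)
print(err)
```

The key point is that this failure happens while the tables are being parsed. Any error handler has to enclose that step, not just the later filtering.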
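Here is a way to do this using tryCatch. The error is raised by html_table() itself, which runs before the possibly()-wrapped function in the question is ever called. The handler therefore has to enclose the whole read-and-parse step for each link.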
library(tidyverse)
library(rvest)
map(urls, ~ {
  name <- .x[1]
  link <- .x[2]
  Sys.sleep(2)
  tryCatch({
    temp <- link %>%
      read_html() %>%
      html_nodes("table") %>%
      html_table(fill = TRUE)
    map_df(temp, ~ filter_all(.x, any_vars(. %in% c("Singapore", "SGP")))) %>%
      mutate(name = name)
  }, error = function(e) NA)
})
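The same pattern can be tried without network access. This base-R sketch makes a few stand-ins, all hypothetical:

- fake_tables() replaces read_html() plus html_table() and deliberately fails for any link containing "bad".
- The example.com URLs are placeholders.
- The apply() over rows plays the role of filter_all(any_vars(...)).

```r
# Hypothetical stand-in for read_html() + html_table(): it fails for the
# "bad" link, the way html_table() failed on the Cedar Realty Trust page.
fake_tables <- function(link) {
  if (grepl("bad", link)) stop("invalid 'ncol' value (too large or NA)")
  list(data.frame(X1 = c("Example PTE. LTD.", "Example GmbH"),
                  X2 = c("Singapore", "Germany"),
                  stringsAsFactors = FALSE))
}

scrape_one <- function(name, link) {
  tryCatch({
    # Parsing happens INSIDE tryCatch(), so its error is handled below
    tbl <- do.call(rbind, fake_tables(link))
    # Keep rows where any column matches, like filter_all(any_vars(...))
    keep <- apply(tbl, 1, function(row) any(row %in% c("Singapore", "SGP")))
    hits <- tbl[keep, , drop = FALSE]
    hits$name <- rep(name, nrow(hits))
    hits
  }, error = function(e) NA)  # a bad link yields NA instead of stopping
}

urls <- list(c("Good Co", "https://example.com/good"),
             c("Bad Co",  "https://example.com/bad"))
res <- lapply(urls, function(u) scrape_one(u[1], u[2]))
str(res)
```

The failing link yields NA, while the other link still returns its Singapore row. This mirrors the [[2]] NA entry in the output that follows.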
#[[1]]
#[1] X1 X2 name
#<0 rows> (or 0-length row.names)
#[[2]]
#[1] NA
#[[3]]
# X1 X2 X3 X4 name
#1 Celanese PTE. LTD. NA Singapore NA Celanese Corp
#2 Celanese Singapore Acetyls Holding PTE. LTD. NA Singapore NA Celanese Corp
#3 Celanese Singapore Chemical Holding PTE. LTD. NA Singapore NA Celanese Corp
#4 Celanese Singapore PTE. LTD. NA Singapore NA Celanese Corp
#5 Celanese Singapore VAM PTE. LTD. NA Singapore NA Celanese Corp
#6 Celanese Singapore Emulsions PTE. LTD. NA Singapore NA Celanese Corp
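Although this gives warnings, it runs without errors. The Cedar Realty Trust link returns NA instead of stopping the whole map() call.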
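You may want to know which link produced the warnings, rather than seeing them all at the end of the run. One option, not part of the answer above, is to record them with withCallingHandlers() around the tryCatch(). In the sketch below, with_warnings() is a hypothetical helper. In the map() call you would wrap each link's body with it instead of using a bare tryCatch().

```r
# Evaluate `expr`, turning errors into NA (as the answer does) and
# recording warnings instead of letting them print at the end of the run.
with_warnings <- function(expr) {
  warns <- character(0)
  value <- withCallingHandlers(
    tryCatch(expr, error = function(e) NA),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))  # remember the message
      invokeRestart("muffleWarning")           # and keep evaluating
    }
  )
  list(value = value, warnings = warns)
}

# Usage: a step that warns and then fails, and one that succeeds
r <- with_warnings({
  warning("first warning")
  stop("then an error")
})
ok <- with_warnings(1 + 1)
str(r)
str(ok)
```

Muffling with invokeRestart("muffleWarning") stops the warnings from printing. The returned messages are then the only record of them.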
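Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.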