Why am I unable to extract this table using rvest?
I am trying to extract the information on sales by region and on shareholders from this website.
I tried using rvest, but the extracted table comes back empty. Is there another way to do it besides using RSelenium?
library(dplyr)
library(tidyverse)
library(rvest)
url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/company/"
wahis.session <- html_session(url)
r1 <- wahis.session %>%
  html_nodes(xpath = '//*[@id="zbCenter"]/div/span/table[4]/tbody/tr[2]/td[1]/table[3]/tbody/tr[2]/td/table') %>%
  html_table(fill = TRUE)
r2 <- wahis.session %>%
  html_nodes(xpath = '//*[@id="XLT27Z-S-CH"]') %>%
  html_table(fill = TRUE)
If you don't want to use XPath, you can list all tables with html_nodes("table") and then choose the ones you need. However, it can be hard to locate the desired tables when the page contains many of them, which is the case here:
library(rvest)
library(dplyr)
url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/"
tables <- read_html(url) %>%
  html_nodes("table")
# Ex: 'Quotes 5-day view' table
tables[26] %>%
  html_table(fill = TRUE)
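Picking a table by position is brittle when a page has dozens of them. A minimal sketch of selecting by content instead, assuming the table you want contains a recognisable keyword. The inline HTML below is a hypothetical stand-in for the real page, and the keyword "Region" is only an illustration:

```r
library(rvest)

# Hypothetical stand-in page; the real site has many more tables.
page <- read_html('
<html><body>
  <table><tr><th>Day</th><th>Quote</th></tr><tr><td>Mon</td><td>310</td></tr></table>
  <table><tr><th>Region</th><th>Sales</th></tr><tr><td>Spain</td><td>4215</td></tr></table>
</body></html>')

tables <- html_nodes(page, "table")

# Flag the tables whose text mentions the keyword we are after.
hits <- grepl("Region", html_text(tables), fixed = TRUE)

# Positions of the matching tables, then parse only the first match.
which(hits)
sales <- html_table(tables[hits], header = TRUE)[[1]]
sales
```

On the real page you would swap in read_html(url) and a keyword that appears in the table you need.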
When I copy the XPath using Firefox's inspector, I also can't extract the "Sales per Region" table. XPath can be frustrating. However, the XPath given by SelectorGadget seems to work. Try the following:
library(rvest)
wahis.session %>%
  html_nodes(xpath = '//*[(((count(preceding-sibling::*) + 1) = 4) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "nfvtTab", " " ))]') %>%
  html_table(header = TRUE, fill = TRUE)
Which returns:
                      2016  2016             2017  2017   Delta
 1 CHF (in Million)      %  2017 CHF (in Million)     %
 2 United States    14,972 22.5%           14,397 22.8%  -3.84%
 3 Other             7,830 11.8%            7,702 12.2%  -1.63%
 4 Spain             6,076  9.1%            4,215  6.7% -30.63%
 5 Germany           4,646    7%            4,350  6.9%  -6.38%
 6 United Kingdom    4,365  6.6%            4,322  6.9%  -0.99%
 7 Switzerland       4,200  6.3%            4,223  6.7%  +0.55%
 8 Brazil            2,104  3.2%            2,617  4.1% +24.36%
 9 Italy             1,830  2.8%            2,202  3.5% +20.28%
10 Japan            946.22  1.4%                -     -       -
11 Australia        930.45  1.4%            1,227  1.9% +31.85%
12 Chile                 -     -            1,061  1.7%       -
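On why an XPath copied from a browser inspector can come back empty: browsers insert tbody elements into the DOM even when the HTML sent by the server has none, while the parser behind rvest keeps the markup as delivered. I can't confirm that this is what happens on this particular page, so treat the following as a sketch of the general effect, using a hypothetical one-row table:

```r
library(rvest)

# Hypothetical raw HTML as a server might send it: no <tbody> element.
page <- read_html('<table id="t"><tr><td>Spain</td><td>4215</td></tr></table>')

# XPath as a browser inspector might report it, including the tbody step.
with_tbody <- html_nodes(page, xpath = '//*[@id="t"]/tbody/tr/td')

# The same path with the tbody step removed.
without_tbody <- html_nodes(page, xpath = '//*[@id="t"]/tr/td')

# Compare how many cells each path matches.
length(with_tbody)
length(without_tbody)
```

If dropping the tbody steps from a copied XPath still returns nothing, the table may be built by JavaScript after the page loads, and that is the case where something like RSelenium becomes necessary.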
Alternatively, you could extract all the tables into a list of data frames by selecting on the table element plus its class attribute. The following should successfully parse all but the "Equities" table. You'll get a subscript error for that one, probably because the table has only one row:
library(purrr)
wahis.session %>%
  html_nodes("table.nfvtTab") %>%
  map(safely(html_table), header = TRUE, fill = TRUE)
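Each element returned by safely() is a list with a result and an error component, so you still need to separate the parsed tables from the failures. A rough sketch of that step follows. The inline HTML and the parse_table() helper are hypothetical: the helper deliberately rejects one-row tables only to imitate the failure described above.

```r
library(rvest)
library(purrr)

# Hypothetical stand-in page: the second table has a single row.
page <- read_html('
<table class="nfvtTab"><tr><th>Region</th><th>Sales</th></tr><tr><td>Spain</td><td>4215</td></tr></table>
<table class="nfvtTab"><tr><td>Equities</td><td>100</td></tr></table>
<table class="nfvtTab"><tr><th>Name</th><th>Pct</th></tr><tr><td>Fund A</td><td>3.1</td></tr></table>')

tables <- html_nodes(page, "table.nfvtTab")

# Hypothetical strict parser: refuses one-row tables to imitate the error.
parse_table <- function(node) {
  if (length(html_nodes(node, "tr")) < 2) stop("table has only one row")
  html_table(node, header = TRUE)
}
safe_parse <- safely(parse_table)

# Index explicitly so each call receives a single table node.
results <- map(seq_along(tables), function(i) safe_parse(tables[[i]]))

# Keep the data frames that parsed and note which tables failed.
parsed <- compact(map(results, "result"))
failed <- which(map_lgl(results, function(r) !is.null(r$error)))
```

Here parsed holds the tables that parsed cleanly and failed gives the positions of the ones that did not, which you can then inspect by hand.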