Why am I unable to extract this table using rvest?

I am trying to extract the sales-by-region and shareholder information from this MarketScreener page (the URL is in the code below).

I tried using rvest, but the extracted table is empty. Is there another way to do it besides using RSelenium?

library(dplyr)
library(tidyverse)
library(rvest)

url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/company/"
# html_session() is the pre-1.0 rvest name; rvest >= 1.0 calls it session()
wahis.session <- html_session(url)

# Browser-style XPath (note the tbody steps)
r1 <- wahis.session %>%
  html_nodes(xpath = '//*[@id="zbCenter"]/div/span/table[4]/tbody/tr[2]/td[1]/table[3]/tbody/tr[2]/td/table') %>%
  html_table(fill = TRUE)

# Selecting a table by its id instead
r2 <- wahis.session %>%
  html_nodes(xpath = '//*[@id="XLT27Z-S-CH"]') %>%
  html_table(fill = TRUE)
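A common reason a browser-style XPath like the one in r1 matches nothing in rvest is the tbody step: browsers add a tbody element to tables when they build the page's DOM, but rvest parses the raw HTML through libxml2 (via xml2), which does not add it. Whether that is the only problem on this particular page is not certain. The sketch below uses small hypothetical markup (the id "sales" is made up) to show the effect:

```r
library(rvest)

# Hypothetical markup: a table written without an explicit <tbody>,
# as is common in server-rendered HTML
page <- read_html(
  '<table id="sales"><tr><th>Region</th><th>Sales</th></tr><tr><td>Spain</td><td>6,076</td></tr></table>'
)

# Browser-style path with tbody: libxml2 did not insert one, so nothing matches
with_tbody <- html_nodes(page, xpath = '//table[@id="sales"]/tbody/tr')

# Same path without the tbody step: both rows are found
without_tbody <- html_nodes(page, xpath = '//table[@id="sales"]/tr')

length(with_tbody)     # 0
length(without_tbody)  # 2
```

If the tbody steps are the culprit, deleting them from a copied path is often enough to make it match.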

If you don't want to use XPath, you can list all tables with html_nodes("table") and then pick the ones you need. However, the desired table can be hard to locate when a page contains many tables, as this one does:

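library(rvest)
library(dplyr)

url <- "https://www.marketscreener.com/ZURICH-INSURANCE-GROUP-2955923/"

tables <- read_html(url) %>%
  html_nodes("table")

# e.g. the 'Quotes 5-day view' table; [[ ]] returns a single node, so
# html_table() gives one data frame (the index depends on the page layout)
tables[[26]] %>%
  html_table(fill = TRUE)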
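Rather than hard-coding a position such as 26, which shifts whenever the page layout changes, one option is to search the tables' text for a label you expect the wanted table to contain. A minimal sketch on hypothetical markup; it assumes the "Sales per Region" label sits inside the wanted table element, which may not be true on the live page:

```r
library(rvest)

# Hypothetical page with two tables; only the second one is wanted
page <- read_html(paste0(
  '<table><tr><td>Quotes</td><td>101.2</td></tr></table>',
  '<table><tr><th>Sales per Region</th><th>CHF</th></tr>',
  '<tr><td>Spain</td><td>6,076</td></tr></table>'
))

tables <- html_nodes(page, "table")

# Positions of the tables whose text contains the label we are looking for
idx <- which(grepl("Sales per Region", html_text(tables), fixed = TRUE))
idx  # 2

# Parse the first matching table, using its first row as the header
sales <- html_table(tables[[idx[1]]], header = TRUE)
```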

When I copy the XPath using Firefox's inspector, I also can't extract the "Sales per Region" table. XPath can be frustrating. However, the XPath generated by SelectorGadget seems to work. Try the following:

library(rvest)

# wahis.session is the session created in the question's code
wahis.session %>%
    html_nodes(xpath = '//*[(((count(preceding-sibling::*) + 1) = 4) and parent::*)]//*[contains(concat( " ", @class, " " ), concat( " ", "nfvtTab", " " ))]') %>%
    html_table(header = TRUE, fill = TRUE)
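Apart from a positional step (any element that is the fourth child of its parent), the SelectorGadget expression is the usual XPath idiom for "the class attribute contains the token nfvtTab": padding @class with spaces stops a class such as nfvtTabOld from matching. The CSS selector table.nfvtTab performs the same class test more compactly. A small check on hypothetical markup (the class names other than nfvtTab are made up):

```r
library(rvest)

# Hypothetical markup: two tables carry the nfvtTab token, one only shares a prefix
page <- read_html(paste0(
  '<table class="nfvtTab"><tr><td>a</td></tr></table>',
  '<table class="nfvtTab wide"><tr><td>b</td></tr></table>',
  '<table class="nfvtTabOld"><tr><td>c</td></tr></table>'
))

# Class-token test in the style SelectorGadget generates
by_xpath <- html_nodes(
  page,
  xpath = '//table[contains(concat(" ", @class, " "), concat(" ", "nfvtTab", " "))]'
)

# The equivalent CSS selector
by_css <- html_nodes(page, css = "table.nfvtTab")

html_text(by_xpath)                                # "a" "b"
identical(html_text(by_xpath), html_text(by_css))  # TRUE
```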

Which returns:

                              2016  2016   2017             2017   Delta
1                 CHF (in Million)     %   2017 CHF (in Million)       %
2   United States           14,972 22.5% 14,397            22.8%  -3.84%
3           Other            7,830 11.8%  7,702            12.2%  -1.63%
4           Spain            6,076  9.1%  4,215             6.7% -30.63%
5         Germany            4,646    7%  4,350             6.9%  -6.38%
6  United Kingdom            4,365  6.6%  4,322             6.9%  -0.99%
7     Switzerland            4,200  6.3%  4,223             6.7%  +0.55%
8          Brazil            2,104  3.2%  2,617             4.1% +24.36%
9           Italy            1,830  2.8%  2,202             3.5% +20.28%
10          Japan           946.22  1.4%      -                -       -
11      Australia           930.45  1.4%  1,227             1.9% +31.85%
12          Chile                -     -  1,061             1.7%       -
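The parsed columns are still text ("14,972", "22.5%", "-"), and the first row holds units rather than data. A minimal base-R sketch of converting such strings to numbers, using a few values copied from the output above into a hand-built data frame with hypothetical column names; on the real result you would first drop the units row (for example df[-1, ]) and rename the columns:

```r
# A few rows copied from the output above, with hypothetical column names
sales <- data.frame(
  region   = c("United States", "Switzerland", "Japan", "Chile"),
  chf_2016 = c("14,972", "4,200", "946.22", "-"),
  pct_2016 = c("22.5%", "6.3%", "1.4%", "-"),
  delta    = c("-3.84%", "+0.55%", "-", "-"),
  stringsAsFactors = FALSE
)

# Drop thousands separators, percent signs and leading "+";
# a lone "-" means "no value" on the page, so map it to NA
to_number <- function(x) {
  x <- gsub("[,%+]", "", x)
  x[x == "-"] <- NA
  as.numeric(x)
}

# Convert every column except the region names
sales[-1] <- lapply(sales[-1], to_number)
str(sales)
```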

Alternatively, you can extract all of these tables into a list of data frames by selecting on the table tag plus its class (the CSS selector table.nfvtTab). The following should parse every table except "Equities", which raises a subscript error, probably because that table has only one row. Wrapping html_table() in safely() records that error instead of stopping, so the other tables are still returned:

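library(rvest)
library(purrr)

# each element is list(result = <data frame or NULL>, error = <error or NULL>)
wahis.session %>%
    html_nodes("table.nfvtTab") %>%
    map(safely(html_table), header = TRUE, fill = TRUE)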
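purrr::safely() wraps a function so that each call returns a list with result and error components instead of stopping, which is why one unparseable table does not abort the whole map(). A toy sketch that uses log() in place of html_table() (the inputs are hypothetical), plus one way to keep only the successful results:

```r
library(purrr)

safe_log <- safely(log)

# The second input makes log() throw an error; safely() captures it
results <- map(list(100, "not a number", 1), safe_log)

# Each element is list(result = ..., error = ...)
str(results[[2]])

# Keep only the successful results, dropping the NULLs left by failures
ok <- compact(map(results, "result"))
length(ok)  # 2
```

Applied to the scraped list, the same map("result") plus compact() step should leave just the data frames for the tables that parsed.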
