
Extracting text/number from tspan class tag HTML with R

I am trying to extract the current production number from this website: http://okg.se/sv/Produktionsinformation/ (it is shown in the blue area of the page).

Here is the part of the HTML I need:

<tspan dy="0" style="-webkit-tap-highlight-color: rgba(0, 0, 0, 0);">518</tspan>

Here is the code I used:

library(rvest)

url <- "http://okg.se/sv/Produktionsinformation/"
download.file(url, destfile = "scrapedpage.html", quiet = TRUE)
content <- read_html("scrapedpage.html")
content %>% html_nodes(".content__info__item__value")

But the result shows that no nodes were found:

{xml_nodeset (0)}

Do you have any ideas on how to solve this issue?

Thanks in advance!

I am not entirely sure which value you need, but this works:

library(rvest)

# page url
url <- "http://okg.se/sv/Produktionsinformation/"

# current value
read_html(url) %>%
  html_nodes(".footer__gauge") %>%
  html_attr("data-current")

# Max value
read_html(url) %>%
  html_nodes(".footer__gauge") %>%
  html_attr("data-max")
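The two calls above each download the page and return character strings. As a rough sketch, the page could instead be parsed once and both attributes converted to numbers. The helper name `gauge_values` is hypothetical, and the inline HTML is only an assumed stand-in for the real page (attribute values taken from elsewhere in this thread) so that the example runs without a network connection:

```r
library(rvest)

# Stand-in for the downloaded page. This markup is an assumption modelled on
# the attributes mentioned in this thread, not a copy of the real site.
page <- read_html('<html><body>
  <div id="gauge" class="footer__gauge" data-current="553" data-max="1450"></div>
</body></html>')

# Hypothetical helper: read both attributes from one parsed document
# and convert them from character to numeric.
gauge_values <- function(doc, selector = ".footer__gauge") {
  node <- html_node(doc, selector)
  c(
    current = as.numeric(html_attr(node, "data-current")),
    max     = as.numeric(html_attr(node, "data-max"))
  )
}

values <- gauge_values(page)
values
```

For the live site, `page` would presumably be `read_html(url)` instead of the inline string.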

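The HTML you see in your browser has been modified by JavaScript after the page loaded, so it is not the same as the HTML that rvest downloads. Elements that a script adds later, such as the tspan in your question, are not in the raw source, which is why your selector returns an empty node set.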

The raw data you are looking for is actually stored in attributes of a div with the id "gauge", so you get it like this:

library(rvest)
#> Loading required package: xml2

"http://okg.se/sv/Produktionsinformation//" %>%
read_html()                                 %>%
html_node("#gauge")                         %>% 
html_attrs()                                %>%
`[`(c("data-current", "data-max"))
#> data-current     data-max 
#>        "553"       "1450"

Note that you don't need to save the HTML to your local drive to process it. You can read it directly from the internet by passing the URL to read_html().

Created on 2020-02-20 by the reprex package (v0.3.0)
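To illustrate the difference between raw and rendered HTML, here is a small sketch that checks whether a selector matches anything before relying on it. The inline markup is an assumption about what the static source might look like (a gauge div with data attributes and no tspan), and the value 518 is simply the one quoted in the question:

```r
library(rvest)

# Assumed static markup: the gauge div is present, but the <tspan> seen in
# the browser is absent because a script would add it after loading.
raw_html <- '<html><body>
  <div id="gauge" data-current="518" data-max="1450"></div>
</body></html>'
doc <- read_html(raw_html)

# A selector that only matches script-generated elements finds nothing
rendered_only <- html_nodes(doc, "tspan")
if (length(rendered_only) == 0) {
  message("No <tspan> in the raw HTML; look for the data in attributes instead")
}

# The attributes on the source element are available without running any script
current <- as.integer(html_attr(html_node(doc, "#gauge"), "data-current"))
current
```

An empty node set from html_nodes() is usually a hint to inspect the page source (not the browser's element inspector) for where the data actually lives.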
