
How do I use rvest with an element id name that contains a forward slash?

I'm trying to use rvest to scrape an element whose id contains a forward slash. Everything I try as an escape character fails. Suppose that the element I'm trying to select is:

<div id ="hello/world"> Some stuff </div>

Using rvest functions, after reading the webpage into a variable called "html", I'm running things like this:

x <- html %>% 
  html_elements("#hello//world")

I've tried it with no escape character, with different escape characters, and so on, but everything I try generates the error:

Error in tokenize(css) : Unexpected character '/' found at position 8.

Any ideas? Big thanks for any help.

It appears that you are matching on the value of an id attribute rather than on an element name. Perhaps you can try an XPath expression instead:

x <- html %>% 
  html_elements(xpath = "//div[@id='hello/world']")
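For anyone who wants to try the XPath approach without the original page, here is a minimal sketch. The inline document and the variable names `page` and `node` are assumptions made for illustration; only the XPath expression comes from the answer above. XPath compares the attribute value as an ordinary string, so the slash needs no escaping.

```r
library(rvest)

# Assumption: a small inline document standing in for the real webpage
page <- read_html('<div id="hello/world"> Some stuff </div>')

# Inside the quotes the "/" is just part of the string being compared
node <- html_elements(page, xpath = "//div[@id='hello/world']")

# Extract the text, trimming the surrounding whitespace
html_text(node, trim = TRUE)
```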

I think you should be using html_nodes with an attribute selector in place of the # shorthand:

library(rvest)

html <- read_html('<div id="hello/world"> Some stuff </div>')
html %>% 
  html_nodes("div[id='hello/world']")

Result:

{xml_nodeset (1)}
[1] <div id="hello/world"> Some stuff </div>
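What makes this work appears to be the quoted attribute selector rather than the function name: in rvest 1.0.0 and later, html_nodes() is the older name for html_elements(), so the same selector should work with either. A rough sketch of wrapping this up follows. The helper `select_by_id` and the two-element test document are hypothetical, and the helper assumes the id contains no single quote.

```r
library(rvest)

# Hypothetical helper (not part of rvest): select elements by exact id
# using a quoted attribute selector, so characters such as "/" are
# treated as part of a string instead of being tokenized as CSS syntax.
# Assumes `id` contains no single quote.
select_by_id <- function(doc, id) {
  html_elements(doc, sprintf("[id='%s']", id))
}

# Assumption: a small inline document standing in for the real webpage
page <- read_html('<div id="hello/world"> Some stuff </div><div id="other">x</div>')

found <- select_by_id(page, "hello/world")
html_text(found, trim = TRUE)
```

The quoted form avoids the need to find an escape sequence that the CSS tokenizer accepts in the `#id` shorthand.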


 