
Extracting the date from an HTML element using R

I'm trying to extract dates from an online community using R. I'm a bit of a newcomer and am not having much luck with the package: it seems to pull a huge list rather than any specific date or time.

I've tried using the rvest package to read the URL and then select the HTML element I want to extract the date from, but I can't find the date anywhere in the result.

This is what I've tried so far.

  library(rvest)

  discussion <- read_html("https://en.community.sonos.com/wireless-speakers-228992/bass-cutting-out-on-play-5-will-come-back-intermittently-when-volume-is-turned-up-5568948")
  local.date <- discussion %>%
    html_nodes(".qa-latest-post-time") %>%
    html_text()
  discussion

Is there a better way?

Ideally I'd get a specific date (and time) from this. If not, at least a specific date would be useful.

You're selecting the nodes' text but the date information is stored in an attribute (you can find this out by printing the HTML nodes themselves):

  discussion %>% html_nodes('.qa-latest-post-time') %>% html_attr('datetime')
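To make the difference between the text and the attribute concrete, here is a minimal sketch that runs on a small inline HTML string instead of the live page. The `<time>` element, its label, and its `datetime` value are made up for illustration, and the real page's markup may differ; only the `.qa-latest-post-time` selector comes from the question.

```r
library(rvest)

# Hypothetical markup standing in for the real page; the element, its
# label and its datetime value are invented for this example.
page <- read_html('<html><body>
  <time class="qa-latest-post-time" datetime="2019-05-01">2 years ago</time>
</body></html>')

nodes <- page %>% html_nodes(".qa-latest-post-time")

# Printing the nodes themselves shows their attributes,
# which is how the datetime attribute can be spotted.
print(nodes)

# html_text() returns the visible label; html_attr() returns the attribute value.
label <- nodes %>% html_text()
raw_date <- nodes %>% html_attr("datetime")

# Convert the character value to a Date (assumes a YYYY-MM-DD value).
post_date <- as.Date(raw_date)
```

If the attribute on the real page turned out to include a time of day as well, `as.POSIXct()` with a matching `format` would probably be the better conversion, but that depends on what the site actually emits.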

As for "Ideally I'd get a specific date (and time) from this": the site's source code does not seem to contain post times, at least not in your example.
