How to scrape id from an element in rvest?

Question

Each div with the classes grpl-grp and clearfix (each club element) on this page has its own id:

https://uws-community.symplicity.com/index.php?s=student_group

I am trying to scrape each of these ids, but my current method, shown below, does not work. What am I doing wrong?

url <- 'https://uws-community.symplicity.com/index.php?s=student_group'
page <- html_session(url)

id_nodes <- html_nodes(page, "div.grpl-grp clearfix") %>% html_attrs("id")

I need to use html_session because I'm also scraping other data that requires the session.

Answer 1

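There are two changes you need to make in the code.

The selector has to be written as "div.grpl-grp.clearfix". The element carries two classes, grpl-grp and clearfix, and in a CSS selector multiple classes on the same element are chained with dots. With a space, "div.grpl-grp clearfix" looks for a clearfix element nested inside div.grpl-grp, which matches nothing.

You should use html_attr, which returns the value of one named attribute for each node. html_attrs takes no attribute name and returns all attributes of each node, so html_attrs("id") is an error.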

library(rvest)
url <- 'https://uws-community.symplicity.com/index.php?s=student_group'
page <- html_session(url)
html_nodes(page, "div.grpl-grp.clearfix") %>% html_attr("id")
#[1] "grpl_5bf9ea61bc46eaeff075cf8043c27c92"
#[2] "grpl_17e4ea613be85fe019efcf728fb6361d"
#[3] "grpl_d593eb48fe26d58f616515366a1e677b"
#[4] "grpl_5b445690da34b7cff962ee2bf254db9e"
#[5] "grpl_cd1ebcef22852bdb5301a243803a2909"
....

Or if you want to do everything in one chain

url %>%
   read_html() %>%
   html_nodes("div.grpl-grp.clearfix") %>%
   html_attr("id")

#[1] "grpl_5bf9ea61bc46eaeff075cf8043c27c92" "grpl_17e4ea613be85fe019efcf728fb6361d"
#[3] "grpl_d593eb48fe26d58f616515366a1e677b" "grpl_5b445690da34b7cff962ee2bf254db9e"
#[5] "grpl_cd1ebcef22852bdb5301a243803a2909" "grpl_0a7da33f968a919ecfa06486f0787bc7"
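
To see why the original selector returned nothing, here is a minimal offline sketch. It does not touch the real site: the HTML snippet and the ids grpl_a, grpl_b and other are made up for illustration, and only the class names are taken from the question.

```r
library(rvest)

# Hypothetical stand-in for the real page: two club cards with both
# classes, plus one div that only has the grpl-grp class.
doc <- read_html('
<div id="grpl_a" class="grpl-grp clearfix">Club A</div>
<div id="grpl_b" class="grpl-grp clearfix">Club B</div>
<div id="other" class="grpl-grp">Something else</div>
')

# Space = descendant combinator: a <clearfix> element inside div.grpl-grp.
# No such element exists, so the node set is empty.
wrong <- html_nodes(doc, "div.grpl-grp clearfix")
length(wrong)

# Dots chained together = one element that has both classes.
nodes <- html_nodes(doc, "div.grpl-grp.clearfix")

# html_attr() pulls one named attribute as a character vector.
ids <- html_attr(nodes, "id")
ids

# html_attrs() takes no name and returns every attribute of each node
# as a list of named character vectors.
all_attrs <- html_attrs(nodes)
```

Here wrong is empty and ids holds only the two made-up ids of the divs that carry both classes. As far as I know, rvest 1.0.0 and later prefer the names html_elements() and session() over html_nodes() and html_session(), but the older names used in this answer should still work.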
