
Scraping values with rvest and XPath

I'm trying to extract all the names from the following page: http://www.thinkbabynames.com/popular/1/us

I'm using the rvest package in R.

The following code gets the names that appear in the 'Top 10' and 'Trend' sections:

library(rvest)

url <- "http://www.thinkbabynames.com/popular/1/us"

get_names <- function(html){
  names <- html %>% 
    read_html() %>%
    html_nodes('a b') %>%  # <b> elements nested inside <a> links
    html_text()
}

names <- get_names(url)
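The selector 'a b' is a CSS descendant selector: it matches every <b> element nested anywhere inside an <a> link, which is what the code above relies on for the 'Top 10' and 'Trend' names. A minimal sketch on a hypothetical HTML snippet (made up for illustration, not copied from the site):

```r
library(rvest)

# Hypothetical markup, only meant to mimic "bold text inside a link"
snippet <- '<ul>
  <li><a href="/name/liam"><b>Liam</b></a></li>
  <li><a href="/name/noah"><b>Noah</b></a></li>
  <li><b>Not a link</b></li>
</ul>'

# "a b" selects <b> elements that have an <a> ancestor;
# the third <b> is not inside a link, so it is skipped
top_names <- read_html(snippet) %>%
  html_nodes("a b") %>%
  html_text()

top_names
#> [1] "Liam" "Noah"
```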

For the names in the 'Top 11-2000' section I used the following code, but it returns an empty character vector:

get_names2 <- function(html){
  html.read <- html %>% 
    read_html() %>% 
    html_nodes(xpath='//*[@id="load"]/table/tbody/tr/td[2]/a') %>% 
    html_text()
}
names2 <- get_names2(url)
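One common reason an XPath copied from the browser's developer tools matches nothing in rvest is the tbody step: browsers insert a <tbody> element into the DOM they display even when the served HTML has none, while read_html() parses the raw source as-is. Whether that is the cause on this particular page is an assumption. The sketch below uses a hypothetical snippet without <tbody> to show the effect:

```r
library(rvest)

# Hypothetical served HTML: the table has no explicit <tbody>
snippet <- '<div id="load"><table>
  <tr><td>11.</td><td><a href="/name/ava">Ava</a></td></tr>
  <tr><td>12.</td><td><a href="/name/mia">Mia</a></td></tr>
</table></div>'

page <- read_html(snippet)

# Path as a browser would show it, with an injected <tbody>: matches nothing here
with_tbody <- page %>%
  html_nodes(xpath = '//*[@id="load"]/table/tbody/tr/td[2]/a') %>%
  html_text()

# "//" between table and tr matches rows whether or not a <tbody> is present
without_tbody <- page %>%
  html_nodes(xpath = '//*[@id="load"]/table//tr/td[2]/a') %>%
  html_text()

with_tbody
#> character(0)
without_tbody
#> [1] "Ava" "Mia"
```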

I'm new to HTML, so any suggestions would be appreciated.

I'm new to HTML and rvest too, but here is what I found. I hope it helps; I'll leave the rest to you:

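library(rvest)

url <- 'http://www.thinkbabynames.com/popular/1/us'

name <- read_html(url)

# The 'Top 11-2000' list was the 9th <table> on the page when this was written;
# fill = TRUE is deprecated (and no longer needed) in rvest >= 1.0.0
name %>% 
  html_nodes("table") %>% 
  html_table(fill = TRUE) %>% 
  .[[9]] -> top2000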

> head(top2000)
      X1                                                                                        X2
1   Rank                                                                                      Name
2 11-20.  Alexander,  Oliver,  Daniel,  Lucas,  Matthew,  Aiden,  Jackson,  Logan,  David,  Joseph
3 21-30.     Samuel,  Henry,  Owen,  Sebastian,  Gabriel,  Carter,  Jayden,  John,  Luke,  Anthony
4 31-40.    Isaac,  Dylan,  Wyatt,  Andrew,  Joshua,  Christopher,  Grayson,  Jack,  Julian,  Ryan
5 41-50.    Jaxon,  Levi,  Nathan,  Caleb,  Hunter,  Christian,  Isaiah,  Thomas,  Aaron,  Lincoln
6 51-60. Charles,  Eli,  Landon,  Connor,  Josiah,  Jonathan,  Cameron,  Jeremiah,  Mateo,  Adrian
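The remaining step is to turn the comma-separated Name cells into one name per element. A base-R sketch, using a few rows copied from the output above as stand-in data (on the live page, top2000 would come from the html_table() call); it assumes the first row holds the "Rank"/"Name" labels, as in the output shown:

```r
# Stand-in data: rows copied from the head(top2000) output above
top2000 <- data.frame(
  X1 = c("Rank", "11-20.", "21-30."),
  X2 = c("Name",
         "Alexander,  Oliver,  Daniel,  Lucas,  Matthew,  Aiden,  Jackson,  Logan,  David,  Joseph",
         "Samuel,  Henry,  Owen,  Sebastian,  Gabriel,  Carter,  Jayden,  John,  Luke,  Anthony"),
  stringsAsFactors = FALSE
)

# Drop the "Rank"/"Name" row that html_table() did not treat as a header
rows <- top2000[top2000$X1 != "Rank", ]

# Split each cell on commas, then trim the padding around every name;
# "[\\h\\v]" also strips Unicode spaces such as non-breaking spaces, in case the page uses them
all_names <- trimws(unlist(strsplit(rows$X2, ",")), whitespace = "[\\h\\v]")

head(all_names)
#> [1] "Alexander" "Oliver"    "Daniel"    "Lucas"     "Matthew"   "Aiden"
```

The same two lines work unchanged on the full scraped table, whether html_table() returns a data frame or a tibble.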

