I'm trying to extract all the names from the following page: http://www.thinkbabynames.com/popular/1/us
I'm using the rvest package in R.
The following code gets me the names that appear in the 'Top 10' and 'Trend' sections:
library(rvest)

url <- "http://www.thinkbabynames.com/popular/1/us"

get_names <- function(html) {
  html %>%
    read_html() %>%
    html_nodes("a b") %>%   # <b> elements inside links (Top 10 and Trend sections)
    html_text()
}

names <- get_names(url)
For the names in 'Top 11-2000' I used the following code, but it returns an empty character vector:
get_names2 <- function(html) {
  html %>%
    read_html() %>%
    html_nodes(xpath = '//*[@id="load"]/table/tbody/tr/td[2]/a') %>%
    html_text()
}

names2 <- get_names2(url)
I'm new to HTML, so any suggestions would be appreciated.
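A frequent cause of an empty result like this (an assumption here, not checked against this particular page) is that the XPath was copied from the browser's developer tools. Browsers insert a tbody element into every table when they render it, but the raw HTML that read_html() parses may not contain one, so a path step like /table/tbody/tr can match nothing. The helper below, relax_tbody(), is hypothetical and only rewrites the string: it turns "/tbody/" into "//" so the rows are matched as descendants of the table whether or not a tbody is present.

```r
# Hypothetical helper: make a browser-copied XPath tolerant of a missing <tbody>.
# "/table/tbody/tr" becomes "/table//tr", which matches rows with or without tbody.
relax_tbody <- function(xpath) {
  gsub("/tbody/", "//", xpath, fixed = TRUE)
}

relax_tbody('//*[@id="load"]/table/tbody/tr/td[2]/a')
```

It would then be used as read_html(url) %>% html_nodes(xpath = relax_tbody('//*[@id="load"]/table/tbody/tr/td[2]/a')) %>% html_text(). Note that // also matches rows of any table nested inside that table, which is usually harmless for a simple listing like this.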
I am new to HTML and rvest too; here is my exploration. I hope it helps, and I'll leave the rest to you:
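library(rvest)

url <- 'http://www.thinkbabynames.com/popular/1/us'
page <- read_html(url)

# Parse every <table> on the page; when I ran this, the 'Top 11-2000' list was the 9th one
top2000 <- page %>%
  html_nodes("table") %>%
  html_table(fill = TRUE) %>%
  .[[9]]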
> head(top2000)
X1 X2
1 Rank Name
2 11-20. Alexander, Oliver, Daniel, Lucas, Matthew, Aiden, Jackson, Logan, David, Joseph
3 21-30. Samuel, Henry, Owen, Sebastian, Gabriel, Carter, Jayden, John, Luke, Anthony
4 31-40. Isaac, Dylan, Wyatt, Andrew, Joshua, Christopher, Grayson, Jack, Julian, Ryan
5 41-50. Jaxon, Levi, Nathan, Caleb, Hunter, Christian, Isaiah, Thomas, Aaron, Lincoln
6 51-60. Charles, Eli, Landon, Connor, Josiah, Jonathan, Cameron, Jeremiah, Mateo, Adrian
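The hard-coded .[[9]] breaks as soon as the page gains or loses a table. A minimal sketch of a sturdier alternative, assuming the wanted table is the first one whose column names or first row contain a "Rank" cell (as in the output above); find_rank_table() is a hypothetical helper that works on the list of data frames returned by html_table():

```r
# Hypothetical helper: return the first table whose header (column names
# or first row) contains `marker`, instead of relying on a fixed position.
find_rank_table <- function(tables, marker = "Rank") {
  has_marker <- vapply(tables, function(df) {
    if (nrow(df) == 0) return(marker %in% names(df))
    # First row as character, one value per column (handles factors too)
    first_row <- vapply(df[1, , drop = FALSE], as.character, character(1))
    marker %in% c(names(df), first_row)
  }, logical(1))
  idx <- which(has_marker)
  if (length(idx) == 0) stop("no table containing '", marker, "' found")
  tables[[idx[1]]]
}
```

Usage would be tables <- read_html(url) %>% html_nodes("table") %>% html_table(fill = TRUE) followed by top2000 <- find_rank_table(tables). If another table on the page also has a "Rank" header, the marker would need to be more specific.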
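In top2000 as printed above, the header ("Rank", "Name") is still the first data row and each remaining row packs ten comma-separated names into one cell. A sketch of one way to reshape that into one row per name, using base R only; expand_names() is hypothetical, and it assumes the names within a cell are listed in rank order starting at the first number of the range (so "11-20." gives ranks 11 to 20):

```r
# Hypothetical helper: turn rows like "11-20." / "Alexander, Oliver, ..."
# into a long data frame with one (rank, name) pair per row.
expand_names <- function(tbl) {
  # Drop the header row that html_table() left in the data
  tbl <- tbl[tbl[[1]] != "Rank", , drop = FALSE]
  # Starting rank of each row, parsed from strings like "11-20."
  start <- as.integer(sub("^(\\d+)-.*$", "\\1", tbl[[1]]))
  # Split each comma-separated cell into individual names
  pieces <- strsplit(as.character(tbl[[2]]), ",\\s*")
  data.frame(
    # Assumes names are in rank order within each range
    rank = unlist(Map(function(s, p) s + seq_along(p) - 1L, start, pieces)),
    name = trimws(unlist(pieces)),
    stringsAsFactors = FALSE
  )
}
```

With the table above, expand_names(top2000) would start with rank 11 Alexander, 12 Oliver, 13 Daniel, and so on.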