I'm trying to rbind a series of HTML tables (from different pages, all with the same column names), but some pages have "no records". I want to skip such pages or assign NULL to the data frame.
Example Dataframe 1
library(XML)  # provides readHTMLTable()
url <- "http://stats.espncricinfo.com/ci/engine/player/28081.html?class=2;filter=advanced;floodlit=1;innings_number=1;orderby=start;result=1;template=results;type=batting;view=match"
Batting <- readHTMLTable(url)
# Keep only the table named "Match by match list"
Batting <- Batting$"Match by match list"
Dataframe 2
url <- "http://stats.espncricinfo.com/ci/engine/player/625383.html?class=2;filter=advanced;floodlit=1;innings_number=1;orderby=start;result=2;template=results;type=batting;view=match"
Batting <- readHTMLTable(url)
# Keep only the table named "Match by match list"
Batting <- Batting$"Match by match list"
There are several such data frames: some have records in tabular form and some have no records at all.
When I rbind them, the one with no records causes an error in the final data frame:
final_DF<-rbind(Dataframe1,Dataframe2)
How do I resolve this?
PS: For each URL query I also add a certain set of columns to the data frame (say 5 additional columns, using cbind), based on my requirements.
You can do the following:
library(rvest)
library(tidyverse)
urls <- c(
  "http://stats.espncricinfo.com/ci/engine/player/28081.html?class=2;filter=advanced;floodlit=1;innings_number=1;orderby=start;result=1;template=results;type=batting;view=match",
  "http://stats.espncricinfo.com/ci/engine/player/625383.html?class=2;filter=advanced;floodlit=1;innings_number=1;orderby=start;result=2;template=results;type=batting;view=match"
)
# One row of extra columns per URL, in the same order as urls
extra_cols <- list(
  tibble("Team"="IND","Player"="MS.Dhoni","won"=1,"lost"=0,"D"=1,"D/N"=0,"innings"=1,"Format"="ODI"),
  tibble("Team"="IND","Player"="MS.Dhoni","won"=1,"lost"=0,"D"=1,"D/N"=0,"innings"=1,"Format"="ODI")
)
# Read each page and select the node that should hold the results table
doc <- map(urls, read_html) %>%
  map(html_node, ".engineTable:nth-child(5)")
# TRUE where a table was found; html_node() returns an "xml_missing" object otherwise
keep <- map_lgl(doc, ~ !inherits(., "xml_missing"))
# Parse the remaining tables, attach the extra columns, and row-bind the results
map(doc[keep], html_table, fill = TRUE) %>%
  map2_df(extra_cols[keep], cbind)
The critical part is the keep vector, which marks the list elements that are not of class "xml_missing", i.e. the pages that actually contain a table. Subsetting with doc[keep] and extra_cols[keep] drops the empty ones.
Compared to your code, I use a CSS selector to specify the html_node that should contain the table. See http://selectorgadget.com/
Also, your rbind is done internally by map2_df (the last line).
This results in the following (using %>% {head(.[,c("Bat1", "Runs", "Team")])}):
  Bat1 Runs Team
1    0    0  IND
2    3    3  IND
3  148  148  IND
4   56   56  IND
5   38   38  IND
6   20   20  IND
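The same skip-then-bind idea can be tried offline. The sketch below is base R only and is not the code from the answer above: the `tables` and `extra` lists are made-up stand-ins for scraped pages and their extra columns, with NULL marking a page that had no records.

```r
# Hypothetical stand-ins for scraped tables; NULL marks a "no records" page.
tables <- list(
  data.frame(Bat1 = c("0", "3"), Runs = c(0, 3)),
  NULL,
  data.frame(Bat1 = "148", Runs = 148)
)

# Hypothetical extra columns, one single-row data frame per page (same order).
extra <- list(
  data.frame(Team = "IND", Player = "A"),
  data.frame(Team = "IND", Player = "B"),
  data.frame(Team = "IND", Player = "C")
)

# Flag the pages that actually produced a table.
keep <- !vapply(tables, is.null, logical(1))

# Attach the extra columns to each remaining table; the single-row
# data frame is recycled to the number of rows in the table.
combined <- Map(cbind, tables[keep], extra[keep])

# Row-bind everything into one data frame.
final_df <- do.call(rbind, combined)
print(final_df)
```

Filtering both lists with the same `keep` vector keeps each table aligned with its own extra columns, which is why the empty page is dropped from `extra` as well.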