rvest html scraping text from span
I'm trying to get just the coordinates from this page: http://hol.osu.edu/spmInfo.html?id=CMNHENT0042647 . When I try to get the text, all I get in return is
" "
library(rvest)
ID<-"CMNHENT0042647"
HOLWebSite<-read_html(paste0("http://hol.osu.edu/spmInfo.html?id=",ID))
Coords<-HOLWebSite%>%
html_nodes("span#hymSpmCoordsID.boldedText")%>%
html_text()
Is it because it is in a span?
What's actually in the span in the scraped page is <span class="boldedText" id="hymSpmCoordsID">\n <!-- To Be DB Generated //-->\n</span>. There are no coordinates in the HTML, only a placeholder comment.
You can verify this by going to the page and viewing the source.
You can grab it this way:
library(httr)
library(jsonlite)
get_specimen_info <- function(specimen) {
  # Call the endpoint that the page itself queries for specimen data
  GET(
    url = "http://hol.osu.edu/hymDB/OJ_Break.getSpmInfo",
    query = list(
      cuid = specimen,
      callback = "",
      noCacheIE = round(as.numeric(Sys.time()) * 1000)
    ),
    add_headers(Referer = sprintf("http://hol.osu.edu/spmInfo.html?id=%s", specimen)),
    set_cookies(hymShowInfo = "Y")
  ) -> res
  stop_for_status(res)
  res <- trimws(content(res, as="text"))
  # Strip the leading "(" and trailing ");" that wrap the JSON payload
  res <- gsub("^\\(|\\);$", "", res)
  res <- jsonlite::fromJSON(res)
  res
}
The page retrieves the data dynamically, so the coordinates never appear in the static HTML. The function above (which takes a specimen ID as its parameter) mimics the request the page makes.
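The clean-up step inside the function can be tried on its own without touching the network. This is a minimal sketch, assuming the response body has the shape `({...});`, which is what the gsub() call in the function implies. The helper name `strip_wrapper` and the sample string are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: remove a leading "(" and a trailing ");" so that
# only the JSON text remains. Both parentheses are escaped because they
# are regex metacharacters.
strip_wrapper <- function(txt) {
  gsub("^\\(|\\);$", "", trimws(txt))
}

# Made-up sample shaped like the assumed response, not real server output
raw_example <- '({"spmInfo":{"cuid":"CMNHENT0042647"}});\n'

json_text <- strip_wrapper(raw_example)
print(json_text)
```

The resulting string is plain JSON and can be handed to jsonlite::fromJSON(), as the function does.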
Now to use it:
spec <- get_specimen_info("CMNHENT0042647")
str(spec)
## List of 1
## $ spmInfo:List of 47
## ..$ cuid : chr "CMNHENT0042647"
## ..$ alt_ids : list()
## ..$ loc_id : int 9661
## ..$ loc_name : chr "Defiance Township, Defiance Co., OH"
## ..$ lat : num 41.3
## ..$ lng : num -84.4
## ..$ elev : chr ""
## ..$ max_elev : chr ""
## ..$ prec_type : chr "POINT"
## ..$ loc_comments : chr ""
## ..$ coord_source : chr "USGS-GNIS"
## ..$ hier :List of 7
## .. ..$ place:List of 3
## .. .. ..$ id : chr "202"
## .. .. ..$ name: chr "Defiance"
## .. .. ..$ type: chr "County"
## .. ..$ pol2 :List of 3
## .. .. ..$ id : chr "202"
## .. .. ..$ name: chr "Defiance"
## .. .. ..$ type: chr "County"
## .. ..$ pol1 :List of 3
## .. .. ..$ id : chr "82"
## .. .. ..$ name: chr "Ohio"
## .. .. ..$ type: chr "State"
## .. ..$ pol0 :List of 3
## .. .. ..$ id : chr "81"
## .. .. ..$ name: chr "United States"
## .. .. ..$ type: chr "Country"
## .. ..$ pol-1:List of 3
## .. .. ..$ id : chr "23"
## .. .. ..$ name: chr "North America"
## .. .. ..$ type: chr "Continent"
## .. ..$ pol-3:List of 3
## .. .. ..$ id : chr "5621"
## .. .. ..$ name: chr "Western Hemisphere"
## .. .. ..$ type: chr "Hemisphere"
## .. ..$ pol-4:List of 3
## .. .. ..$ id : chr "0"
## .. .. ..$ name: chr "Earth"
## .. .. ..$ type: chr ""
## ..$ coll_event_id : chr "343832"
## ..$ coll_method : chr "none specified"
## ..$ coll_date : chr "18 August 1981"
## ..$ coll_date_alt : chr "18.VIII.1981"
## ..$ coll_time :List of 2
## .. ..$ start: chr ""
## .. ..$ end : chr ""
## ..$ date_type : chr "CLOCK_TIME"
## ..$ field_code : chr ""
## ..$ collector : chr "Perry, T. E."
## ..$ collector_alt : chr "T. E. Perry"
## ..$ collector_extended:'data.frame': 1 obs. of 6 variables:
## .. ..$ last_name : chr "Perry"
## .. ..$ first_name : chr ""
## .. ..$ initials : chr "T. E."
## .. ..$ generation : chr ""
## .. ..$ name_order : chr "W"
## .. ..$ collector_id: int 33377
## ..$ determinations :'data.frame': 1 obs. of 17 variables:
## .. ..$ det_id : int 2217760
## .. ..$ tnuid : int 355808
## .. ..$ id : int 355808
## .. ..$ taxon : chr "Macromia taeniolata"
## .. ..$ author : chr "Rambur"
## .. ..$ det_date : chr "2016"
## .. ..$ status : chr "Original name/combination"
## .. ..$ det_status : chr "CURRENT"
## .. ..$ type_status : chr ""
## .. ..$ determiner_id: int 0
## .. ..$ cu_coll_id : chr ""
## .. ..$ coll_id : chr ""
## .. ..$ rank : chr "Species"
## .. ..$ valid : chr "Valid"
## .. ..$ homonym : chr "N"
## .. ..$ common_names :List of 1
## .. .. ..$ : chr [1:2] "Wabash River Cruiser" "Royal River Cruiser"
## .. ..$ determiner : chr ""
## ..$ Class : chr "Hexapoda"
## ..$ Genus : chr "Macromia"
## ..$ Species : chr "Macromia taeniolata"
## ..$ Family : chr "Corduliidae"
## ..$ Phylum : chr "Arthropoda"
## ..$ Kingdom : chr "Animalia"
## ..$ Order : chr "Odonata"
## ..$ habitat : chr ""
## ..$ associations : list()
## ..$ spm_sex : chr "M"
## ..$ spm_num : chr "1"
## ..$ life_status : chr "adult"
## ..$ inst_id : chr "195"
## ..$ inst_name : chr "Cleveland Museum of Natural History, OH"
## ..$ inst_code : chr "CLEV"
## ..$ vouchered : logi TRUE
## ..$ comments : chr "[OH, Defiance Co., Defiance, 18-AUG-1981, T. E. Perry, coll.] [ADP - CMNH 13395]"
## ..$ enterer : chr "Roberta DeSalvo"
## ..$ updater : chr "hmajewski"
## ..$ date_recorded : chr "2-AUG-2016"
## ..$ preparations :'data.frame': 1 obs. of 3 variables:
## .. ..$ prep_type : chr "pin"
## .. ..$ prep_contents: chr ""
## .. ..$ num_preps : int 1
## ..$ images : list()
## ..$ sequences : list()
## ..$ last_update : chr "2016-08-15T12:32:24Z"
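Since the question asks for just the coordinates, they can be picked out of the returned list by name. This is a minimal sketch, assuming the structure printed above (a top-level `spmInfo` element holding `lat` and `lng`). The helper `extract_coords` is hypothetical, and the example list is a stand-in built from the rounded values that str() displays, not a real response.

```r
# Hypothetical helper (not part of the original answer): return latitude
# and longitude from a list shaped like the result of get_specimen_info().
extract_coords <- function(spec) {
  info <- spec$spmInfo
  c(lat = info$lat, lng = info$lng)
}

# Stand-in for a real response, using the rounded values shown by str()
spec_example <- list(
  spmInfo = list(cuid = "CMNHENT0042647", lat = 41.3, lng = -84.4)
)

coords <- extract_coords(spec_example)
print(coords)
```

With a live result, the call would be extract_coords(spec), using the `spec` object created above.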