Extract text from xml nodeset
I'm pulling a date from a webpage and having a hard time extracting the text:
date_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()')
## prints ##
# {xml_nodeset (1)}
# [1] 2017-03-27
I tried adding `[[`(1L) or date_[[1]], but this prints:
# {xml_node}
# <text>
I want to extract just 2017-03-27
Just use html_text
As the function name says, html_nodes returns the nodes themselves (roughly, pointers to them), not their contents. To extract information from them, use html_text for the text content or html_attr for attribute values.
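A minimal sketch of the difference, assuming the rvest package is installed. The markup and the example.com links are made up for illustration and are parsed from a string rather than fetched from a real page:

```r
library(rvest)

# Hypothetical markup, parsed from a string instead of a live page
page <- read_html('<div><a href="https://example.com/a">First</a><a href="https://example.com/b">Second</a></div>')

links <- html_nodes(page, "a")   # xml_nodeset: handles to the <a> elements
print(links)                     # prints the nodes, not plain strings

link_text <- html_text(links)          # character vector: text inside each node
link_href <- html_attr(links, "href")  # character vector: each node's href value
```

In rvest 1.x, html_nodes still works; html_elements is its newer name.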
Change the first line to:
date_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()') %>% html_text()
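A self-contained sketch of the fix, assuming rvest is installed (it re-exports the %>% pipe). The HTML string below is a hypothetical stand-in for the real page, built only so that the question's XPath matches a text node; the as.Date step is an optional extra, not part of the original answer:

```r
library(rvest)

# Hypothetical stand-in for the real page: the second <div> inside
# #particular_con holds the date as a bare text node
page_ <- read_html('<div id="particular_con"><div>Published:</div><div>2017-03-27</div></div>')

date_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()') %>%
  html_text(trim = TRUE)   # trim = TRUE strips leading/trailing whitespace, if any

date_parsed <- as.Date(date_)  # optional: convert the string to a Date
```

html_text also accepts a single node, so html_text(date_[[1]]) applied to the original nodeset would have worked too.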