Extract text from xml nodeset

I'm pulling a date from a webpage and am having a hard time extracting the text:

date_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()')
## prints ## 
# {xml_nodeset (1)}
# [1]  2017-03-27 

I tried adding `[[`(1L) or date_[[1]], but this prints:

{xml_node}
<text>

I want to extract just 2017-03-27.

Just use html_text.

As the function name says, html_nodes returns something like pointers to the nodes, not their contents. To extract information from them, use html_text (for the text content) and html_attr (for an attribute value).
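To illustrate the difference, here is a minimal sketch on a made-up HTML snippet (the link and its URL are hypothetical, not taken from the asker's page). It only assumes the rvest package is installed:

```r
library(rvest)

# Hypothetical snippet used purely for illustration
doc <- read_html('<p><a href="https://example.com/report">Report</a></p>')

# html_nodes gives back an xml_nodeset, i.e. references to the matched nodes
link <- html_nodes(doc, xpath = "//a")

# html_text pulls out the text content of each node
link_text <- html_text(link)

# html_attr pulls out the value of a named attribute
link_href <- html_attr(link, "href")

print(link_text)
print(link_href)
```

Printing `link` itself would show the nodeset, which is the same kind of output the question ran into.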

Change the first line to:

date_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()') %>% html_text()
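If you want to try this without the original page, the following is a rough, self-contained sketch. The inline HTML is an assumption that only mimics the structure implied by the XPath; the real page will differ. Since the printed node shows spaces around the date, it also passes trim = TRUE, which is optional:

```r
library(rvest)

# Hypothetical stand-in for the real page: an element with id
# "particular_con" whose second div holds the date as text
page_ <- read_html(
  '<div id="particular_con"><div>Title</div><div> 2017-03-27 </div></div>'
)

date_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()') %>%
  html_text(trim = TRUE)  # trim = TRUE strips leading/trailing whitespace

# Optionally turn the character string into a Date
parsed_ <- as.Date(date_)

print(date_)
print(parsed_)
```

Note that from rvest 1.0.0 onwards html_nodes is superseded by html_elements; the older name still works, so the line above does not need to change.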
