Extract text from an XML nodeset

Question

I'm pulling a date from a webpage and having a hard time extracting the text.

date_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()')
## prints ## 
# {xml_nodeset (1)}
# [1]  2017-03-27

I tried adding `[[`(1L) or date_[[1]], but this prints:

{xml_node}
<text>

I want to extract just 2017-03-27.

Answer 1

Just use html_text.

As the function name suggests, html_nodes returns something like pointers to the matched nodes, not their contents. To extract information from them, use html_text (for the text) or html_attr (for an attribute value).

Change the first line to:

date_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()') %>% html_text()
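
For a self-contained illustration of the difference between html_text and html_attr, here is a minimal sketch. The inline HTML, the link, and the names nodes_ and link_ are made up for the example; only the particular_con id and the XPath come from the question. It assumes the rvest package is installed.

```r
library(rvest)

# Hypothetical stand-in for the real page; only the id is taken from the question.
page_ <- read_html('
<div id="particular_con">
  <div>Header</div>
  <div> 2017-03-27 </div>
  <a href="/archive">Archive</a>
</div>')

# html_nodes() returns a nodeset (references to nodes), not character data.
nodes_ <- html_nodes(page_, xpath = '//*[@id="particular_con"]/div[2]/text()')

# html_text() pulls the text out of each node; trim = TRUE strips
# the surrounding whitespace.
date_ <- html_text(nodes_, trim = TRUE)

# html_attr() reads an attribute value instead of the text content.
link_ <- html_attr(html_nodes(page_, "a"), "href")

print(date_)
print(link_)
```

If a Date object is needed instead of a string, as.Date(date_) should convert it, since the value is already in YYYY-MM-DD form.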

Solution 1
4 Accepted 2017-04-04 12:44:14