Why doesn't grep() work after readLines()?

Question

I wrote a program in R that reads reports online. Its first two lines are:

page1 <- readLines("http://reportviewer.tce.mg.gov.br/default.aspx?server=noruega&relatorio=SICOM_Consulta/2013_2014/Modulo_AM/UC03-LeisOrc-RL&municipioSelecionado=3100203&exercicioSelecionado=2014")
line1 <- grep("Leis Autorizativas",page1)

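The rest of the program works fine, and I get the data I need. I then tried to adapt it to read a different report, but this time the second line does not work (grep() finds no match):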

page2 <- readLines("http://reportviewer.tce.mg.gov.br/default.aspx?server=noruega&relatorio=SICOM_Consulta/2013_2014/Modulo_AM/UC08-ConsultarDecretos-RL&municipioSelecionado=3101607&exercicioSelecionado=2013")
line2 <- grep("Decretos de Alterações",page2)

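In the first case, page1 is shown as a "character vector"; in the second case, page2 is shown as a "Large character vector". Could this difference be causing the problem? If so, does anyone have a hint on how to solve it?

(Using htmltab() or readHTMLTable() did not work well.)

Thanks.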

Answer 1

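This is because "Decretos de Alterações" is not made up entirely of ASCII characters. In the page source, the non-ASCII letters are written as HTML numeric character references, so the literal "ç" and "õ" never appear in the lines that readLines() returns.

If you try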
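One way to see how the page actually encodes the phrase is to search for its ASCII-only prefix and print the matching lines. The sketch below assumes a small stand-in vector, `page_lines`, in place of the real download; that name and its contents are illustrative only and are not taken from the report.

```r
# Search string written with \u escapes so the script does not depend on
# the encoding of the source file.
pattern <- "Decretos de Altera\u00e7\u00f5es"

# TRUE when the pattern contains at least one non-ASCII character.
has_non_ascii <- any(utf8ToInt(enc2utf8(pattern)) > 127)

# Stand-in for the result of readLines(); hypothetical content.
page_lines <- c(
  "<td>Leis Autorizativas</td>",
  "<td>Decretos de Altera&#231;&#245;es</td>"
)

# Match only the ASCII prefix and return the lines themselves, which
# shows how the rest of the phrase is written in the HTML.
hits <- grep("Decretos de Altera", page_lines, value = TRUE, fixed = TRUE)
print(has_non_ascii)
print(hits)
```

With the real page, replacing `page_lines` by `page2` should show whether the accented letters appear literally or as `&#...;` references.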

page2 <- readLines("http://reportviewer.tce.mg.gov.br/default.aspx?server=noruega&relatorio=SICOM_Consulta/2013_2014/Modulo_AM/UC08-ConsultarDecretos-RL&municipioSelecionado=3101607&exercicioSelecionado=2013")

grep("Decretos de Altera&#231;&#245;es ", page2)

[1] 366

it works.

To find the number to substitute for a given character:

utf8ToInt("ç")
[1] 231

Then put the resulting number between &# and ; and use that in place of each non-ASCII letter in your search string.

Best,

Colin
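The manual substitution can also be automated. The following is a minimal sketch, assuming the page writes every non-ASCII character as a decimal numeric reference; `to_html_entities` and `sample_lines` are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: replace every non-ASCII character in a single
# string with its decimal HTML numeric character reference,
# e.g. "ç" (code point 231) becomes "&#231;".
to_html_entities <- function(x) {
  codes <- utf8ToInt(enc2utf8(x))              # code point of each character
  chars <- intToUtf8(codes, multiple = TRUE)   # the same characters, one per element
  out <- ifelse(codes > 127, paste0("&#", codes, ";"), chars)
  paste(out, collapse = "")
}

# \u escapes keep the script independent of the source file encoding.
pattern <- to_html_entities("Decretos de Altera\u00e7\u00f5es")

# Stand-in for the lines returned by readLines(); hypothetical content.
sample_lines <- c(
  "<td>Leis Autorizativas</td>",
  "<td>Decretos de Altera&#231;&#245;es</td>"
)

# fixed = TRUE treats the pattern as a literal string, not a regex.
match_idx <- grep(pattern, sample_lines, fixed = TRUE)
print(pattern)
print(match_idx)
```

With the real data, the same call would be `grep(pattern, page2, fixed = TRUE)`. This only helps if the page uses decimal references; named entities such as `&ccedil;` would need a different mapping.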
