"XML content does not seem to be XML": error in xmlTreeParse in R

Question

I am going through the topicmodels tutorial in R. Around page 12, the authors strip HTML tags and Greek letters:

R> library("XML")
R> remove_HTML_markup <- function(s) {
+ doc <- htmlTreeParse(s, asText = TRUE, trim = FALSE)
+ xmlValue(xmlRoot(doc))
+ }
R> remove_HTML_markup(JSS_papers[1,"description"])
Error: XML content does not seem to be XML, nor to identify a file name ...

JSS_papers stores metadata for a collection of papers downloaded from a journal. The entry under the description tag is the article's abstract. This one doesn't have any tags:

JSS_papers[1,"description"] = "The fit of a variogram model to spatially-distributed 
    data is often difficult to assess. A graphical diagnostic written in S-plus is   
    introduced that allows the user to determine both the general quality of the fit of a 
    variogram model, and to find specific pairs of locations that do not have measurements 
    that are consonant with the fitted variogram. It can help identify nonstationarity,    
    outliers, and poor variogram fit in general. Simulated data sets and a set of soil      
    nitrogen concentration data are examined using this graphical diagnostic."

Answer 1

I had this same problem recently. The variable to which I had assigned the URL had a typo in it. Double-check your variable, s, and see if there's something wrong there.
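
One way to act on this advice is to inspect s before handing it to the parser. The following is only a minimal sketch, not part of the tutorial: the helper names check_parse_input and remove_HTML_markup_checked are hypothetical, and returning tagless text unchanged is a simplifying assumption (it skips the entity decoding that the parser would otherwise perform).

```r
# Hypothetical helper (not from the tutorial): describe what looks wrong
# with `s`, or return NULL when it is a usable single string.
check_parse_input <- function(s) {
  if (!is.character(s)) {
    return(paste("expected a character string, got", class(s)[1]))
  }
  if (length(s) != 1L) {
    return(paste("expected length 1, got", length(s)))
  }
  if (is.na(s) || !nzchar(trimws(s))) {
    return("string is NA or empty")
  }
  NULL
}

# Hypothetical variant of remove_HTML_markup that validates its input first.
remove_HTML_markup_checked <- function(s) {
  problem <- check_parse_input(s)
  if (!is.null(problem)) stop("cannot parse input: ", problem)

  # Assumption: text without "<" contains no tags, so there is nothing to strip.
  if (!grepl("<", s, fixed = TRUE)) return(s)

  # Requires the XML package; fall back to the raw text if parsing fails.
  tryCatch({
    doc <- XML::htmlTreeParse(s, asText = TRUE, trim = FALSE)
    XML::xmlValue(XML::xmlRoot(doc))
  }, error = function(e) s)
}
```

Calling check_parse_input(JSS_papers[1, "description"]) would then report whether the entry is something other than a single non-empty character string (for example a list element), which is worth ruling out before looking at the parser itself.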

0 2017-07-02 10:44:05
