
WebHarvest XML not well formed

I am using WebHarvest to try to retrieve data from Woot.com, and I'm getting a few different errors. The first processor fetches the website fine, but when I try to test an XPath expression in the variable window I get the error: org.xml.sax.SAXParseException; lineNumber: 86; columnNumber: 99; The reference to entity "pt2" must end with the ';' delimiter. If I try the pretty-print function, it returns: XML is not well-formed: the reference to entity "pt2" must end with the ';' delimiter. {line: 86, col:99]. Lastly, inside the script I am writing, if I add the xpath tag with an expression, I get: element type "xpath" must be followed by either attribute specifications, ">" or "/>". Can someone tell me what I am doing wrong? I am very new to WebHarvest and don't have any experience with this kind of program.

My code is:

<?xml version="1.0" encoding="UTF-8"?><config>
<xpath expression="(//div[@class="overview"])[1]//h2/text()">
<html-to-xml>
<http url="http://www.woot.com/"/>
</html-to-xml>
</xpath>
</config>

To make the XML well-formed, you have to use single quotes (') instead of double quotes (") inside the value of the expression attribute. The corrected configuration is:

<?xml version="1.0" encoding="UTF-8"?><config>
<xpath expression="(//div[@class='overview'])[1]//h2/text()">
<html-to-xml>
<http url="http://www.woot.com/"/>
</html-to-xml>
</xpath>
</config>

You can delimit an attribute value with either single quotes (') or double quotes ("). However, the delimiting quote character cannot appear unescaped inside the value it wraps. Here are a few examples:

 <xpath expression='(//div[@class="overview"])[1]//h2/text()'>           --- valid
 <xpath expression='(//div[@class='overview'])[1]//h2/text()'>           --- invalid
 <xpath expression="(//div[@class="overview"])[1]//h2/text()">           --- invalid
 <xpath expression='(//div[@class=&apos;overview&apos;])[1]//h2/text()'> --- valid
 <xpath expression="(//div[@class=&apos;overview&apos;])[1]//h2/text()"> --- valid
 <xpath expression="(//div[@class=&quot;overview&quot;])[1]//h2/text()"> --- valid
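If you want to check these cases yourself, here is a minimal sketch in Python (not part of WebHarvest; it only uses the standard-library XML parser). It assumes each variant is made self-closing so that it forms a complete document, and the helper name is_well_formed is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

# The six <xpath> variants from the list above, made self-closing so that
# each string is a complete XML document on its own (an assumption made
# only for this check; the real config nests other elements inside).
VARIANTS = [
    '''<xpath expression='(//div[@class="overview"])[1]//h2/text()'/>''',
    '''<xpath expression='(//div[@class='overview'])[1]//h2/text()'/>''',
    '''<xpath expression="(//div[@class="overview"])[1]//h2/text()"/>''',
    '''<xpath expression='(//div[@class=&apos;overview&apos;])[1]//h2/text()'/>''',
    '''<xpath expression="(//div[@class=&apos;overview&apos;])[1]//h2/text()"/>''',
    '''<xpath expression="(//div[@class=&quot;overview&quot;])[1]//h2/text()"/>''',
]


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One boolean per variant, in the same order as the list above.
results = [is_well_formed(v) for v in VARIANTS]

for variant, ok in zip(VARIANTS, results):
    print("valid  " if ok else "invalid", variant)
```

The escaped forms are decoded by the parser, so the XPath engine should see plain quote characters in the attribute value either way.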

Hope this helps.
