

Parse text from XML

I have the following link:

https://hero.epa.gov/hero/ws/swift.cfc?method=getProjectRIS&project_id=993&getallabstracts=true

I want to parse this XML to get only the text, like:

Provider: HERO - 2.xx
DBvendor=EPA
Text-encoding=UTF-8

How can I parse it?

Well, it's not a text file, it's an HTML file. If you open the file in a browser and select View Source, you will see the text enclosed in <char> tags.

When it's opened in a browser, these tags and the other HTML content are interpreted and the output is rendered on the page (that's why it looks like text). If you want to implement similar behavior in Java, look into PhantomJS and/or JSoup examples.
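If the response really is well-formed XML with the text inside <char> elements, as this answer describes, you don't strictly need a browser engine to get the text out. The sketch below swaps in the JDK's built-in DOM parser (javax.xml.parsers) instead of PhantomJS or JSoup and collects the text of every <char> element in document order. The element name comes from this answer; the class name CharTagText and the sample XML are hypothetical, because the real structure of the HERO response is not shown in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CharTagText {

    // Returns the text of every <char> element, in document order.
    // Assumes the input is well-formed XML (an HTML page would need an HTML parser instead).
    static List<String> charTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("char");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document; the real HERO response may be structured differently.
        String xml = "<root>"
                + "<char>Provider: HERO - 2.xx</char>"
                + "<char>DBvendor=EPA</char>"
                + "<char>Text-encoding=UTF-8</char>"
                + "</root>";
        for (String line : charTexts(xml)) {
            System.out.println(line);
        }
    }
}
```

getElementsByTagName searches the whole subtree, so <char> elements are found at any nesting depth.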

It looks like a text file, but it is an XML file, and the browser just displays its text content. To verify, right-click and view the page source.

You can use a library like Jsoup to parse the file and get its contents.

https://jsoup.org/cookbook/introduction/parsing-a-document
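Since the browser "just displays its text content", the same result can be approximated by parsing the XML and asking the root element for its text content. Below is a minimal sketch that uses the JDK's built-in DOM parser rather than Jsoup, so it needs no extra dependency. The class name XmlToText and the sample document are hypothetical; the real element names in the HERO response are not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlToText {

    // Parses the XML and returns all of its text with the markup removed,
    // roughly what a browser shows for an XML file without a stylesheet.
    static String textOf(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            // getTextContent() concatenates every descendant text node, whitespace included.
            return doc.getDocumentElement().getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    // Convenience overload for XML that is already held in a String.
    static String textOf(String xml) {
        return textOf(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the real response.
        String xml = "<record>\n"
                + "<line>Provider: HERO - 2.xx</line>\n"
                + "<line>DBvendor=EPA</line>\n"
                + "<line>Text-encoding=UTF-8</line>\n"
                + "</record>";
        System.out.println(textOf(xml));
    }
}
```

To read the live document instead of a string, `new InputSource(systemId)` lets the parser fetch the URL itself, e.g. `textOf(new InputSource("https://hero.epa.gov/hero/ws/swift.cfc?method=getProjectRIS&project_id=993&getallabstracts=true"))`, assuming the endpoint returns well-formed XML. If you prefer Jsoup as suggested above, it also ships an XML parser mode, but that adds a third-party dependency.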
