Parsing text from XML

Question

I have the following link:

https://hero.epa.gov/hero/ws/swift.cfc?method=getProjectRIS&project_id=993&getallabstracts=true

I want to parse this XML to get only the text, like:

Provider: HERO - 2.xx
DBvendor=EPA
Text-encoding=UTF-8

How can I parse it?

Answer 1

Well, it's not a text file, it's an HTML file. If you open the file in a browser and select View Source, you will be able to see the text enclosed in <char> tags.

When it's opened in a browser, these tags and other HTML content are interpreted and the output is rendered on the page (that's why it looks like text). If you want to implement similar behavior in Java, you should look into PhantomJS and/or JSoup examples.

Answer 2

It looks like a text file, but it is an XML file and the browser just displays its text content. To verify, right-click and look at the page source.
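If the response really is XML, the JDK's built-in DOM parser may be enough to pull out the text without any extra library. The following is only a rough sketch: the `<char>` element name is taken from Answer 1, while the `<root>` wrapper, the sample string, and the `XmlTextExtractor` / `extractText` names are assumptions made up for illustration. The real response may be laid out differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlTextExtractor {

    // Returns the text content of every <char> element, one per line.
    // The element name "char" is an assumption based on Answer 1.
    static String extractText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagName("char");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(nodes.item(i).getTextContent());
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input; not the actual HERO response.
        String sample = "<root>"
                + "<char>Provider: HERO - 2.xx</char>"
                + "<char>DBvendor=EPA</char>"
                + "<char>Text-encoding=UTF-8</char>"
                + "</root>";
        System.out.println(extractText(sample));
    }
}
```

To run this against the live link instead of a string, the same `DocumentBuilder` also has a `parse(InputStream)` overload that can take the stream opened from the URL.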

Answer 3

You can use a library like Jsoup to parse the file and get the contents.

https://jsoup.org/cookbook/introduction/parsing-a-document

3 solutions

Solution 1
2, accepted, 2017-05-23 18:35:50

Solution 2
0, 2017-05-23 18:34:41

Solution 3
0, 2017-05-24 17:21:13
