How to parse the following string in HTML and build a DOM tree in Java?

Question

I have the string below in HTML, and I want to build a DOM tree and get the name/value pairs. How can I do this using an HTML parser, an XML parser, or a regexp? Any code snippet would be useful. Thanks.



<$$TagStarts>

<==0>Name0</==0><##0>Value0</##0>
<==1>Name1</==1><##1>Value1</##1>
<==2>Name2</==2><##2>Value2</##2>
<==3>Name3</==3><##3>Value3</##3>
<==4>Name4</==4><##4>Value4</##4>
<==5>Name5</==5><##5>Value5</##5>

</$$TagStarts>
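Since the question also asks about regular expressions: the literal names in the sample (`$$TagStarts`, `==0`, `##0`) are not legal XML element names, so a strict XML parser would reject this input as-is. If the markup really looks like this, a regular expression is a pragmatic fallback. The following is a minimal sketch, assuming each pair sits on one line in the `<==N>name</==N><##N>value</##N>` shape shown above; `PairExtractor` and `extractPairs` are hypothetical names, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairExtractor {

    // Matches <==N>name</==N><##N>value</##N>. The back-reference \1 forces the
    // closing tags and the value tag to use the same index N as the opening name tag.
    // '.' does not match newlines, so each name and value must stay on one line.
    private static final Pattern PAIR = Pattern.compile(
            "<==(\\d+)>(.*?)</==\\1>\\s*<##\\1>(.*?)</##\\1>");

    // Returns name -> value, preserving the order in which the pairs appear.
    public static Map<String, String> extractPairs(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(2), m.group(3));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "<$$TagStarts>\n"
                + "<==0>Name0</==0><##0>Value0</##0>\n"
                + "<==1>Name1</==1><##1>Value1</##1>\n"
                + "</$$TagStarts>";
        // Prints "Name0 = Value0" and then "Name1 = Value1".
        extractPairs(input).forEach((name, value) ->
                System.out.println(name + " = " + value));
    }
}
```

A regex like this only works while the format stays this regular; nested or multi-line values are where a real parser becomes worth the extra dependency.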

Answer 1

Assuming the tag names are just samples and your real markup will have meaningful tag names:

Try using any of the following HTML parsers:

http://home.ccil.org/~cowan/XML/tagsoup/

http://nekohtml.sourceforge.net/

http://jtidy.sourceforge.net/

They give you a W3C-compliant DOM Document object. After that, it is just a matter of calling getElementsByTagName or getElementById, or using XPath or XQuery, to get the elements from the DOM.
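A minimal sketch of the getElementsByTagName and XPath steps. To stay free of extra dependencies it uses the JDK's built-in XML parser rather than the HTML parsers listed above, and it assumes the sample has been rewritten with legal, meaningful tag names: the `<pairs>`, `<name>` and `<value>` elements and the `DomPairs` class are hypothetical. A `org.w3c.dom.Document` obtained from one of the HTML parsers could be queried in a similar way.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomPairs {

    // Parses well-formed markup into a W3C DOM Document with the JDK parser.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // getElementsByTagName: pair the i-th <name> with the i-th <value>.
    public static Map<String, String> byTagName(Document doc) {
        NodeList names = doc.getElementsByTagName("name");
        NodeList values = doc.getElementsByTagName("value");
        Map<String, String> pairs = new LinkedHashMap<>();
        int n = Math.min(names.getLength(), values.getLength());
        for (int i = 0; i < n; i++) {
            pairs.put(names.item(i).getTextContent(), values.item(i).getTextContent());
        }
        return pairs;
    }

    // XPath: for each <name>, take the text of the <value> element that follows it.
    public static Map<String, String> byXPath(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList names = (NodeList) xpath.evaluate("/pairs/name", doc, XPathConstants.NODESET);
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < names.getLength(); i++) {
            Node name = names.item(i);
            // evaluate(String, Object) returns the string value of the first matching node.
            pairs.put(name.getTextContent(), xpath.evaluate("following-sibling::value[1]", name));
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical rewrite of the question's markup using legal XML names.
        String xml = "<pairs>\n"
                + "  <name>Name0</name><value>Value0</value>\n"
                + "  <name>Name1</name><value>Value1</value>\n"
                + "</pairs>";
        Document doc = parse(xml);
        System.out.println(byTagName(doc)); // {Name0=Value0, Name1=Value1}
        System.out.println(byXPath(doc));   // {Name0=Value0, Name1=Value1}
    }
}
```

Pairing by position with getElementsByTagName is the simplest option; the XPath version ties each value to its own name, which is more robust if a pair is ever missing a value.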

Otherwise, you can use one of the following, which have their own document object implementations:

http://htmlcleaner.sourceforge.net/ [it also has some basic XPath support]

http://jsoup.org/ [it has a jQuery-like query API]

Also see: http://jsoup.org/cookbook/extracting-data/selector-syntax

I would recommend either jsoup or NekoHTML.

Answer 1 (score: 3, date: 2010-12-16 10:52:15)
