How to parse the following string present in HTML and build a DOM tree in Java?
I have the string below in an HTML page, and I want to build a DOM tree from it and get the name/value pairs. How can I do this using an HTML parser, an XML parser, or a regular expression? Any code snippet would be useful. Thanks.
<$$TagStarts>
<==0>Name0</==0><##0>Value0</##0>
<==1>Name1</==1><##1>Value1</##1>
<==2>Name2</==2><##2>Value2</##2>
<==3>Name3</==3><##3>Value3</##3>
<==4>Name4</==4><##4>Value4</##4>
<==5>Name5</==5><##5>Value5</##5>
</$$TagStarts>
Assuming the tag names above are only placeholders and the real markup has meaningful tag names, try any of the following HTML parsers:
http://home.ccil.org/~cowan/XML/tagsoup/
http://nekohtml.sourceforge.net/
http://jtidy.sourceforge.net/
They will give you a W3C-compliant document object. After that, it is just a matter of calling getElementsByTagName or getElementById, or of using XPath or XQuery to get the elements from the DOM.
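As a rough illustration of the getElementsByTagName step, here is a minimal sketch that uses only the JDK's built-in XML parser. It assumes the markup has already been turned into well-formed XML with meaningful tag names; the `pairs`, `name`, and `value` element names and the `DomPairs` class are hypothetical stand-ins, not something taken from the question. Once any of the parsers above has produced an org.w3c.dom.Document, the loop over the two NodeLists should look much the same.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomPairs {
    // Hypothetical, well-formed stand-in for the markup in the question.
    static final String SAMPLE =
        "<pairs>"
        + "<name>Name0</name><value>Value0</value>"
        + "<name>Name1</name><value>Value1</value>"
        + "</pairs>";

    // Parses the XML into a W3C Document and pairs the i-th <name> with the i-th <value>.
    static Map<String, String> extract(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Both lists come back in document order.
            NodeList names = doc.getElementsByTagName("name");
            NodeList values = doc.getElementsByTagName("value");

            Map<String, String> pairs = new LinkedHashMap<>();
            int count = Math.min(names.getLength(), values.getLength());
            for (int i = 0; i < count; i++) {
                pairs.put(names.item(i).getTextContent(), values.item(i).getTextContent());
            }
            return pairs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE)); // prints {Name0=Value0, Name1=Value1}
    }
}
```

Pairing by position is the simplest choice here; if names and values can be missing or out of order in the real data, wrapping each pair in its own parent element and walking those parents would be more robust.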
Otherwise, you can use one of the following, which have their own document object implementations:
http://htmlcleaner.sourceforge.net/ [It also has some basic XPath support]
http://jsoup.org/ [It has a jQuery-like query API]
ADD: Check this: http://jsoup.org/cookbook/extracting-data/selector-syntax
I would recommend either jsoup or NekoHTML.
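If the tag names really are the literal ones shown in the question (`==0`, `##0`), they are not valid XML element names, so how each parser above treats them would need to be tested. Since the question also asks about regular expressions, here is a minimal stdlib-only sketch for that case. It assumes every name tag `<==N>` is directly followed by its value tag `<##N>` with the same index and that the tags are never nested; the `RegexPairs` class is a hypothetical name used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPairs {
    // Group 1: numeric index, group 2: name text, group 3: value text.
    // The backreference \1 forces the closing tags and the value tag to carry the same index.
    private static final Pattern PAIR = Pattern.compile(
        "<==(\\d+)>(.*?)</==\\1>\\s*<##\\1>(.*?)</##\\1>", Pattern.DOTALL);

    // Collects name/value pairs in the order they appear in the text.
    static Map<String, String> extract(String text) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {
            pairs.put(m.group(2).trim(), m.group(3).trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input =
            "<$$TagStarts>\n"
            + "<==0>Name0</==0><##0>Value0</##0>\n"
            + "<==1>Name1</==1><##1>Value1</##1>\n"
            + "</$$TagStarts>";
        System.out.println(extract(input)); // prints {Name0=Value0, Name1=Value1}
    }
}
```

This only works because the format is flat and regular. For general HTML, one of the parsers listed above is the safer route.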