parse meta tags in Java

Original post, 2008-11-18 16:49:49 | 2 | 3 | Tags: java, html, xml, parsing

I have a collection of HTML documents, and I need to parse the contents of the <meta> tags in their <head> sections. These are the only HTML tags whose values I'm interested in, i.e., I don't need to parse anything in the <body> section.

I've attempted to parse these values using the XPath support provided by JDOM. However, this isn't working out too well, because a lot of the HTML in the <body> section is not valid XML.
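The failure is easy to reproduce, and it is not limited to the body. In HTML, `<meta>` (like `<br>` and `<img>`) is a void element that is normally written without a closing tag, and a strict XML parser treats the unclosed tag as a fatal error. The sketch below uses only the JDK and checks well-formedness with `DocumentBuilder`. The class and method names (`StrictXmlCheck`, `isWellFormedXml`) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictXmlCheck {

    /** Returns true if the text parses as well-formed XML, false otherwise. */
    static boolean isWellFormedXml(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows
            // fatal ones, so malformed input surfaces as a SAXException instead of
            // the default "[Fatal Error]" message on stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Perfectly ordinary HTML, but <meta> and <br> are never closed.
        String html = "<html><head><meta name=\"keywords\" content=\"java\"></head>"
                + "<body>line one<br>line two</body></html>";
        System.out.println(isWellFormedXml(html));                          // false
        System.out.println(isWellFormedXml("<meta name=\"a\" content=\"b\"/>")); // true
    }
}
```

This means that even a perfectly valid head can break an XML-based approach unless the markup is first converted to XML.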

Does anyone have any suggestions for how I might go about parsing these tag values in a manner that can deal with malformed HTML?

Cheers, Don

3 answers

You can probably use the Jericho HTML Parser. In particular, have a look at this to see how to find specific tags.
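Jericho is a separate library and is not shown here. The sketch below instead uses a different lenient parser that ships with the JDK: `javax.swing.text.html.parser.ParserDelegator` driven by an `HTMLEditorKit.ParserCallback`. It illustrates the same idea, a tag-soup parser that tolerates malformed markup, without any extra dependency. It is not Jericho's API, and the JDK parser is built on an old HTML 3.2 DTD, so it should be treated only as a starting point. `MetaTagReader` and `readMetaTags` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class MetaTagReader {

    /**
     * Collects name -> content pairs from <meta name="..." content="..."> tags,
     * tolerating malformed markup elsewhere in the document.
     */
    static Map<String, String> readMetaTags(String html) {
        Map<String, String> meta = new LinkedHashMap<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <meta> is an empty element, so the parser reports it here
                // rather than through handleStartTag/handleEndTag.
                if (tag == HTML.Tag.META) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                    // Tags without a name (e.g. http-equiv ones) or without content are skipped.
                    if (name != null && content != null) {
                        meta.put(name.toString(), content.toString());
                    }
                }
            }
        };
        try {
            // true = ignore charset declarations instead of aborting with a
            // ChangedCharSetException when a <meta> announces a different charset.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return meta;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>t</title>"
                + "<meta name=\"description\" content=\"Parsing meta tags\">"
                + "<meta name=\"keywords\" content=\"java, html\">"
                + "</head><body><p>unclosed <b>markup <table><tr><td>oops";
        System.out.println(readMetaTags(html));
    }
}
```

The callback style means the parser never builds a tree, so the broken body costs nothing beyond being scanned.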

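If it suits your application, you could use Tidy to convert the HTML into valid XML, and then use as much XPath as you like!

JTidy should give you a good starting point for this.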
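Once Tidy (or JTidy) has turned a page into well-formed XML, the XPath step needs nothing beyond the JDK. The sketch below assumes that conversion has already happened, and the Tidy call itself is not shown. It leaves `DocumentBuilderFactory` at its default, non-namespace-aware setting. If your tidied output declares the XHTML namespace, verify that the unprefixed path still matches, or switch to a namespace-aware parser with a `NamespaceContext`. `TidiedMetaXPath` and `headMetaTags` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TidiedMetaXPath {

    /** Maps name -> content for every <meta name="..."> directly under /html/head. */
    static Map<String, String> headMetaTags(String wellFormedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Don't fetch an external DTD if the tidied output carries a DOCTYPE.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Only <meta> elements in the head are selected; the body is ignored.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/html/head/meta[@name]", doc, XPathConstants.NODESET);

            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element meta = (Element) nodes.item(i);
                // getAttribute returns "" when content is absent.
                result.put(meta.getAttribute("name"), meta.getAttribute("content"));
            }
            return result;
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("input could not be parsed as XML", e);
        } catch (ParserConfigurationException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String tidied = "<html><head><title>t</title>"
                + "<meta name=\"description\" content=\"Parsing meta tags\" />"
                + "<meta name=\"keywords\" content=\"java, html\" />"
                + "</head><body><p>now well-formed</p></body></html>";
        System.out.println(headMetaTags(tidied));
        // {description=Parsing meta tags, keywords=java, html}
    }
}
```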

Related questions

How to parse tags in java

Java parse nested “tags”

Asserting Meta Tags using Selenium/webdriver (java)

How to remove XML meta tags in java or jsp

Parse anchor tags in java string

Java regex hot to match meta tags content attribute value

Parse text file tags as xml - Java

Parse xml using java and keep html tags

Including meta-java layer to Yocto causes parse failure

Parse HTML data in Java including &lt and &gt tags?

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

parse meta tags in Java

Question

3 answers

solution1: score 6, accepted, 2008-11-18 16:56:05
solution2: score 2, 2008-11-18 16:52:53
solution3: score 0, 2008-11-18 16:54:51
