How to parse an HTML file using Jsoup
I have an HTML table and want to extract link text based on a certain condition.
<table border="0" cellpadding="3" cellspacing="0" width="100%">
<tbody>
<tr class="dir"><td colspan="2"> <a href="http://xyz/">Yogendra sharma</a></td></tr>
<tr>
<td class="f"><a href="abc">abc</a> </td>
<td>
<tt class="con">
<a class="s" href="mno"><span class="l">7</span> mno <b>Hello</b>;</a>
<br>
</tt>
</td></tr>
<tr class="dir"><td colspan="2"> <a href="http://xyz/">Yogendra</a></td></tr>
<tr>
<td class="f"><a href="abc">abc</a> </td>
<td>
<tt class="con">
<a class="s" href="mno"><span class="l">7</span> mno <b>Hello</b>;</a>
<br>
</tt>
</td></tr>
</table>
I want to print the text of the first link in each group, i.e. "Yogendra sharma" and "Yogendra" for this HTML file.
The file is huge. I am using Java with jsoup but can't figure it out. Please help.
You can try the code below. You will need commons-io-1.3.2.jar and jsoup.jar on the classpath. Save the HTML as sample.html in the root folder of the project.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ExtractFromHTML {
    public static void main(String[] args) throws IOException {
        File input = new File("sample.html");
        String htmlOut;
        // Read the whole file into a string; try-with-resources closes the stream.
        try (InputStream in = new FileInputStream(input)) {
            htmlOut = IOUtils.toString(in);
        }
        Document document = Jsoup.parse(htmlOut);
        // Select every <a> element in the document.
        Elements elementsA = document.select("a");
        for (Element aElement : elementsA) {
            // Keep only the links whose markup contains the target URL.
            if (aElement.outerHtml().contains("http://xyz/")) {
                System.out.println(aElement.text());
            }
        }
    }
}
Output:
Yogendra sharma
Yogendra
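As a possible alternative, the rows that hold the wanted links all carry class="dir" in the sample, so a CSS selector can pick them out directly instead of matching on the URL string. The following is only a sketch under that assumption; the class name ExtractDirLinks, the helper firstLinkTexts, and the UTF-8 charset are illustrative choices, not part of the original answer. It also passes the File straight to Jsoup.parse, which removes the commons-io dependency, although jsoup still builds the whole document tree in memory.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ExtractDirLinks {

    // Collects the text of the first link in every <tr class="dir"> row.
    // Assumption: the wanted links always sit in rows marked class="dir".
    static List<String> firstLinkTexts(Document document) {
        List<String> texts = new ArrayList<>();
        for (Element row : document.select("tr.dir")) {
            Element link = row.select("a").first(); // null if the row has no link
            if (link != null) {
                texts.add(link.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: sample.html is UTF-8 encoded; adjust the charset if not.
        Document document = Jsoup.parse(new File("sample.html"), "UTF-8");
        for (String text : firstLinkTexts(document)) {
            System.out.println(text);
        }
    }
}
```

If the real file marks those rows differently, an attribute selector such as a[href=http://xyz/] could be used in the same way.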