How to parse an HTML file using Jsoup

Question

I have an HTML table and want to extract link text based on a certain condition.

<table border="0" cellpadding="3" cellspacing="0" width="100%">
<tbody>
<tr class="dir"><td colspan="2">&nbsp;&nbsp;<a href="http://xyz/">Yogendra sharma</a></td></tr>
<tr>
<td class="f"><a href="abc">abc</a>&nbsp;</td>
<td>
<tt class="con">
<a class="s" href="mno"><span class="l">7</span> mno <b>Hello</b>;</a>
<br>
</tt>
</td></tr>

<tr class="dir"><td colspan="2">&nbsp;&nbsp;<a href="http://xyz/">Yogendra</a></td></tr>
<tr>
<td class="f"><a href="abc">abc</a>&nbsp;</td>
<td>
<tt class="con">
<a class="s" href="mno"><span class="l">7</span> mno <b>Hello</b>;</a>
<br>
</tt>
</td></tr>
</tbody>
</table>

I want to print the text of the first link in each group, i.e. "Yogendra sharma" and "Yogendra", from the HTML file.

The file is huge.

I am using Java with jsoup but can't figure it out. Please help me.

Answer 1

You can try the code below. You will need commons-io-1.3.2.jar and jsoup.jar on the classpath. Save the HTML as sample.html in the root folder of the project.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

import org.apache.commons.io.IOUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ExtractFromHTML {

    public static void main(String[] args) throws IOException {

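        File input = new File("sample.html");

        // Read the whole file into a string; try-with-resources closes the stream.
        String htmlOut;
        try (InputStream in = new FileInputStream(input)) {
            htmlOut = IOUtils.toString(in);
        }

        // Parse the HTML string into a jsoup Document.
        Document document = Jsoup.parse(htmlOut);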

        // Select every <a> element in the document.
        Elements elementsA = document.select("a");

        Iterator<Element> elementIterator = elementsA.iterator();

        while (elementIterator.hasNext()) {
            Element aElement = elementIterator.next();

            // Keep only links whose markup mentions the target URL.
            if (aElement.outerHtml().contains("http://xyz/")) {
                System.out.println(aElement.text());
            }

        }
    }
}

Output:

Yogendra sharma
Yogendra
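
As a possible alternative, the same result can probably be reached with a CSS selector instead of scanning every link. The following is only a minimal sketch, assuming the rows of interest are always marked with class="dir" as in the sample table. The class name FirstLinkText and the method firstLinkTexts are hypothetical names chosen for illustration, and the inline HTML is a shortened copy of the sample.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstLinkText {

    // Returns the text of the first link inside every <tr class="dir"> row.
    // Assumption: the wanted rows always carry class="dir", as in the sample.
    public static List<String> firstLinkTexts(String html) {
        Document document = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element row : document.select("tr.dir")) {
            // first() returns null when the row contains no link.
            Element link = row.select("a[href]").first();
            if (link != null) {
                texts.add(link.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Shortened version of the table from the question.
        String html = "<table><tbody>"
                + "<tr class=\"dir\"><td colspan=\"2\"><a href=\"http://xyz/\">Yogendra sharma</a></td></tr>"
                + "<tr><td class=\"f\"><a href=\"abc\">abc</a></td></tr>"
                + "<tr class=\"dir\"><td colspan=\"2\"><a href=\"http://xyz/\">Yogendra</a></td></tr>"
                + "</tbody></table>";

        // Prints "Yogendra sharma" and then "Yogendra", one per line.
        for (String text : firstLinkTexts(html)) {
            System.out.println(text);
        }
    }
}
```

To read from disk, jsoup can also parse the file directly with Jsoup.parse(new File("sample.html"), "UTF-8"), which would make commons-io unnecessary. The UTF-8 charset here is an assumption about how the file is encoded.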
