
How to catch this text with Jsoup?

I'm going crazy trying to extract some text from this HTML source:

<tr class="even"> <!-- Title --> <td class="title riot" title="Summoners, We will be performing Live Maintenance on the 26/11 at 04:00 AM, where we will need to bring the EUW Platform offline. Following up...">

I've tried a lot of combinations, but I can't get this working without some advice. I need to capture the text between the quotes after title, that is, the value of the title attribute.

Please note that there's a similar row class, "odd", which has the same syntax as the first one:

<tr class="odd">
<!-- Title -->
<td class="title riot" title="Summoners, welcome to the Service Status forum! Here you can come to see information regarding ongoing issues or events that we are currently working...">

So I need something that can capture the text from rows of both classes.

Thanks for the help.

EDIT: Here's my code, where I connect and collect some links:

Document doc = Jsoup.connect("http://forums.euw.leagueoflegends.com/board/forumdisplay.php?f=10")
        .userAgent("Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22")
        .timeout(30000)
        .get();
Elements links = doc.select("a[href*=thread]");
for (Element link : links) {
    if (link.attr("href").contains("board") || link.attr("href").contains("page") || link.text().matches("1")) {
        // skip board links, page links, and links whose text is exactly "1"
    } else {
        titles.add((String) link.text());

        //descriptions.add((String) DEFAULT_FORUM_URL + link.attr("href"));
        descriptions.add((String) doc.select("[title*=a]").toString());
    }
}

The commented line writes the link of the thread on every second row of a ListView, but what I need to write there is the brief description held in the title attribute of those td class="title riot" tags, for rows of each class.

Naturally, this line

descriptions.add((String) doc.select("[title*=a]").toString());

doesn't work.
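A likely reason, sketched here rather than confirmed against the live page: calling toString() on the Elements returned by select() yields the markup of every matched element, while attr("title") yields the attribute value of the first match that has one. The class name, the HTML fragment, and the "Maintenance notice" title below are hypothetical stand-ins for the forum markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class TitleAttrVsToString {
    // Hypothetical one-row fragment modelled on the markup quoted in the question.
    static final String HTML =
        "<table><tr class=\"even\">"
        + "<td class=\"title riot\" title=\"Maintenance notice\">cell text</td>"
        + "</tr></table>";

    // Returns the outer HTML of all matched elements, tags included.
    static String markup(String html) {
        Elements cells = Jsoup.parse(html).select("td[title]");
        return cells.toString();
    }

    // Returns the title attribute of the first matched element that has one.
    static String firstTitle(String html) {
        Elements cells = Jsoup.parse(html).select("td[title]");
        return cells.attr("title");
    }

    public static void main(String[] args) {
        System.out.println("toString(): " + markup(HTML));
        System.out.println("attr():     " + firstTitle(HTML));
    }
}
```

Note also that [title*=a] matches any element whose title contains the letter "a", so on a full page it would probably pick up far more than the description cells.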

How about this:

Document doc = Jsoup.connect("http://forums.euw.leagueoflegends.com/board/forumdisplay.php?f=10").get();

for (Element element : doc.select("tr.odd > td, tr.even > td")) {
    System.out.println(element.attr("title"));
}

Which will output:

Summoners, welcome to the Service Status forum! Here you can come to see information regarding ongoing issues or events that we are currently working...




Summoners, 

We will be performing a maintenance on 26/11 at 04:00 AM, where we will need to bring the EUW Platform offline. 

Following up on the...
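The blank lines in that output presumably come from cells that match the selector but carry no title attribute, since attr() returns an empty string when the attribute is missing. A minimal sketch of a narrower selector follows; the class name and the sample fragment (one titled cell and one untitled cell per row, with placeholder titles) are assumptions, not the real page.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class TitledCellsOnly {
    // Hypothetical fragment: each row has one titled cell and one untitled cell.
    static final String SAMPLE =
        "<table>"
        + "<tr class=\"even\"><td class=\"title riot\" title=\"First description\">a</td><td>b</td></tr>"
        + "<tr class=\"odd\"><td class=\"title riot\" title=\"Second description\">c</td><td>d</td></tr>"
        + "</table>";

    // Collects the title attribute of td.title cells that actually have one.
    static List<String> descriptions(String html) {
        List<String> result = new ArrayList<>();
        String query = "tr.odd > td.title[title], tr.even > td.title[title]";
        for (Element cell : Jsoup.parse(html).select(query)) {
            result.add(cell.attr("title"));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String description : descriptions(SAMPLE)) {
            System.out.println(description);
        }
    }
}
```

The [title] part of the selector keeps only elements that have the attribute, so untitled cells no longer produce empty entries.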

Here is a sample:

public static final String text = "" +
    "<table><tr class=\"even\"> <!-- Title -->\n" +
    "    <td class=\"title riot\"\n" +
    "        title=\"Summoners, We will be performing Live Maintenance on the 26/11 at 04:00 AM, where we will need to bring the EUW Platform offline. Following up...\">\n" +
    "    </td>\n" +
    "</tr>\n" +
    "<tr class=\"odd\">\n" +
    "    <!-- Title -->\n" +
    "    <td class=\"title riot\"\n" +
    "        title=\"Summoners, welcome to the Service Status forum! Here you can come to see information regarding ongoing issues or events that we are currently working...\">\n" +
    "    </td>\n" +
    "</tr></table>";

public static void main(String[] args) throws IOException {
    Document doc = Jsoup.parse(text);

    //System.out.println("your doc:" + doc);

    for (Element element : doc.select("tr > td")) {
        System.out.println(element.attr("title"));
    }
}

Prints:

Summoners, We will be performing Live Maintenance on the 26/11 at 04:00 AM, where we will need to bring the EUW Platform offline. Following up...
Summoners, welcome to the Service Status forum! Here you can come to see information regarding ongoing issues or events that we are currently working...
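To fill the titles and descriptions lists from the question in step, one option is to walk the rows and read both the thread link and the titled cell from the same row. This is only a sketch: it assumes each tr.odd or tr.even row contains both an a[href*=thread] link and a td with a title attribute, and the class name, sample markup, link targets, and texts are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ThreadDescriptions {
    // Hypothetical rows; the real forum markup may nest these elements differently.
    static final String SAMPLE =
        "<table>"
        + "<tr class=\"even\"><td class=\"title riot\" title=\"First description\">"
        + "<a href=\"showthread.php?t=1\">First thread</a></td></tr>"
        + "<tr class=\"odd\"><td class=\"title riot\" title=\"Second description\">"
        + "<a href=\"showthread.php?t=2\">Second thread</a></td></tr>"
        + "</table>";

    // Maps each thread's link text to the title attribute found in the same row.
    static Map<String, String> collect(String html) {
        Map<String, String> byThread = new LinkedHashMap<>();
        for (Element row : Jsoup.parse(html).select("tr.odd, tr.even")) {
            Element link = row.select("a[href*=thread]").first();
            Element cell = row.select("td[title]").first();
            if (link == null || cell == null) {
                continue; // skip rows missing either piece
            }
            byThread.put(link.text(), cell.attr("title"));
        }
        return byThread;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect(SAMPLE).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Reading both values from the same row keeps each description attached to its own thread, rather than running one document-wide query per link as the original loop does.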
