
Get the next element with the same class name using jsoup in Android Studio

I want to get the next element in the HTML that has the same class name. The HTML looks like this:

HTML:

  <section class="post">
        <img class="pecintakomik" src="/images/top/op.jpg" alt="pecintakomik.com" />
            <div class="post-cnt">
                <h2>Manga bla bla</h2>
                    <ul>
                    <li><strong>Nama Alternatif:</strong> </li>
                    <li><strong>Tahun Rilis:</strong> 2010</li>
                    <li><strong>Author(s):</strong> sensei1
                    <li><strong>Artist(s):</strong> sense2</li>
                    <li><strong>Genre:</strong> Action</li>
                    <li><strong>Sinopsis:</strong> bla bla bla </li>
                    <li><span class='st_facebook_hcount' displayText='Facebook'></span> <span class='st_twitter_hcount' displayText='Tweet'></span> <span class='st_sharethis_hcount' displayText='ShareThis'></span></li>                      
                    </ul>
            </div>
                <div class="clear">&nbsp;</div>
    </section>
    <img src="http://www.pecintakomik.com/images/block.png">
    <section class="post">
        <div class="post-cnt">
            <h2>List Chapter(s)</h2>
            <ul>
                <li><a href="/manga/bla_bla/816"> bla bl 816 <img src="/images/new.gif"><em>Baca Online </em></a></li>
                <li><a href="/manga/bla_bla/815"> bla bla 815<em>Baca Online </em></a></li>
                <li><a href="/manga/bla_bla/814"> bla bla 814<em>Baca Online </em></a></li>
                <li><a href="/manga/bla_bla/813"> bla bla 813<em>Baca Online </em></a></li>
            </ul>
       </div>
    </section>

My code is supposed to get the href links from the manga chapter list (and store them in SQLite), but it does not find them:

Java code:

private List<Chapter> parseHtmlToChapters(RequestWrapper request, String unparsedHtml) {
    int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
    int endIndex = unparsedHtml.indexOf("</div>", beginIndex);

    String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);

    Document parsedDocument = Jsoup.parse(trimmedHtml);


    List<Chapter> chapterList = scrapeChaptersFromParsedDocument(parsedDocument);
    chapterList = setSourceForChapterList(chapterList);
    chapterList = setParentUrlForChapterList(chapterList, request.getUrl());
    chapterList = setNumberForChapterList(chapterList);

    saveChaptersToDatabase(chapterList, request.getUrl());

    return chapterList;
}

private List<Chapter> scrapeChaptersFromParsedDocument(Document parsedDocument) {
    List<Chapter> chapterList = new ArrayList<Chapter>();

    Element chapterElementnya = parsedDocument.select("div.post-cnt").get(1);
    Elements chapterElements = chapterElementnya.getElementsByTag("li");


    for (Element chapterElement : chapterElements) {
        Chapter currentChapter = constructChapterFromHtmlBlock(chapterElement);

        chapterList.add(currentChapter);
    }

    return chapterList;
}

private Chapter constructChapterFromHtmlBlock(Element chapterElement) {
    Chapter newChapter = DefaultFactory.Chapter.constructDefault();

    Element urlElement = chapterElement.select("a").first();
    Element nameElement = chapterElement.select("a").first();

    if (urlElement != null) {
        String fieldUrl = "http://www.pecintakomik.com" + urlElement.attr("href");
        newChapter.setUrl(fieldUrl);
    }
    if (nameElement != null) {
        String fieldName = nameElement.text();
        newChapter.setName(fieldName);
    }

    boolean fieldNew = chapterElement.html().contains("<img src=\"/images/new.gif\">");
    newChapter.setNew(fieldNew);

    return newChapter;
}

Does anyone know how I can get the list inside the second element that has the same class name?

This code:

private List<Chapter> parseHtmlToChapters(RequestWrapper request, String unparsedHtml) {
    int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
    int endIndex = unparsedHtml.indexOf("</div>", beginIndex);

    String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);
    ...
}

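keeps only the first list, because indexOf returns the first match. trimmedHtml will contain roughly the following (the substring actually stops just before the closing </div>):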

<div class="post-cnt">
    <h2>Manga bla bla</h2>
    <ul>
        <li><strong>Nama Alternatif:</strong> </li>
        <li><strong>Tahun Rilis:</strong> 2010</li>
        <li><strong>Author(s):</strong> sensei1
        <li><strong>Artist(s):</strong> sense2</li>
        <li><strong>Genre:</strong> Action</li>
        <li><strong>Sinopsis:</strong> bla bla bla </li>
        <li><span class='st_facebook_hcount' displayText='Facebook'></span> <span class='st_twitter_hcount' displayText='Tweet'></span> <span class='st_sharethis_hcount' displayText='ShareThis'></span></li>                      
    </ul>
</div>
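
The effect of that slicing can be checked in isolation with plain string operations. The following is a minimal sketch, assuming a shortened stand-in for the page (the HTML constant and the TrimDemo class are hypothetical, not part of the original code); the trim method repeats the three lines from parseHtmlToChapters.

```java
class TrimDemo {
    // Hypothetical, shortened stand-in for the page shown in the question.
    static final String HTML =
        "<section class=\"post\"><div class=\"post-cnt\"><h2>Manga bla bla</h2>"
        + "<ul><li>Genre: Action</li></ul></div></section>"
        + "<section class=\"post\"><div class=\"post-cnt\"><h2>List Chapter(s)</h2>"
        + "<ul><li><a href=\"/manga/bla_bla/816\">bla bla 816</a></li></ul></div></section>";

    // Same slicing as in parseHtmlToChapters: first opening tag up to the first "</div>".
    static String trim(String unparsedHtml) {
        int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
        int endIndex = unparsedHtml.indexOf("</div>", beginIndex);
        return unparsedHtml.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String trimmed = trim(HTML);
        System.out.println(trimmed);
        // The chapter list lives in the second block, so it is already gone here.
        System.out.println(trimmed.contains("List Chapter(s)")); // prints false
    }
}
```

Since the second block never reaches Jsoup.parse, select("div.post-cnt").get(1) has nothing to return for index 1.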

To keep both lists, you can do the following:

int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
int secondListStart = unparsedHtml.indexOf("<div class=\"post-cnt\">",beginIndex + "<div class=\"post-cnt\">".length());
int endIndex = unparsedHtml.indexOf("</div>", secondListStart) + "</div>".length();

String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);
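
The same index arithmetic can be generalized to the n-th block. This is only a sketch: the NthOccurrence class and the indexOfNth and sliceNthBlock helpers are hypothetical names introduced for illustration, and the approach assumes the block contains no nested <div>, since it stops at the first "</div>" after the opening tag.

```java
class NthOccurrence {
    static final String OPEN = "<div class=\"post-cnt\">";
    static final String CLOSE = "</div>";

    // Index of the n-th (1-based) occurrence of marker in text, or -1 if there are fewer.
    static int indexOfNth(String text, String marker, int n) {
        int index = -1;
        for (int i = 0; i < n; i++) {
            index = text.indexOf(marker, index + 1);
            if (index < 0) {
                return -1;
            }
        }
        return index;
    }

    // Returns the n-th post-cnt block including its closing tag, or "" if it does not exist.
    // Assumption: no nested <div> inside the block.
    static String sliceNthBlock(String html, int n) {
        int start = indexOfNth(html, OPEN, n);
        if (start < 0) {
            return "";
        }
        int end = html.indexOf(CLOSE, start);
        if (end < 0) {
            return html.substring(start);
        }
        return html.substring(start, end + CLOSE.length());
    }

    public static void main(String[] args) {
        String html = "a" + OPEN + "one" + CLOSE + "b" + OPEN + "two" + CLOSE + "c";
        System.out.println(sliceNthBlock(html, 2));
    }
}
```

String slicing like this breaks as soon as the markup changes (extra attributes, different quoting, nested elements), which is the reason for the advice that follows.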

However, parsing the whole page is safer. To do that, change:

Document parsedDocument = Jsoup.parse(trimmedHtml);

to:

Document parsedDocument = Jsoup.parse(unparsedHtml);

Try this:

private List<Chapter> parseHtmlToChapters(RequestWrapper request, String unparsedHtml) {

    Document parsedDocument = Jsoup.parse(unparsedHtml);

    List<Chapter> chapterList = new ArrayList<>();

    for (Element a : parsedDocument.select("div.post-cnt a")) {
        Chapter newChapter = DefaultFactory.Chapter.constructDefault();
        newChapter.setUrl("http://www.pecintakomik.com" + a.attr("href"));
        newChapter.setName(a.text());
        newChapter.setNew(!a.select("img[src=/images/new.gif]").isEmpty());
        chapterList.add(newChapter);
    }
    // .....

parsedDocument.select("div.post-cnt a") selects every <a> element under every <div class="post-cnt"> element. The sample HTML contains four such elements.

