Get the next element with the same class name using jsoup in Android Studio
I want to get the next element in the HTML that has the same class name. The HTML looks like this:
HTML:
<section class="post">
<img class="pecintakomik" src="/images/top/op.jpg" alt="pecintakomik.com" />
<div class="post-cnt">
<h2>Manga bla bla</h2>
<ul>
<li><strong>Nama Alternatif:</strong> </li>
<li><strong>Tahun Rilis:</strong> 2010</li>
<li><strong>Author(s):</strong> sensei1
<li><strong>Artist(s):</strong> sense2</li>
<li><strong>Genre:</strong> Action</li>
<li><strong>Sinopsis:</strong> bla bla bla </li>
<li><span class='st_facebook_hcount' displayText='Facebook'></span> <span class='st_twitter_hcount' displayText='Tweet'></span> <span class='st_sharethis_hcount' displayText='ShareThis'></span></li>
</ul>
</div>
<div class="clear"> </div>
</section>
<img src="http://www.pecintakomik.com/images/block.png">
<section class="post">
<div class="post-cnt">
<h2>List Chapter(s)</h2>
<ul>
<li><a href="/manga/bla_bla/816"> bla bl 816 <img src="/images/new.gif"><em>Baca Online </em></a></li>
<li><a href="/manga/bla_bla/815"> bla bla 815<em>Baca Online </em></a></li>
<li><a href="/manga/bla_bla/814"> bla bla 814<em>Baca Online </em></a></li>
<li><a href="/manga/bla_bla/813"> bla bla 813<em>Baca Online </em></a></li>
</ul>
</div>
</section>
My code is supposed to get the href links of the manga chapter list (and store them in SQLite), but I can't get them:
Java code:
private List<Chapter> parseHtmlToChapters(RequestWrapper request, String unparsedHtml) {
    int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
    int endIndex = unparsedHtml.indexOf("</div>", beginIndex);
    String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);
    Document parsedDocument = Jsoup.parse(trimmedHtml);
    List<Chapter> chapterList = scrapeChaptersFromParsedDocument(parsedDocument);
    chapterList = setSourceForChapterList(chapterList);
    chapterList = setParentUrlForChapterList(chapterList, request.getUrl());
    chapterList = setNumberForChapterList(chapterList);
    saveChaptersToDatabase(chapterList, request.getUrl());
    return chapterList;
}
private List<Chapter> scrapeChaptersFromParsedDocument(Document parsedDocument) {
    List<Chapter> chapterList = new ArrayList<Chapter>();
    Element chapterElementnya = parsedDocument.select("div.post-cnt").get(1);
    Elements chapterElements = chapterElementnya.getElementsByTag("li");
    for (Element chapterElement : chapterElements) {
        Chapter currentChapter = constructChapterFromHtmlBlock(chapterElement);
        chapterList.add(currentChapter);
    }
    return chapterList;
}
private Chapter constructChapterFromHtmlBlock(Element chapterElement) {
    Chapter newChapter = DefaultFactory.Chapter.constructDefault();
    Element urlElement = chapterElement.select("a").first();
    Element nameElement = chapterElement.select("a").first();
    if (urlElement != null) {
        String fieldUrl = "http://www.pecintakomik.com" + urlElement.attr("href");
        newChapter.setUrl(fieldUrl);
    }
    if (nameElement != null) {
        String fieldName = nameElement.text();
        newChapter.setName(fieldName);
    }
    boolean fieldNew = chapterElement.html().contains("<img src=\"/images/new.gif\">");
    newChapter.setNew(fieldNew);
    return newChapter;
}
Does anyone know how I can get the list inside the second element that has the same class name?
This code:
private List<Chapter> parseHtmlToChapters(RequestWrapper request, String unparsedHtml) {
    int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
    int endIndex = unparsedHtml.indexOf("</div>", beginIndex);
    String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);
    ...
}
keeps only the first list. trimmedHtml will contain the following:
<div class="post-cnt">
<h2>Manga bla bla</h2>
<ul>
<li><strong>Nama Alternatif:</strong> </li>
<li><strong>Tahun Rilis:</strong> 2010</li>
<li><strong>Author(s):</strong> sensei1
<li><strong>Artist(s):</strong> sense2</li>
<li><strong>Genre:</strong> Action</li>
<li><strong>Sinopsis:</strong> bla bla bla </li>
<li><span class='st_facebook_hcount' displayText='Facebook'></span> <span class='st_twitter_hcount' displayText='Tweet'></span> <span class='st_sharethis_hcount' displayText='ShareThis'></span></li>
</ul>
</div>
To keep both lists, you can do the following:
int beginIndex = unparsedHtml.indexOf("<div class=\"post-cnt\">");
int secondListStart = unparsedHtml.indexOf("<div class=\"post-cnt\">", beginIndex + "<div class=\"post-cnt\">".length());
int endIndex = unparsedHtml.indexOf("</div>", secondListStart) + "</div>".length();
String trimmedHtml = unparsedHtml.substring(beginIndex, endIndex);
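To see the difference between the two slices, here is a minimal sketch that runs both on a reduced stand-in for the page. The class name SliceDemo, the method names, and the shortened HTML string are made up for illustration, and the sketch assumes both markers are present in the input (indexOf returns -1 otherwise).

```java
public class SliceDemo {
    // Reduced stand-in for the page: two post-cnt blocks, links only in the second.
    static final String HTML =
        "<section class=\"post\"><div class=\"post-cnt\"><h2>Manga bla bla</h2>"
        + "<ul><li>Genre: Action</li></ul></div></section>"
        + "<section class=\"post\"><div class=\"post-cnt\"><h2>List Chapter(s)</h2>"
        + "<ul><li><a href=\"/manga/bla_bla/816\">bla bla 816</a></li></ul></div></section>";

    static final String OPEN = "<div class=\"post-cnt\">";
    static final String CLOSE = "</div>";

    // Slice from the question: stops at the first </div> after the first block opens.
    static String firstBlockOnly(String html) {
        int begin = html.indexOf(OPEN);
        int end = html.indexOf(CLOSE, begin);
        return html.substring(begin, end);
    }

    // Slice that keeps both lists: extends to the </div> that closes the second block.
    static String bothBlocks(String html) {
        int begin = html.indexOf(OPEN);
        int secondStart = html.indexOf(OPEN, begin + OPEN.length());
        int end = html.indexOf(CLOSE, secondStart) + CLOSE.length();
        return html.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(firstBlockOnly(HTML).contains("href")); // prints "false"
        System.out.println(bothBlocks(HTML).contains("href"));     // prints "true"
    }
}
```

Both slices depend on the exact spelling of the opening tag in the raw string and on there being no nested div inside either block.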
However, parsing the whole page is safer. To do that, change:
Document parsedDocument = Jsoup.parse(trimmedHtml);
to:
Document parsedDocument = Jsoup.parse(unparsedHtml);
Try this:
private List<Chapter> parseHtmlToChapters(RequestWrapper request, String unparsedHtml) {
    Document parsedDocument = Jsoup.parse(unparsedHtml);
    List<Chapter> chapterList = new ArrayList<>();
    for (Element a : parsedDocument.select("div.post-cnt a")) {
        Chapter newChapter = DefaultFactory.Chapter.constructDefault();
        newChapter.setUrl("http://www.pecintakomik.com" + a.attr("href"));
        newChapter.setName(a.text());
        newChapter.setNew(!a.select("img[src=/images/new.gif]").isEmpty());
        chapterList.add(newChapter);
    }
    // .....
parsedDocument.select("div.post-cnt a") selects all <a> elements under all <div class="post-cnt"> elements. There are four such elements in the sample HTML.
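That selector matches links in every div.post-cnt; it returns only chapter links here because the first block contains no <a> elements. If only the second block should be read, one option is to pick it by index after parsing the whole page. The following is a minimal sketch under the assumption that the page keeps this two-block layout; the class name SecondBlockLinks and the method chapterUrls are hypothetical, and it returns plain URL strings rather than Chapter objects to stay self-contained.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class SecondBlockLinks {
    // Collects absolute URLs of the links inside the second div.post-cnt.
    static List<String> chapterUrls(String html, String baseUrl) {
        // Supplying a base URI lets jsoup resolve relative hrefs via the "abs:" prefix.
        Document doc = Jsoup.parse(html, baseUrl);
        Elements blocks = doc.select("div.post-cnt");
        List<String> urls = new ArrayList<>();
        if (blocks.size() < 2) {
            return urls; // layout differs from the sample; nothing to collect
        }
        for (Element a : blocks.get(1).select("a[href]")) {
            urls.add(a.attr("abs:href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html =
            "<section class=\"post\"><div class=\"post-cnt\"><h2>Manga bla bla</h2></div></section>"
            + "<section class=\"post\"><div class=\"post-cnt\"><ul>"
            + "<li><a href=\"/manga/bla_bla/816\">bla bla 816</a></li>"
            + "<li><a href=\"/manga/bla_bla/815\">bla bla 815</a></li>"
            + "</ul></div></section>";
        for (String url : chapterUrls(html, "http://www.pecintakomik.com")) {
            System.out.println(url);
        }
    }
}
```

The size check avoids the IndexOutOfBoundsException that get(1) throws when fewer than two matching blocks exist, which is what happens when only the trimmed first block is parsed.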