Select and iterate through elements and sub elements with same name (Jsoup)
How to select sub elements with same tag in the same element jsoup?
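I need to parse a page with jsoup using element tags such as div, h3, and a. I want to go through the div.g elements, get the text of the links with the classes a class="l _PMs" and a class="_pJs", and show it in a jList.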
Taking Google News as an example, the page looks like this:
<div class="g">
<div class="ts _JGs _KHs _oGs _KGs _jHs">
<a class="top _xGs _SHs" href="url" onmousedown="return rwt(this,'','','','1','dfda','','sdfa','','',event)">
<img class="th _RGs" src="url" alt="Story image" onload="typeof google==='object'&&google.aft&&google.aft(this)">
</a>
<div class="_hJs">
<h3 class="r _gJs">
<a class="l _PMs" href="url" onmousedown="return rwt(this,'','','','1','dfs','','sdfa','','',event)">Report on <em>Example</em> Testing<em>Club</em> ...</a>
</h3>
<div class="slp">
<span class="_OHs _PHs">link</span>
<span class="_QGs">-</span>
<span class="f nsa _QHs">date</span>
</div>
<div class="st">description</div>
</div>
<div class="_sJs card-section">
<a class="_pJs" href="url" onmousedown="return rwt(this,'','','','1','sdf','','sdfa','','',event)" data-href="url">Final review of <em>example's</em> of <em>testing</em>...
</a>
</div>
<div class="_cJs"></div>
<div class="_sJs card-section">
<a class="_pJs" href="url" onmousedown="return rwt(this,'','','','1','dfa','','dfs-d','','',event)" data-href="url">Report on this testing
</a>
</div>
<div class="_cJs"></div>
<div class="_eJs card-section">
<a class="_pJs" href="url" onmousedown="return rwt(this,'','','','1','ad','','dfsaf','','',event)">Test report example
</a>
</div>
<div class="_cJs"></div>
</div>
</div>
<div class="g">
<div class="ts _JGs _KHs _oGs _KGs _jHs">
<a class="top _xGs _SHs" href="url" onmousedown="return rwt(this,'','','','1','dfda','','sdfa','','',event)">
<img class="th _RGs" src="url" alt="Story image" onload="typeof google==='object'&&google.aft&&google.aft(this)">
</a>
<div class="_hJs">
<h3 class="r _gJs">
<a class="l _PMs" href="url" onmousedown="return rwt(this,'','','','1','dfs','','sdfa','','',event)">Cloud<em>Example</em> Testing<em>1</em> ...</a>
</h3>
<div class="slp">
<span class="_OHs _PHs">link</span>
<span class="_QGs">-</span>
<span class="f nsa _QHs">date</span>
</div>
<div class="st">description</div>
</div>
<div class="_sJs card-section">
<a class="_pJs" href="url" onmousedown="return rwt(this,'','','','1','sdf','','sdfa','','',event)" data-href="url">Final review of this<em>testing</em>...
</a>
</div>
<div class="_cJs"></div>
<div class="_sJs card-section">
<a class="_pJs" href="url" onmousedown="return rwt(this,'','','','1','dfa','','dfs-d','','',event)" data-href="url">Report on this...
</a>
</div>
<div class="_cJs"></div>
<div class="_eJs card-section">
<a class="_pJs" href="url" onmousedown="return rwt(this,'','','','1','ad','','dfsaf','','',event)">Example 2...
</a>
</div>
<div class="_cJs"></div>
<div class="tsw _QMs">
<div class="_jJs card-section">
<a class="_MHs" href="url" target="_blank" onmousedown="return rwt(this,'','','','2','sdfs','','dfd','','',event)" data-href="url">
<img class="_iJs" id="news-media-image-52779751835836-0" src="url" alt="image1" onload="typeof google==='object'&&google.aft&&google.aft(this)">
<div class="_RMs">USA TODAY.</div>
</a>
<a class="_MHs" href="url" target="_blank" onmousedown="return rwt(this,'','','','2','sdfsa','','dsfa','','',event)">
<img class="_iJs" id="news-media-image-52779751835836-1" src="url" alt="image2" onload="typeof google==='object'&&google.aft&&google.aft(this)">
<div class="_RMs">image2.</div>
</a>
</div>
<div class="_NMs">
<a class="_OMs" href="url">View all
</a>
</div>
</div>
</div>
</div>
Here is the code:
String input = txtSearch.getText();
input = input.replace(" ", "+");
String url = "http://www.google.com/search?q=" + input + "&tbm=nws&source=lnms";
try {
    Document doc = Jsoup.connect(url).userAgent("Chrome").timeout(5000).get();
    Elements e = doc.select("div.g");
    DefaultListModel<String> listModel = new DefaultListModel<>();
    e.forEach((e1) -> {
        e1.getElementsByTag("a").forEach(linkElement -> listModel.addElement(linkElement.text()));
    });
    newsList.setModel(listModel);
} catch (IOException ex) {
    Logger.getLogger(MainUI.class.getName()).log(Level.SEVERE, null, ex);
}
The actual output shown in the jList is:
Report on Example Testing Club...
Final review of example's of testing...
Report on this testing.
Test report example.
Cloud Example Testing 1.
Final review of this testing.
Report on this...
Example 2...
USA TODAY.
image2.
View all
How can I select only the classes a class="l _PMs" and a class="_pJs", without a class=_MHs and a class=_OMs, so that the jList shows the following:
Report on Example Testing Club...
Final review of example's of testing...
Report on this testing.
Test report example.
Cloud Example Testing 1.
Final review of this testing.
Report on this...
Example 2...
Just change this line:
Elements e = doc.select("div.g");
to
Elements e = doc.select("div.g").select("a");
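Then read the text of each element in a loop, for example:
for (Element element : e)
{
    yourList.add(element.text());
}
Elements e = doc.select("div.g").select("a"); lists every a element inside the div.g elements. We can then iterate over each a element with a for loop and read its text or even its attributes.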
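As a rough illustration of iterating over the a elements and reading both text and attributes, here is a minimal sketch. The class name LinkLister, the method describeLinks, and the inline HTML are made up for the example and are not the real Google markup. Note that this selection still returns every a element inside div.g, including links such as a class="_OMs".

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for this illustration.
public class LinkLister {

    // Returns one "text -> href" entry per <a> element nested inside a div with class "g".
    static List<String> describeLinks(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("div.g").select("a");
        List<String> result = new ArrayList<>();
        for (Element link : links) {
            // text() is the visible text of this single element; attr() reads an attribute.
            result.add(link.text() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Simplified stand-in for the page in the question (assumed markup).
        String html = "<div class=\"g\">"
                + "<a class=\"l _PMs\" href=\"/first\">First <em>story</em></a>"
                + "<a class=\"_OMs\" href=\"/all\">View all</a>"
                + "</div>";
        describeLinks(html).forEach(System.out::println);
    }
}
```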
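The problem is that you select all a elements inside the given div and call the .text() method on that whole list of elements, so it naturally returns the concatenated text of all the a elements.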
To make the code work as expected, you can change:
e.forEach((e1) -> {
    listModel.addElement(e1.getElementsByTag("a").text());
});
to:
e.forEach((e1) -> {
    e1.getElementsByTag("a").forEach(linkElement -> listModel.addElement(linkElement.text()));
});
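To see the difference between the two versions in isolation, the following sketch compares Elements.text(), which joins the text of all matched elements, with calling text() on each element. The class and method names (TextPerElement, combinedText, separateTexts) and the sample HTML are assumptions made for this example only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original code.
public class TextPerElement {

    // Calls text() on the whole Elements list: one string with all link texts joined.
    static String combinedText(String html) {
        Elements links = Jsoup.parse(html).select("div.g a");
        return links.text();
    }

    // Calls text() on each Element: one list entry per link.
    static List<String> separateTexts(String html) {
        List<String> texts = new ArrayList<>();
        for (Element link : Jsoup.parse(html).select("div.g a")) {
            texts.add(link.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Simplified stand-in for one result block (assumed markup).
        String html = "<div class=\"g\">"
                + "<a class=\"_pJs\" href=\"url\">Report on this testing</a>"
                + "<a class=\"_pJs\" href=\"url\">Test report example</a>"
                + "</div>";
        System.out.println(combinedText(html));   // a single line for the whole block
        separateTexts(html).forEach(System.out::println); // one line per link
    }
}
```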
Update
If you only want to select a elements with the l + _PMs classes or the _pJs class, you can rewrite the code like this:
Document doc = Jsoup.connect(url).userAgent("Chrome").timeout(5000).get();
DefaultListModel<String> listModel = new DefaultListModel<>();
doc.select("div.g a.l._PMs, div.g a._pJs")
        .forEach(element -> listModel.addElement(element.text()));
newsList.setModel(listModel);
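The selector is div.g a.l._PMs, div.g a._pJs, which means: select all elements that satisfy one of the following conditions:
an a element with both the l and _PMs classes, inside a div element with the g class
an a element with the _pJs class, inside a div element with the g class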
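The grouped selector can be tried offline against a trimmed copy of the markup. The sketch below assumes a simplified HTML string (not the live Google page), and HeadlineSelector and headlines are hypothetical names chosen for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for the grouped selector.
public class HeadlineSelector {

    // Two alternatives separated by a comma; an element matching either one is selected.
    static final String SELECTOR = "div.g a.l._PMs, div.g a._pJs";

    // Returns the text of every link matched by SELECTOR.
    static List<String> headlines(String html) {
        List<String> result = new ArrayList<>();
        for (Element link : Jsoup.parse(html).select(SELECTOR)) {
            result.add(link.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed, simplified version of the markup from the question (assumed).
        String html = "<div class=\"g\">"
                + "<h3 class=\"r _gJs\"><a class=\"l _PMs\" href=\"url\">Report on <em>Example</em> Testing</a></h3>"
                + "<div class=\"_sJs card-section\"><a class=\"_pJs\" href=\"url\">Test report example</a></div>"
                + "<div class=\"tsw _QMs\">"
                + "<a class=\"_MHs\" href=\"url\">USA TODAY.</a>"
                + "<a class=\"_OMs\" href=\"url\">View all</a>"
                + "</div>"
                + "</div>";
        // The _MHs and _OMs links match neither alternative, so they are not listed.
        headlines(html).forEach(System.out::println);
    }
}
```

Because the class names in the real page are generated and may change, a selector like this would likely need to be adjusted whenever the markup changes.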