How to get the first a href tag from every div using Jsoup
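I am using Jsoup's Elements class to get the links from the <a href> tags inside specific divs. In most cases a div contains only one <a href> tag. However, there is a case where a div contains two <a href> tags. In that case I only want the URL of the first <a href> tag and want to ignore the second. Is there any way to do this? Please help.
Java code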
String response = ""; // placeholder: the HTML source to parse (see the HTML code below)
Document document = Jsoup.parse(response);
if (document != null) {
    // Matches every <a href> that is a direct child of a div.kCrYT
    Elements links = document.select("div.kCrYT > a[href]");
    for (Element link : links) {
        String linkHref = link.attr("href");
        System.out.println("linkHref: " + linkHref);
    }
}
HTML code
<div>
<div class="ZINbbc xpd O9g5cc uUPGi">
<div class="kCrYT">
<a href="/url?q=https://en.wikipedia.org/wiki/Mobile_phone&sa=U&ved=2ahUKEwjvy9fH9unoAhWLmOAKHZMGCYcQFjAaegQIARAB&usg=AOvVaw3g3Lc1rBf-L5ZlWeE9ggx7">
<div class="BNeawe vvjwJb AP7Wnd">
Mobile phone - Wikipedia
</div>
<div class="BNeawe UPmit AP7Wnd">
https://en.wikipedia.org › wiki › Mobile_phone
</div>
</a>
</div>
<div class="x54gtf"></div>
<div class="kCrYT">
<div>
<div class="BNeawe s3v9rd AP7Wnd">
<div>
<div>
<div class="BNeawe s3v9rd AP7Wnd">
A mobile phone, cellular phone, cell phone, cellphone or hand phone, sometimes shortened to simply mobile, cell or just phone, is a portable telephone that can ...
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div>
<div class="ZINbbc xpd O9g5cc uUPGi">
<div class="kCrYT">
<a href="/url?q=https://www.digitaltrends.com/mobile/&sa=U&ved=2ahUKEwjvy9fH9unoAhWLmOAKHZMGCYcQtwIwG3oECAIQAQ&usg=AOvVaw0SEels_2PKSQyaFaMbZQpT">
<div class="BNeawe vvjwJb AP7Wnd">
Mobile Phone and App News / Reviews | iOS, Android, and More ...
</div>
<div class="BNeawe UPmit AP7Wnd">
https://www.digitaltrends.com › mobile
</div>
</a>
</div>
<div class="x54gtf"></div>
<div class="kCrYT">
<a href="/url?q=https://www.digitaltrends.com/mobile/&sa=U&ved=2ahUKEwjvy9fH9unoAhWLmOAKHZMGCYcQuAIwG3oECAIQAg&usg=AOvVaw1NrWt0iIdzg2X2zgb-h8Vq">
<div class="lcJF1d SXn0g GXKcHe p1CInd">
<img class="EYOsld" style="display:block;max-width:120px;max-height:90px" alt="Video for mobile" src="data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==" id="dimg_1" data-deferred="1" />
<div class="qW7zYd HMoqlc"></div>
<div class="qW7zYd X8r0X" style="background-size:36px"></div>
</div>
</a>
<div>
<div class="BNeawe s3v9rd AP7Wnd">
<div>
<div>
<div class="BNeawe s3v9rd AP7Wnd">
<span class="r0bn4c rQMQod">3 days ago</span>
<span class="r0bn4c rQMQod"> · </span>News, reviews, and discussion regarding Android, iOS, and everything else in the mobile realm ...Posted: 3 days ago
</div>
</div>
</div>
</div>
</div>
<div class="rl7ilb"></div>
</div>
</div>
</div>
Assuming each <a> tag is a direct child of the relevant <div> tag and is its first child element, you can use the first-child pseudo-selector:
document.select("div.kCrYT > a:first-child[href]")
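If that cannot be guaranteed (for example, another element may precede the <a>), you can use the first-of-type pseudo-selector to get an equivalent effect: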
document.select("div.kCrYT > a:first-of-type[href]")
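To see how the two selectors differ, here is a small self-contained sketch. The markup, the class name FirstLinkSelectors and the helper hrefs are made up for illustration; only the two selector strings are taken from the lines above. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.util.List;

public class FirstLinkSelectors {
    // Hypothetical markup: in the second div a <span> precedes the links,
    // so its first <a> is not the first child element.
    static final String HTML =
          "<div class=\"kCrYT\"><a href=\"/one\">1</a><a href=\"/two\">2</a></div>"
        + "<div class=\"kCrYT\"><span>label</span><a href=\"/three\">3</a><a href=\"/four\">4</a></div>";

    // Returns the href values of all elements matched by the CSS query.
    static List<String> hrefs(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).eachAttr("href");
    }

    public static void main(String[] args) {
        // Only an <a> that is the very first child element of its div matches.
        System.out.println(hrefs("div.kCrYT > a:first-child[href]"));   // [/one]
        // The first <a> among its siblings matches, whatever precedes it.
        System.out.println(hrefs("div.kCrYT > a:first-of-type[href]")); // [/one, /three]
    }
}
```

In both cases the second link of each div is skipped; the selectors only disagree when something other than an <a> comes first.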
Some more detail on the solution above, in case it helps others (I know it did not work for the OP, as stated in their comments).
The jsoup version I used is 1.13.1:
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.13.1</version>
</dependency>
Code:
private void getDivs() {
    String response = testHtml; // the sample html code
    Document document = Jsoup.parse(response);
    if (document != null) {
        // this works:
        Elements links = document.select("div.kCrYT > a:first-child[href]");
        // this also works:
        //Elements links = document.select("div.kCrYT > a:first-of-type[href]");
        for (Element link : links) {
            String linkHref = link.attr("href");
            System.out.println("linkHref: " + linkHref);
        }
    }
}
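In the sample HTML from the question, the two links of the second result sit in two separate div.kCrYT blocks, so a selector scoped to div.kCrYT still returns both of them. One possible alternative, offered only as a sketch, is to loop over each outer result container and take the first matching link inside it with selectFirst. This assumes every result is wrapped in its own div.ZINbbc, as in the sample; the class name FirstLinkPerResult, the method firstLinks and the shortened SAMPLE markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class FirstLinkPerResult {
    // Hypothetical, shortened markup that mirrors the structure of the sample HTML.
    static final String SAMPLE =
          "<div class=\"ZINbbc\">"
        +   "<div class=\"kCrYT\"><a href=\"/a\">A</a></div>"
        +   "<div class=\"kCrYT\"><div>snippet</div></div>"
        + "</div>"
        + "<div class=\"ZINbbc\">"
        +   "<div class=\"kCrYT\"><a href=\"/b1\">B</a></div>"
        +   "<div class=\"kCrYT\"><a href=\"/b2\">thumbnail</a><div>snippet</div></div>"
        + "</div>";

    // Collects the href of the first matching link inside each result container.
    static List<String> firstLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element result : doc.select("div.ZINbbc")) {
            // selectFirst returns the first match in document order, or null if none.
            Element first = result.selectFirst("div.kCrYT > a[href]");
            if (first != null) {
                hrefs.add(first.attr("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        System.out.println(firstLinks(SAMPLE)); // [/a, /b1]
    }
}
```

Because these class names look auto-generated, they may change over time, so the container selector should be treated as an assumption to verify against the actual page.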