Jsoup: how to get values from HTML
I'm trying to get specific information from this page: https://myanimelist.net/anime/31988/Hibike_Euphonium_2
I don't really understand HTML, so this is a bit hard for me.
Specifically, I want to get information from this part of the page:
<div>
<span class="dark_text">Studios:</span>
<a href="/anime/producer/2/Kyoto_Animation" title="Kyoto Animation">Kyoto Animation</a> </div>
<div class="spaceit">
What I want to do is find the place where it says "Studios" and then get the title of the link that follows it (Kyoto Animation).
So far I have managed to get this:
Document doc = Jsoup.connect("https://myanimelist.net/anime/31988/Hibike_Euphonium_2").get();
// Selects every link that has both an href and a title attribute
Elements studio = doc.select("a[href][title]");
for (Element link : studio) {
    System.out.println(link.attr("title"));
}
And it's outputting this:
Lantis
Pony Canyon
Rakuonsha
Ponycan USA
Kyoto Animation
Drama
Music
School
Kyoto Animation
Go to the Last Post
Go to the Last Post
Anime You Should Watch Before Their Sequels Air This Fall 2016 Season
Collection
Follow @myanimelist on Twitter
It should be
doc.select("span:contains(Studios) + a[href][title]");
assuming that span is the common element used for the list headers.
Basically, this selector finds every span element whose text contains Studios and then takes the a element that immediately follows that span as a sibling (this is what + means), provided the a element has both href and title attributes.
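To make the sibling selector concrete, here is a minimal sketch that runs it against an inline copy of the markup from the question instead of the live page. The class name, the helper method, and the extra "Producers:" row (with a placeholder href) are assumptions added only for illustration; they show that the selector skips links that follow a different label.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StudioSelectorSketch {
    // Inline markup so the sketch needs no network access.
    // The "Producers:" row is a hypothetical neighbour; its href is a placeholder.
    static final String HTML =
          "<div>"
        + "<span class=\"dark_text\">Producers:</span> "
        + "<a href=\"#\" title=\"Lantis\">Lantis</a>"
        + "</div>"
        + "<div>"
        + "<span class=\"dark_text\">Studios:</span> "
        + "<a href=\"/anime/producer/2/Kyoto_Animation\" title=\"Kyoto Animation\">Kyoto Animation</a>"
        + "</div>";

    // Returns the title of the link directly after the "Studios" span, or null if none is found.
    static String studioTitle(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("span:contains(Studios) + a[href][title]").first();
        return link == null ? null : link.attr("title");
    }

    public static void main(String[] args) {
        System.out.println(studioTitle(HTML)); // prints "Kyoto Animation"
    }
}
```

The null check matters on the real page: if the site changes its markup, select(...) returns an empty list and first() returns null rather than throwing.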
Note that this selector picks up only one link per matching span, the one directly after it. A more general selector could be
*:contains(Studio) > a[title]
That means: take every a element that has a title attribute and is a direct child of any (*) element whose text contains Studio. :contains also takes into account the text of all descendant elements. To match only the text that belongs to the element itself, use :containsOwn.
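The difference between matching all descendant text and matching only an element's own text can be checked with a small sketch like the one below. It reuses the markup from the question; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ContainsVsOwnSketch {
    // Same fragment as in the question, parsed from a string.
    static final String HTML =
          "<div>"
        + "<span class=\"dark_text\">Studios:</span> "
        + "<a href=\"/anime/producer/2/Kyoto_Animation\" title=\"Kyoto Animation\">Kyoto Animation</a>"
        + "</div>";

    // Collects the title attribute of every element matched by the given selector.
    static List<String> titles(String html, String selector) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element link : doc.select(selector)) {
            result.add(link.attr("title"));
        }
        return result;
    }

    public static void main(String[] args) {
        // The div's text includes the span's text, so the div matches and its child link is returned.
        System.out.println(titles(HTML, "*:contains(Studio) > a[title]"));    // prints [Kyoto Animation]
        // Only the span owns the text "Studios:", and the link is not a child of the span.
        System.out.println(titles(HTML, "*:containsOwn(Studio) > a[title]")); // prints []
    }
}
```

On the full page the :contains variant may also return links from other blocks whose parent happens to contain the word "Studio", so it is worth checking the result before relying on it.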
Not tested, but what about something like
...
Elements studio = doc.select("a[title='Kyoto Animation']");
...
This only helps if you already know the studio name you are looking for.
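As a rough check of the attribute-value approach, the sketch below counts links by exact title in a small inline document. The second link and its placeholder href are assumptions for illustration, as are the class and method names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TitleAttributeSketch {
    // Two links: the one from the question plus a hypothetical one with a placeholder href.
    static final String HTML =
          "<a href=\"#\" title=\"Lantis\">Lantis</a>"
        + "<a href=\"/anime/producer/2/Kyoto_Animation\" title=\"Kyoto Animation\">Kyoto Animation</a>";

    // Counts the links whose title attribute is exactly "Kyoto Animation".
    static int countKyotoAnimationLinks(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("a[title='Kyoto Animation']").size();
    }

    public static void main(String[] args) {
        System.out.println(countKyotoAnimationLinks(HTML)); // prints 1
    }
}
```

On the real page the same title shows up twice in the question's output, so a selector like this would be expected to return more than one element there.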