
Jsoup: how to get values from HTML

So I'm trying to get specific information from this link: https://myanimelist.net/anime/31988/Hibike_Euphonium_2

I don't really understand HTML, so this is a bit harder for me.

Specifically, I want to get information from here:

<div>
    <span class="dark_text">Studios:</span>
    <a href="/anime/producer/2/Kyoto_Animation" title="Kyoto Animation">Kyoto Animation</a>
</div>

<div class="spaceit">

What I'm trying to do is find the element that says "Studios" and then get the title of the link that follows it (Kyoto Animation).

So far I have managed to get this:

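import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("https://myanimelist.net/anime/31988/Hibike_Euphonium_2").get();
// Matches every <a> on the page that has both href and title attributes
Elements studio = doc.select("a[href][title]");
for (Element link : studio) {
    System.out.println(link.attr("title"));
}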

And it outputs this:

Lantis
Pony Canyon
Rakuonsha
Ponycan USA
Kyoto Animation
Drama
Music
School
Kyoto Animation
Go to the Last Post
Go to the Last Post
Anime You Should Watch Before Their Sequels Air This Fall 2016 Season
Collection
Follow @myanimelist on Twitter

It should be:

doc.select("span:contains(Studios) + a[href][title]");

This assumes that span is the common element used for the list headers.

So basically, this selector takes every span element that contains the text Studios and then selects the a element immediately following it (its adjacent sibling, which is what + means) that has both href and title attributes.
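A minimal sketch of the adjacent-sibling selector in action, assuming the markup matches the snippet from the question. It parses that snippet offline with Jsoup.parse rather than fetching the live page, and assumes a jsoup version that provides Element.selectFirst. The class StudioSelector and the method firstStudio are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StudioSelector {
    // Returns the title of the <a> that immediately follows the span containing "Studios",
    // or null when the document has no such link.
    static String firstStudio(Document doc) {
        // "+" is the adjacent-sibling combinator: the <a> must be the next element after the <span>
        Element link = doc.selectFirst("span:contains(Studios) + a[href][title]");
        return link == null ? null : link.attr("title");
    }

    public static void main(String[] args) {
        // The snippet from the question, parsed offline instead of fetching the live page
        String html = "<div>"
                + "<span class=\"dark_text\">Studios:</span>"
                + "<a href=\"/anime/producer/2/Kyoto_Animation\" title=\"Kyoto Animation\">Kyoto Animation</a>"
                + "</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(firstStudio(doc)); // prints "Kyoto Animation"
    }
}
```

Returning null instead of throwing keeps the caller in control when a page has no "Studios" row.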

Just in case: the selector above selects only the single link that directly follows the span. A more universal one could be:

*:contains(Studio) > a[title]

That means: take every a element that has a title attribute and is a direct child of any (*) element that contains the text Studio. Note that :contains also takes into account the text of all descendant elements. To match only an element's own text, use :containsOwn.
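To see how the broader selector behaves when a show lists more than one studio, here is a sketch on hypothetical markup: the second studio, "Another Studio", is invented for the example, and StudioList and allStudios are hypothetical names. It also contrasts :contains with :containsOwn. On the real page, any other element whose text mentions "Studio" and that has a[title] children would match too, so the result may need filtering.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StudioList {
    // Collects the title of every <a title=...> that is a direct child of an element
    // whose text (including its descendants' text) contains "Studio".
    static List<String> allStudios(Document doc) {
        List<String> titles = new ArrayList<>();
        for (Element link : doc.select("*:contains(Studio) > a[title]")) {
            titles.add(link.attr("title"));
        }
        return titles;
    }

    public static void main(String[] args) {
        // Hypothetical row listing two studios; "Another Studio" is invented for the example
        String html = "<div>"
                + "<span class=\"dark_text\">Studios:</span> "
                + "<a href=\"/a\" title=\"Kyoto Animation\">Kyoto Animation</a>, "
                + "<a href=\"/b\" title=\"Another Studio\">Another Studio</a>"
                + "</div>";
        Document doc = Jsoup.parse(html);
        System.out.println(allStudios(doc)); // prints "[Kyoto Animation, Another Studio]"

        // :containsOwn only checks an element's own text nodes:
        // the <span> holds "Studios:" itself, while the <div>'s own text is just the separators
        System.out.println(doc.select("span:containsOwn(Studios)").size()); // prints 1
        System.out.println(doc.select("div:containsOwn(Studios)").size());  // prints 0
    }
}
```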

Not tested, but what about something like this:

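    ...
    // jsoup's select() takes CSS selectors: attribute values are matched with [attr=value], without XPath's @
    Elements studio = doc.select("a[title='Kyoto Animation']");
    ...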
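A sketch of the attribute-value approach, run against the question's snippet; TitleSelector and kyotoLinks are hypothetical names. The trade-off is that the studio name has to be known in advance, and since the question's output lists "Kyoto Animation" twice, on the live page this selector would likely return more than one link.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TitleSelector {
    // Selects every <a> whose title attribute is "Kyoto Animation".
    // CSS attribute syntax is [attr=value]; there is no XPath-style "@" before the name.
    static Elements kyotoLinks(Document doc) {
        return doc.select("a[title='Kyoto Animation']");
    }

    public static void main(String[] args) {
        // The snippet from the question, parsed offline
        String html = "<div>"
                + "<span class=\"dark_text\">Studios:</span>"
                + "<a href=\"/anime/producer/2/Kyoto_Animation\" title=\"Kyoto Animation\">Kyoto Animation</a>"
                + "</div>";
        Document doc = Jsoup.parse(html);
        for (Element link : kyotoLinks(doc)) {
            System.out.println(link.attr("href")); // prints "/anime/producer/2/Kyoto_Animation"
        }
    }
}
```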

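Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.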

 
© 2020-2024 STACKOOM.COM