Jsoup how to get values from html

So I'm trying to get specific information from this link: https://myanimelist.net/anime/31988/Hibike_Euphonium_2

I don't really understand html so this is a bit harder for me.

I'm specifically looking to get information from here:

<div>
    <span class="dark_text">Studios:</span>
    <a href="/anime/producer/2/Kyoto_Animation" title="Kyoto Animation">Kyoto Animation</a>
</div>

<div class="spaceit">

What I'm trying to do is find where it says "Studios" and then get the title of the link next to it (Kyoto Animation).

So far I have managed to get this:

Document doc = Jsoup.connect("https://myanimelist.net/anime/31988/Hibike_Euphonium_2").get();

// Selects every link on the page that has both an href and a title attribute
Elements studio = doc.select("a[href][title]");
for (Element link : studio) {
    System.out.println(link.attr("title"));
}

And it's outputting this:

Lantis
Pony Canyon
Rakuonsha
Ponycan USA
Kyoto Animation
Drama
Music
School
Kyoto Animation
Go to the Last Post
Go to the Last Post
Anime You Should Watch Before Their Sequels Air This Fall 2016 Season
Collection
Follow @myanimelist on Twitter

It should be

doc.select("span:contains(Studios) + a[href][title]");

assuming that span is the common element for the list headers.

So basically, this selector finds every span element that contains the text Studios and then takes the a element that immediately follows it as a sibling (that is what + means), provided that it has both the href and title attributes.
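To try the selector without hitting the live site, here is a minimal sketch that runs it against the fragment from the question using Jsoup.parse. The class name StudioSelectorSketch, the method studioTitles, and the extra "Go to the Last Post" link in SAMPLE_HTML are made up for illustration, and it assumes jsoup 1.10.2 or later (for eachAttr) is on the classpath.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StudioSelectorSketch {

    // Fragment from the question, plus one unrelated link (hypothetical) that must not match.
    static final String SAMPLE_HTML =
          "<div>"
        + "<span class=\"dark_text\">Studios:</span> "
        + "<a href=\"/anime/producer/2/Kyoto_Animation\" title=\"Kyoto Animation\">Kyoto Animation</a>"
        + "</div>"
        + "<div class=\"spaceit\">"
        + "<a href=\"#\" title=\"Go to the Last Post\">Last post</a>"
        + "</div>";

    // Returns the title attribute of every link that directly follows
    // a span containing the text "Studios".
    static List<String> studioTitles(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("span:contains(Studios) + a[href][title]").eachAttr("title");
    }

    public static void main(String[] args) {
        System.out.println(studioTitles(SAMPLE_HTML)); // prints [Kyoto Animation]
    }
}
```

Against the real page, the same select call would be applied to the Document returned by Jsoup.connect(...).get(), as in the question.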

Note that the given selector will select only one link, the one right after the span. A more universal selector could be

*:contains(Studio) > a[title]

This means: take every a element that has a title attribute and is a direct child of any (*) element that contains the text Studio. :contains also takes into account the text of all descendant elements, so ancestors of the Studios block match as well. To match only the text of the element itself, use :containsOwn.
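The difference between :contains and :containsOwn, and how broad the * selector can get, is easier to see on a small sample. The sketch below is not tested against the real page: the markup, the second studio link ("Example Studio"), the outer div, and the class name ContainsSelectorSketch are all made up for illustration. The last selector, which uses the general sibling combinator ~, is an additional suggestion and not something proposed in the answers above.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ContainsSelectorSketch {

    // Hypothetical markup: the studio block with a second, invented studio link,
    // wrapped in an outer div that also holds an unrelated link.
    static final Document DOC = Jsoup.parse(
          "<div id=\"outer\">"
        + "<div>"
        + "<span class=\"dark_text\">Studios:</span> "
        + "<a href=\"#\" title=\"Kyoto Animation\">Kyoto Animation</a>, "
        + "<a href=\"#\" title=\"Example Studio\">Example Studio</a>"
        + "</div>"
        + "<a href=\"#\" title=\"Go to the Last Post\">Last post</a>"
        + "</div>");

    // Title attributes of all elements matched by the selector, in document order.
    static List<String> titles(String selector) {
        return DOC.select(selector).eachAttr("title");
    }

    public static void main(String[] args) {
        // :contains looks at descendant text, so the outer and the inner div both match.
        System.out.println(DOC.select("div:contains(Studios)").size());
        // :containsOwn looks only at an element's own text, which here belongs to the span.
        System.out.println(DOC.select("div:containsOwn(Studios)").size());
        System.out.println(DOC.select("span:containsOwn(Studios)").size());

        // Broad selector: the outer div also "contains" Studio, so its link is included.
        System.out.println(titles("*:contains(Studio) > a[title]"));
        // Adjacent sibling (+): only the first link after the span.
        System.out.println(titles("span:contains(Studios) + a[href][title]"));
        // General sibling (~): every a[title] that follows the span under the same parent.
        System.out.println(titles("span:containsOwn(Studios) ~ a[title]"));
    }
}
```

On this sample the broad selector also returns the unrelated "Go to the Last Post" link, while the ~ variant returns just the two studio links. Whether that holds on the real page depends on its actual markup.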

Not tested, but what about something like

    ...
    Elements studio = doc.select("a[title='Kyoto Animation']");
    ...
