
Java jsoup - How to get all links (href) by searching for their text

I have a lot of lines like these in a webpage:

<a href="City1/Waves321.aspx"><span><span style="font-family: Courier New">Title</span></span></a> 
<span style="font-family: Courier New"> (<a href="City1/River267.aspx">txt</a>)</span></li></ul>
<a href="City2/Waves761.aspx"><span><span style="font-family: Courier New">Title</span></span></a>
<span style="font-family: Courier New"> (<a href="City2/River767.aspx">txt</a>)</span></li></ul>

And I want to get only:

City1/Waves321.aspx

City2/Waves761.aspx

and so on, i.e. the href of every link whose text is "Title".

I tested with this code:

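import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ListLinks {
    public static void main(String[] args) throws IOException {
        Validate.isTrue(args.length == 1, "usage: supply url to fetch");
        String url = args[0];
        String address;

        // Fetch the page with a 10-second timeout
        Document doc = Jsoup.connect(url).timeout(10*1000).get();
        // Select every <a> whose href matches the regex "Waves"
        Elements links = doc.select("a[href~=(Waves)]");
        //String linkText = links.text();

        for (Element link : links) {
            String linkHref = link.attr("href");
            address = url + linkHref;
            System.out.println(address);
        }
    }
}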

It works for most of the links, but it misses the ones where "Title" is on a new line, like this:

<a href="City/Waves321.aspx"><span><span style="font-family: Courier New">
Title</span></span></a><span style="font-family: Courier New"> (<a href="City/River267.aspx">txt</a>)</span></li></ul>

By the way, I cannot change the webpage's HTML. :/

How can I achieve this in Jsoup?
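One way to make the match independent of line breaks is to compare each link's text rather than its raw HTML: jsoup's `Element.text()` collapses runs of whitespace (newlines included) and trims the result, so a "Title" that starts on a new line still reads as "Title". The following is a minimal sketch of that idea, not the asker's or answerer's code. It parses the sample markup from the question as a string; the class name `TitleLinks` and the base URI `http://example.com/` are hypothetical placeholders. `attr("abs:href")` resolves each relative href against the document's base URI instead of concatenating strings.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TitleLinks {

    // Returns the absolute URL of every <a> whose normalized text is exactly "Title".
    static List<String> titleLinks(Document doc) {
        List<String> result = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            // text() collapses whitespace (including newlines) and trims,
            // so "\nTitle" inside the nested <span>s compares equal to "Title".
            if (link.text().equals("Title")) {
                // abs:href resolves the relative href against the document's base URI.
                result.add(link.attr("abs:href"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup from the question; the second "Title" starts on a new line.
        String html =
            "<ul><li><a href=\"City1/Waves321.aspx\"><span><span style=\"font-family: Courier New\">Title</span></span></a>"
          + "<span style=\"font-family: Courier New\"> (<a href=\"City1/River267.aspx\">txt</a>)</span></li></ul>"
          + "<ul><li><a href=\"City2/Waves761.aspx\"><span><span style=\"font-family: Courier New\">\nTitle</span></span></a>"
          + "<span style=\"font-family: Courier New\"> (<a href=\"City2/River767.aspx\">txt</a>)</span></li></ul>";

        // Hypothetical base URI; a Document from Jsoup.connect(url).get() uses the page URL instead.
        Document doc = Jsoup.parse(html, "http://example.com/");
        titleLinks(doc).forEach(System.out::println);
    }
}
```

Run against the sample, this prints `http://example.com/City1/Waves321.aspx` and `http://example.com/City2/Waves761.aspx` and skips the River links. With a live page, passing the `Document` returned by `Jsoup.connect(url).get()` makes the fetched page URL the base URI.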

You can do it like this:

Elements e = doc.getElementsByTag("a");
e.stream().forEach(p -> System.out.println(p.attr("href")));
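The snippet above prints the href of every `<a>` on the page, including the River links. A variant in the same stream style can narrow the selection with jsoup's `:contains(text)` selector, which matches case-insensitively against the element's whitespace-normalized text (descendants included), so the newline before "Title" does not matter. This is a sketch only; the class name `TitleHrefs` and the method `titleHrefs` are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.util.List;
import java.util.stream.Collectors;

public class TitleHrefs {

    // Collects the raw href of every <a> whose normalized text contains "Title"
    // (case-insensitive substring match, as implemented by :contains).
    static List<String> titleHrefs(Document doc) {
        return doc.select("a[href]:contains(Title)").stream()
                .map(a -> a.attr("href"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Markup from the question where "Title" starts on a new line.
        String html = "<ul><li><a href=\"City/Waves321.aspx\"><span><span style=\"font-family: Courier New\">\n"
                + "Title</span></span></a><span style=\"font-family: Courier New\"> "
                + "(<a href=\"City/River267.aspx\">txt</a>)</span></li></ul>";

        Document doc = Jsoup.parse(html);
        titleHrefs(doc).forEach(System.out::println); // prints City/Waves321.aspx
    }
}
```

Because `:contains` is a substring match, a link reading "Subtitle" would also be kept; if that matters, replace the selector with `a[href]` and add `.filter(a -> a.text().equals("Title"))` before the `map` step.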

