
Get URL from onclick attribute with Jsoup in Java

I am new to Jsoup and I am trying to get the URL from an onclick attribute. The attribute calls a function named ga with five parameters, so it looks like this: ga('send', 'event', 'tunein', 'playjp', 'http://link that i want to get'); I want to grab the http URL.

I tried the attr("onclick") method, but it doesn't work at all. Is there any way to get this URL?

Are you sure you are on the right node?

node.attr("onclick") should work.

Can you post the link to the page you are trying to scrape, and show how you reach the node?
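One way to check the "right node" question without hitting the live site is to run the same selector against a small HTML snippet offline. This is a sketch that assumes jsoup is on the classpath; the class name OnclickFromSnippet and the markup in HTML are hypothetical, modelled on the onclick value quoted further down in this thread, and the real page may be structured differently. It also shows that attr() returns an empty string, not an exception, when the selected element has no such attribute, which is easy to mistake for attr() not working.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OnclickFromSnippet {

    // Hypothetical markup: an <a> without onclick and an <i> play button with it.
    static final String HTML =
            "<div class=\"jp-controls\">"
          + "<a href=\"#\">no onclick here</a>"
          + "<i onclick=\"ga('send', 'event', 'tunein', 'playjp', "
          + "'http://airspectrum.cdnstream1.com:8114/1648_128.m3u');\"></i>"
          + "</div>";

    // Reads onclick from the first <i> inside div.jp-controls (the right node),
    // or returns "" when no such element exists.
    static String onclickOf(String html) {
        Document doc = Jsoup.parse(html);
        Element icon = doc.select("div.jp-controls i").first();
        return icon == null ? "" : icon.attr("onclick");
    }

    // Reads onclick from the <a> instead (the wrong node); attr() yields "".
    static String onclickOfWrongNode(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("div.jp-controls a").first();
        return link == null ? "" : link.attr("onclick");
    }

    public static void main(String[] args) {
        System.out.println(onclickOf(HTML));
        System.out.println("[" + onclickOfWrongNode(HTML) + "]");
    }
}
```

Running main prints the full ga(...) call on the first line and [] on the second, so an empty result usually means the selector matched a different element than intended.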

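import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public void jsoupParse() throws IOException {
    // Fetch and parse the station page
    Document doc = Jsoup.connect("https://www.internet-radio.com/station/dougeasyhits/").get();
    // The play button is the first <i> element inside div.jp-controls
    Element image = doc.select("div.jp-controls").select("i").get(0);
    // Read the raw JavaScript stored in the onclick attribute
    String onclick = image.attr("onclick");
    System.out.println(onclick);
}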

Output:

ga('send', 'event', 'tunein', 'playjp', 'http://airspectrum.cdnstream1.com:8114/1648_128.m3u');

Now all you need to do is split the string with the split method to extract the URL:

Document doc = Jsoup.connect("https://www.internet-radio.com/station/dougeasyhits/").get();
Element image = doc.select("div.jp-controls").select("i").get(0); // first <i> element: the play button
String onclick = image.attr("onclick");
String[] parts = onclick.split("'"); // split the string into an array, using the single quote (') as separator
String url = parts[9]; // the URL is the 10th element (index 9) of the array
System.out.println(onclick);
System.out.println(url);

Output:

ga('send', 'event', 'tunein', 'playjp', 'http://airspectrum.cdnstream1.com:8114/1648_128.m3u');
http://airspectrum.cdnstream1.com:8114/1648_128.m3u

In case it is confusing, this is how the onclick attribute value gets split:

parts[0] : "ga("
parts[1] : "send"
parts[2] : ", "
parts[3] : "event"
parts[4] : ", "
parts[5] : "tunein"
parts[6] : ", "
parts[7] : "playjp"
parts[8] : ", "
parts[9] : "http://airspectrum.cdnstream1.com:8114/1648_128.m3u"
parts[10] : ");"
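Because parts[9] only works while the call keeps exactly this sequence of quoted arguments, one alternative (a different technique from the split above) is to collect every single-quoted argument with a regular expression and take the last one. This is a sketch using only java.util.regex; the class and method names GaArgs, quotedArgs and lastArg are hypothetical, and it assumes no argument contains an escaped quote (\').

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GaArgs {

    // Matches one single-quoted argument and captures the text between the quotes.
    // Assumption: arguments never contain an escaped quote.
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    // Collects every single-quoted argument of a call such as ga('a', 'b', ...).
    static List<String> quotedArgs(String onclick) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(onclick);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // Returns the last quoted argument (the URL in the ga(...) call),
    // or null if the string contains no quoted argument.
    static String lastArg(String onclick) {
        List<String> found = quotedArgs(onclick);
        return found.isEmpty() ? null : found.get(found.size() - 1);
    }

    public static void main(String[] args) {
        String onclick = "ga('send', 'event', 'tunein', 'playjp', "
                + "'http://airspectrum.cdnstream1.com:8114/1648_128.m3u');";
        System.out.println(quotedArgs(onclick));
        System.out.println(lastArg(onclick));
    }
}
```

With the onclick value from the answer, lastArg returns http://airspectrum.cdnstream1.com:8114/1648_128.m3u, and it keeps doing so if extra arguments are added before the URL, as long as the URL stays last.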
