
Unable to get links from HTML - jsoup

With the following code I can get the desired text from the website, but I am unable to get the link associated with that text. I have tried several combinations of methods; the most I get is the entire outer HTML, as shown below:

<li class="list-item">
<h4><a class="bold" href="abacavir.htm">Abacavir </a>   </h4>

Abacavir is an antiviral drug that is effective against the HIV-1 virus.</li>

Here is the code:

    public static void main(String[] args) throws Exception {
        Map<String,String> drugLinks = new LinkedHashMap<String,String>();
        final int OK = 200;
        //String currentURL;
        //int page = 1;
        int status = OK;
        Connection.Response response = null;
        Document doc = null;
        String[] keywords = {"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"};
        //String keyword = "a";
        for (String keyword : keywords) {
            final String url = "https://www.medindia.net/doctors/drug_information/home.asp?alpha=" + keyword;
            response = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0")
                    .execute();
            status = response.statusCode();

            doc = response.parse();

            Element tds = doc.select("div.related-links.top-gray.col-list.clear-fix").first();
            Elements links = tds.select("li[class=list-item]");

            for (Element link : links) {
                System.out.println("generic::" + link.select("a[href]").text());
                System.out.println("link::" + link.attr("abs:a"));
            }
        }
    }

Output

generic::Abacavir
link::
generic::Abacavir Sulfate and Lamivudine
link::
generic::Abacavir Sulfate, Lamivudine and Zidovudine
link::
generic::Abaloparatide
link::
generic::Abarelix
link::

How do I get the absolute links from the given HTML?

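To get the link from the element, read the href attribute of the nested anchor. Your code asks the li element for an attribute named "a", which does not exist, so it prints an empty string. Use:

link.select("a").attr("href")

However, that only gives you the relative link exactly as it is written in the HTML (for example, abacavir.htm). For these pages the full link is:

"https://www.medindia.net/doctors/drug_information/" + link.select("a").attr("href")

Alternatively, because the document was parsed from a connection and therefore has a base URI, jsoup can resolve the link for you:

link.select("a").attr("abs:href")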
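The hard-coded prefix above works because every link on these pages is relative to the same directory. A more general approach is to resolve each href against the URL of the page it was found on. The following is a minimal sketch of that using only java.net.URI from the standard library. The class name LinkResolver and the method resolve are hypothetical names chosen for illustration, and the sketch assumes each href is a syntactically valid URI reference (URI.create throws IllegalArgumentException otherwise).

```java
import java.net.URI;

public class LinkResolver {

    // Resolve an href (relative or already absolute) against the URL of the
    // page it was scraped from, following the usual URI resolution rules.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Page URL from the question, with alpha=a.
        String page = "https://www.medindia.net/doctors/drug_information/home.asp?alpha=a";

        // Relative href taken from the HTML shown in the question.
        System.out.println(resolve(page, "abacavir.htm"));
        // prints https://www.medindia.net/doctors/drug_information/abacavir.htm
    }
}
```

Inside the loop from the question this would be called as resolve(url, link.select("a").attr("href")). Unlike string concatenation, it also handles hrefs that start with a slash or that are already absolute.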
