

Getting links from the table and all the tabs of a website using Jsoup

I'm new to web scraping, so the question may not be framed perfectly. I am trying to extract all the drug-name links from a given page alphabetically, and in that way collect the drug links for every letter from A to Z. Then I want to iterate over these links and extract information from each drug page, such as the generic name, brand, etc. Below is my very basic code, which doesn't work. Any help in approaching this problem would be much appreciated.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class WebScraper {
  public static void main(String[] args) throws Exception {

    String keyword = "a"; // will iterate through all the letters of the alphabet eventually
    String url = "http://www.medindia.net/drug-price/brand-index.asp?alpha=" + keyword;

    Document doc = Jsoup.connect(url).get();
    Element table = doc.select("table").first(); // first <table> in the document
    Elements links = table.select("a[href]"); // a with href
    for (Element link : links) {
      System.out.println(link.attr("href"));
    }
  }
}
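The comment on keyword says the code will eventually iterate through every letter. A minimal sketch of that loop in plain Java, with no network access, is below. It reuses the URL pattern from the question and assumes (not verified) that every lowercase letter from a to z is a valid value for the alpha parameter; AlphabetIndex and indexUrls are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class AlphabetIndex {
    // URL pattern taken from the question. Assumption: every letter a-z
    // is accepted as the value of the alpha parameter.
    static final String BASE = "http://www.medindia.net/drug-price/brand-index.asp?alpha=";

    /** Builds one index-page URL per letter, from 'a' to 'z'. */
    static List<String> indexUrls() {
        List<String> urls = new ArrayList<>();
        for (char letter = 'a'; letter <= 'z'; letter++) {
            urls.add(BASE + letter); // String + char appends the letter itself
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : indexUrls()) {
            System.out.println(url);
        }
    }
}
```

Each of these URLs can then be passed to Jsoup.connect(url).get() in place of the single hard-coded keyword.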

After looking at the website and at what you expect to get, it looks like you are grabbing the wrong table element. You don't want the first table; you want the second.
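One way to check which table actually holds the drug links, instead of guessing, is to list every table on the page together with its link count. The sketch below assumes the Jsoup library is on the classpath; TableSurvey and describeTables are hypothetical names. Note that doc.select("table") also matches nested tables, and a table's link count includes links inside any table nested within it.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableSurvey {
    /** Returns one line per table: its zero-based index in select("table") and its link count. */
    static List<String> describeTables(Document doc) {
        List<String> lines = new ArrayList<>();
        Elements tables = doc.select("table"); // all tables, in document order
        for (int i = 0; i < tables.size(); i++) {
            Element table = tables.get(i);
            int linkCount = table.select("a[href]").size(); // includes links in nested tables
            lines.add("table " + i + ": " + linkCount + " links");
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String url = "http://www.medindia.net/drug-price/brand-index.asp?alpha=a";
        Document doc = Jsoup.connect(url).get();
        describeTables(doc).forEach(System.out::println);
    }
}
```

The index printed next to the table with the expected number of links is the one to pass to get().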

To grab a specific table, you can use this:

Element table = doc.select("table").get(1);

This gets the table at index 1, i.e. the second table in the document, because the Elements list returned by select() is zero-indexed.
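Combining the answer's fix with the question's plan (every letter's index page first, then every drug page) could look like the sketch below. It assumes, without having verified it, that the second table holds the drug links on every letter's page and that a one-second pause between requests is acceptable to the site; DrugLinkCrawler and linksFromIndex are hypothetical names, and Jsoup must be on the classpath. It prints only each drug page's title, because the selectors for the generic name, brand, etc. depend on markup not shown in the question.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DrugLinkCrawler {
    static final String BASE = "http://www.medindia.net/drug-price/brand-index.asp?alpha=";

    /** Collects absolute link URLs from the second table (index 1) of an index page. */
    static Set<String> linksFromIndex(Document doc) {
        Set<String> links = new LinkedHashSet<>(); // drops duplicates, keeps discovery order
        Elements tables = doc.select("table");
        if (tables.size() < 2) {
            return links; // page layout differs from the assumed one
        }
        for (Element link : tables.get(1).select("a[href]")) {
            // "abs:href" resolves a relative href against the document's base URI
            String abs = link.attr("abs:href");
            if (!abs.isEmpty()) {
                links.add(abs);
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        Set<String> drugLinks = new LinkedHashSet<>();
        for (char letter = 'a'; letter <= 'z'; letter++) {
            Document index = Jsoup.connect(BASE + letter).get();
            drugLinks.addAll(linksFromIndex(index));
            Thread.sleep(1000); // pause between requests (assumed acceptable rate)
        }
        for (String link : drugLinks) {
            Document page = Jsoup.connect(link).get();
            // Placeholder: real fields (generic name, brand) need page-specific selectors.
            System.out.println(link + " -> " + page.title());
            Thread.sleep(1000);
        }
    }
}
```

Using attr("abs:href") rather than attr("href") matters here: the second loop needs full URLs to connect to, while the raw href values may be relative.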

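Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.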

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM