
An issue with a URLConnection using Java

I'm trying to read the source code of a website, but I run into a problem with pages like this one: https://www.amazon.de/gp/bestsellers/pet-supplies/#2. I've tried a lot, but I still only receive the code of https://www.amazon.de/gp/bestsellers/pet-supplies. I want to receive ranks 21-40, not 1-20, so something isn't working. I'm using a URLConnection and a BufferedReader:

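import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public String fetchPage(String urlS) {
    // StringBuilder instead of a String initialised to null:
    // `qc += s` on a null String prepends the literal text "null"
    StringBuilder qc = new StringBuilder();

    try {
        URL url = new URL(urlS);
        URLConnection uc = url.openConnection();
        // Send a browser-like User-Agent header
        uc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0");

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(uc.getInputStream()))) {
            String s;
            while ((s = reader.readLine()) != null) {
                // readLine() strips line terminators, so add them back
                qc.append(s).append('\n');
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        return "receiving qc failed";
    }
    return qc.toString();
}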

Thank you in advance for your effort :)

The URL you're fetching contains an anchor (the #2 at the end). An anchor is a client-side concept and was originally used to jump to a certain part of the page. Some web apps (mostly single-page apps) use the anchor to keep track of some sort of state (e.g. which page of products you're viewing).

Since the anchor is a client-side concept, your HTTP client library (or browser) never sends it to the web server at all. The server receives effectively the same request as if you had asked for https://www.amazon.de/gp/bestsellers/pet-supplies.
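To see this without touching the network, you can split the URL with `java.net.URI`. The following is a minimal sketch. The class `FragmentDemo` and the helper `requestTarget` are hypothetical names chosen for illustration. `requestTarget` assumes the target in an HTTP request line is just the path plus the optional query, which is why the fragment has nowhere to go. Both URLs keep the trailing slash so that they differ only in the `#2`.

```java
import java.net.URI;

public class FragmentDemo {

    // Hypothetical helper: the request target (path + optional query) that
    // appears in a request line such as "GET /path HTTP/1.1".
    // The fragment is intentionally not part of it.
    static String requestTarget(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] args) throws Exception {
        URI withAnchor = URI.create("https://www.amazon.de/gp/bestsellers/pet-supplies/#2");
        URI withoutAnchor = URI.create("https://www.amazon.de/gp/bestsellers/pet-supplies/");

        // The "2" lives only in the fragment component
        System.out.println("fragment:        " + withAnchor.getFragment());
        // Both URLs map to the same request target
        System.out.println("target (anchor): " + requestTarget(withAnchor));
        System.out.println("target (plain):  " + requestTarget(withoutAnchor));
        // java.net.URL#getFile() likewise returns path + query, without the reference
        System.out.println("URL.getFile():   " + withAnchor.toURL().getFile());
    }
}
```

All three "target" lines print `/gp/bestsellers/pet-supplies/`, and the `2` appears only in the fragment line. Nothing in what the server receives tells it which "page" you meant.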

Bottom line: you'll never get the second page from that URL... Good luck scraping Amazon though ;)
