
How to reload a Java InputStream

I am parsing a webpage using a BufferedReader wrapped around an InputStream in Java:

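URL url = new URL(myUrl);  // myUrl: the page address as a String
HttpURLConnection conn = (HttpURLConnection) url.openConnection();  // openConnection() takes no URL argument
InputStream stream = conn.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(stream));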

The problem is that the numbers I want to get from the page are dynamically generated with Ajax calls, and aren't available in the stream. Is there some way I can refresh the stream after waiting a bit, or can anyone think of some other way to get the data? The page is here. The numbers I need are the "Dollar Volume" and "Share Volume" in the middle of the page.

Thanks, Jared

If the numbers are generated with Ajax calls, then your code needs to make those Ajax calls. The only thing akin to "refreshing the stream" is reloading the page, and the reloaded page still won't contain the numbers if they're normally loaded by separate web requests made by the JavaScript on the client.

Just find out what requests the JavaScript makes, and make the same requests from your own code.

(You should check that the web site in question is happy for you to scrape their data like this, by the way.)

You can inspect the JavaScript code on the page or use a network sniffer to determine the HTTP requests that the JavaScript code sends back to the server and then you can reproduce them in Java instead of sending the request for the original page.

The most reliable way to achieve this though is to find out whether they offer some API.

You'll either need to reverse engineer the JavaScript or monitor the AJAX calls using a browser extension (e.g. Firebug, IE Developer Tools, etc.) or a web proxy such as Fiddler, so you can determine:

  • The HTTP target location of the AJAX call(s)
  • The HTTP request method of the AJAX call(s), e.g. POST or GET
  • The result format of the AJAX call(s), e.g. XML, JSON, or plain text

Once you have the AJAX result, you'll have to parse it to extract your values. There is no simple way to do any of this. I recommend Fiddler because it shows both the raw and the formatted HTTP data for each request, including AJAX and SSL traffic.

http://www.fiddler2.com/fiddler2/

http://www.getfirebug.com/
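Once the target, method, and result format are known, reproducing the call in Java could look roughly like the sketch below. The class name AjaxFetch, the endpoint URL, and the two request headers are assumptions for illustration only; use whatever your network trace actually shows.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class AjaxFetch {
    // Prepares (but does not yet send) the same request the page's JavaScript makes.
    static HttpURLConnection prepare(String endpoint, String method) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod(method); // "GET" or "POST", as seen in the trace
        // Assumption: some Ajax endpoints check these headers; this site may not.
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        conn.setRequestProperty("Accept", "application/json");
        return conn;
    }

    // Sends the request and returns the raw response body as text (UTF-8 assumed).
    static String fetch(HttpURLConnection conn) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint; substitute the URL observed in Fiddler or Firebug.
        HttpURLConnection conn = prepare("http://example.com/ajax/marketTotals", "GET");
        System.out.println(fetch(conn));
    }
}
```

The returned text still has to be parsed as XML, JSON, or plain text, depending on what the endpoint sends.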

You cannot "refresh" the stream. The exchange is initiated by the client that uses AJAX: the client creates an HTTP request and the server handles it. The stream is just an abstraction for accessing the data, and you can read from it until it ends, which happens when the client has finished sending data or when an error occurs.

So, read the stream until it ends and parse the content. When the client decides to send another chunk of data, it creates a new connection and you (on the server side) get a new stream.
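"Read the stream until it ends" can be sketched as below. The class and method names are made up for the example, and UTF-8 is an assumed encoding; a real response may declare a different charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamReading {
    // Reads every byte until read() reports end of stream (-1), then decodes as UTF-8.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for conn.getInputStream() here.
        InputStream in = new ByteArrayInputStream(
                "<html>sample</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
    }
}
```

Once read() has returned -1, the stream is exhausted; getting fresh data means opening a new connection and reading its new stream.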

I was about to give the same answers as the others here, but after looking at the page, the specific numbers you're looking for aren't returned by AJAX, but are returned within the actual HTML of the page. Here's an example I just grabbed:

<div id="marketTotals">
    <div class="panel">
        <strong>Dollar Volume</strong>
        <span class="value">79,567,751</span>

    </div>
    <div class="panel">
        <strong>Share Volume</strong>
        <span class="value">32,225,173</span>
    </div>
    <div class="panel">
        <strong>Trades</strong>

        <span class="value">6,413</span>
    </div>
    <div class="panel">
        <strong>Advancers</strong>
        <span class="value">60</span>
    </div>
    <div class="panel">

        <strong>Decliners</strong>
        <span class="value">120</span>
    </div>
</div>

There is no special coding needed to get these - especially no reloading of the stream. You can even see these through the use of curl or wget.
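For what it's worth, here is a minimal sketch of pulling those values out of the HTML once it has been read into a String. It assumes the markup keeps exactly the strong/span shape shown above; the class name and the regular expression are my own, and a real HTML parser would be more robust than a regex if the page layout changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarketTotals {
    // Matches <strong>Label</strong>, optional whitespace, then <span class="value">number</span>.
    private static final Pattern PANEL = Pattern.compile(
            "<strong>([^<]+)</strong>\\s*<span class=\"value\">([^<]+)</span>");

    // Returns label -> numeric value, in page order, with thousands separators removed.
    static Map<String, Long> parse(String html) {
        Map<String, Long> totals = new LinkedHashMap<>();
        Matcher m = PANEL.matcher(html);
        while (m.find()) {
            String label = m.group(1).trim();
            long value = Long.parseLong(m.group(2).trim().replace(",", ""));
            totals.put(label, value);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Sample taken from the snippet above.
        String html = "<div class=\"panel\">\n"
                + "    <strong>Dollar Volume</strong>\n"
                + "    <span class=\"value\">79,567,751</span>\n"
                + "</div>\n"
                + "<div class=\"panel\">\n"
                + "    <strong>Share Volume</strong>\n"
                + "    <span class=\"value\">32,225,173</span>\n"
                + "</div>";
        Map<String, Long> totals = parse(html);
        System.out.println(totals.get("Dollar Volume"));
        System.out.println(totals.get("Share Volume"));
    }
}
```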

All that said, please ensure that you have permission from the web site owner before trying to screen-scrape what is probably proprietary data in this fashion - or it won't be long before you realize you're playing a "cat and mouse game" with barriers and work-arounds. Agreed with the other suggestions - if this is allowed and supported, they will probably have a much more stable API that they can provide you with that is meant for doing just this.

