
How to get the correct HTTP response in Java?

I want to get some data from a web page, so I use Java to send an HTTP request to the server.

I have tried URLConnection and Jsoup, but neither gets the correct response.

If I open the URL in a browser:

http://www.hkprinters.org/en/member_search.asp?page=1&mode=view

the response is correct and the search results are returned.

But with Java I only get the search form, with no results.

Why is the response incorrect, and how do I get the correct one?

import java.io.*;
import java.net.*;

class HttpRequest
{
    public static void main(String[] args) throws Exception
    {
        URL url = new URL("http://www.hkprinters.org/en/member_search.asp?page=1&mode=view");
        URLConnection conn = url.openConnection();
        // Enabling output and opening the output stream makes the
        // connection send a POST (here with an empty body) instead of a GET.
        conn.setDoOutput(true);
        OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
        wr.flush();
        wr.close();

        // try-with-resources closes the reader and the file writer, even on failure
        try (BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("station.txt"))))
        {
            String line;
            while ((line = rd.readLine()) != null)
            {
                out.write(line);
                out.newLine(); // readLine() strips the line terminator
            }
        }
    }
}

The Jsoup attempt:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class read_line2 {

    public static void main(String args[]) {
        try {
            // Fetch the page with a GET request and parse it
            Document doc = Jsoup.connect("http://www.hkprinters.org/en/member_search.asp?page=1&mode=view").get();
            // Select every element so the whole document is printed
            Elements newHeadlines = doc.select("*");
            System.out.println(newHeadlines);

        } catch (Exception e) {
            // Do not swallow failures silently
            e.printStackTrace();
        }
    }
}

Update:

First, let me explain what I mean by the correct and incorrect results.

The correct result is the search form plus the search result data (such as company name, address, tel). This data is what I want.

The incorrect result is:

<title>db</title>
<title>func</title>
<!DOCTYPE HTML PUBLIC
........
<input type="hidden" name="hdnMode" value="search"/></form>
</table>
<font size="2"><br/>

If you view this in a browser, you see only the search form, with no results.

The new finding is that I can now reproduce the incorrect result in a browser. If you close the browser, open it again, and then browse to http://www.hkprinters.org/en/member_search.asp?page=1&mode=view

you get the incorrect result, and it is exactly the same as the Java result:

<title>db</title>
<title>func</title>
<!DOCTYPE HTML PUBLIC
........
<input type="hidden" name="hdnMode" value="search"/></form>
</table>
<font size="2"><br/>

Now, if you click the submit button (no input is needed), the search results are shown again. From then on, even if you only browse to http://www.hkprinters.org/en/member_search.asp?page=1&mode=view (GET method), the search results are still shown.

So I guess this page saves the POST data to the session the first time I click the submit button. After that, every time I browse this page, it reads the search key from the session, so even when I use GET to send page and mode, it still gives me the search results.

But I don't know how to reproduce the same session in Java. Is there an example of this?

If you are not sending anything in the request, comment out the following lines:

conn.setDoOutput(true);
OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
wr.flush();

I suggest using Apache HttpClient.
You will have better control over which HTTP method you're using (GET, PUT, etc.).
This HTTP client is widely used.
You'll have a better API for handling the response (this is possible with URLConnection, of course, but this framework simplifies things).

Try using java.net.HttpURLConnection instead of URLConnection.

I inspected the source code of the page at the provided URL. The HTML markup has some mistakes, which in some browsers may be the reason a form is not submitted; it depends on how lenient your browser is with bad markup. For instance, the form element is defined between the /tr and tr elements, that is, directly inside a table:

...
</tr>
<form action="member_search.asp" method="post" name="frmSearch" 
    onSubmit="return checkSearchForm();">
<tr class="copy"> 
...

I can also see that the form is submitted with POST, but I don't see anything in your code that provides the search parameters shown in the search form.

My advice is to test your client with a request to a different page that you know is well-formed.
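
To follow up on the POST observation and the session guess in the question, here is a minimal sketch (not a verified solution) of keeping one session across two requests with java.net.CookieManager: first POST the form to member_search.asp, then GET the results page. It assumes the site tracks the session with a cookie. The class name SessionSearch and the helpers encodeForm and readBody are made up for illustration, hdnMode=search is the only form field visible in the question's output (the real form probably needs more fields), and UTF-8 is only a guess at the page encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class SessionSearch {

    // URL-encode one key or value.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Build an application/x-www-form-urlencoded body from the fields.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // Read the whole response body as text (UTF-8 is an assumption).
    static String readBody(HttpURLConnection conn) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // JVM-wide cookie store: cookies set by the POST response are sent with the GET.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

        Map<String, String> form = new LinkedHashMap<>();
        form.put("hdnMode", "search"); // assumption: other fields may be required

        // Step 1: submit the search form, as the browser's submit button would.
        HttpURLConnection post = (HttpURLConnection)
                new URL("http://www.hkprinters.org/en/member_search.asp").openConnection();
        post.setRequestMethod("POST");
        post.setDoOutput(true);
        post.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream os = post.getOutputStream()) {
            os.write(encodeForm(form).getBytes(StandardCharsets.UTF_8));
        }
        readBody(post); // read the response so its cookies are stored

        // Step 2: request the results page within the same session.
        HttpURLConnection get = (HttpURLConnection)
                new URL("http://www.hkprinters.org/en/member_search.asp?page=1&mode=view").openConnection();
        System.out.println(readBody(get));
    }
}
```

If the second response still contains only the search form, comparing the fields a browser actually posts (visible in its developer tools) with the form map above is a reasonable next step.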

Call HttpURLConnection.getResponseCode() after you write (if you need to write anything, which seems dubious) but before you read anything (if you really need to read anything, which may also be dubious). If you just do I/O, you are at the mercy of some HTTP status codes being mapped to IOExceptions.
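
A small sketch of that ordering, under the assumption that a plain GET is all that is needed: check the status code first, then read from either the normal stream or the error stream. The class name StatusCheck and the helpers isSuccess and fetch are hypothetical names chosen for this example, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class StatusCheck {

    // True for 2xx status codes.
    static boolean isSuccess(int status) {
        return status >= 200 && status < 300;
    }

    // Fetch a URL with GET and return the status line plus the body text.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");

        // Sends the request and reads the status before any body I/O.
        int status = conn.getResponseCode();

        // For error statuses the body, if any, is on the error stream.
        InputStream in = isSuccess(status) ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return "HTTP " + status + " (no body)";
        }

        StringBuilder sb = new StringBuilder("HTTP " + status + "\n");
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = rd.readLine()) != null) sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("http://www.hkprinters.org/en/member_search.asp?page=1&mode=view"));
    }
}
```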
