
Use writeUTF and readUTF for HTTP requests in Java

This is a Java method that tries to crawl a designated web page. I am using writeUTF and readUTF for socket communication with a server.

static void get_html(String host, String page, int port) throws IOException {
    Socket sock = new Socket(host, port);
    String msg = MessageFormat.format("GET {0} HTTP/1.1\r\nHost: {1}\r\n\r\n", page, host);

    DataOutputStream outToServer = new DataOutputStream(sock.getOutputStream());
    DataInputStream inFromServer = new DataInputStream(sock.getInputStream());

    InputStream stream = new ByteArrayInputStream(msg.getBytes(StandardCharsets.UTF_8));
    BufferedReader buf = new BufferedReader(new InputStreamReader(stream));
    String outMsg;

    while ((outMsg = buf.readLine()) != null) {
        System.out.println("Sending message: " + outMsg);
        outToServer.writeUTF(outMsg);

        String inMsg;
        try {
            inMsg = inFromServer.readUTF();
        } catch (EOFException eof) {
            break;
        }
        System.out.println(inMsg);
    }
    sock.close();
}

I wrote it this way to mimic C code, where one while loop of send() delivers everything from a buffer and another while loop of recv() reads into a buffer until it hits 'null'. When I execute my code, it just hangs. I suspect that is because readUTF is called before I have finished sending all my messages. If this is the case, is there any way to fix it?

You can't do this. HTTP is defined as text lines. writeUTF() does not write text; it writes a special format starting with a 16-bit binary length word. Similarly, the HTTP server won't reply in that format to your readUTF() call. See the Javadoc.
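To see the length word for yourself, here is a small sketch that captures what writeUTF() emits into an in-memory buffer instead of a socket. The class and method names (WriteUtfDemo, writeUtfBytes) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class WriteUtfDemo {
    // Hypothetical helper: returns the exact bytes DataOutputStream.writeUTF emits for s.
    static byte[] writeUtfBytes(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String line = "GET / HTTP/1.1";
        byte[] plain = line.getBytes(StandardCharsets.US_ASCII);
        byte[] framed = writeUtfBytes(line);

        System.out.println("plain length:  " + plain.length);   // 14
        System.out.println("framed length: " + framed.length);  // 16
        // The two extra bytes come first and hold the payload length (big-endian).
        System.out.printf("prefix bytes: 0x%02X 0x%02X%n", framed[0], framed[1]); // 0x00 0x0E
    }
}
```

An HTTP server receiving those bytes sees 0x00 0x0E in front of "GET", not a request line, and it has no reason to answer until it has a complete request. On the other side, readUTF() blocks waiting for a length word and a payload that the server will never send in that format.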

You have to use binary streams and the write() method, with \r\n as the line terminator. Depending on the output format you may or may not be able to use readLine(). Best not; then you don't have to write two pieces of code: use binary streams again.
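A minimal sketch of that approach, assuming a plain-HTTP server that honours Connection: close (a header I added so that the server ends the response by closing the socket). The names RawHttpGet, buildRequest and fetch are hypothetical. It sends the whole request before reading anything, then reads raw bytes until end of stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {
    // Builds the request as plain bytes: each line ends in \r\n, a blank line ends the headers.
    static byte[] buildRequest(String host, String page) {
        String request = "GET " + page + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"   // assumption: ask the server to close when done
                + "\r\n";
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    // Writes the complete request first, then reads bytes until the server closes the connection.
    static byte[] fetch(String host, String page, int port) throws IOException {
        try (Socket sock = new Socket(host, port)) {
            OutputStream out = sock.getOutputStream();
            out.write(buildRequest(host, page));
            out.flush();

            InputStream in = sock.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                response.write(chunk, 0, n);
            }
            return response.toByteArray(); // status line, headers and body, undecoded
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            // No host given: just show the bytes that would be sent.
            System.out.print(new String(buildRequest("example.com", "/"), StandardCharsets.US_ASCII));
            return;
        }
        byte[] raw = fetch(args[0], args[1], 80);
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1));
    }
}
```

This only gets the bytes back. It does not parse the status line, follow redirects, or decode a chunked body, and it does not speak HTTPS.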

In fact you should throw it all away and use HttpURLConnection. Implementing HTTP is not as simple as may hastily be supposed.
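For comparison, a sketch of the same fetch with HttpURLConnection, assuming Java 9 or later (for InputStream.readAllBytes) and a UTF-8 response body. The class name UrlConnectionGet, the get method and the five-second timeouts are my own choices, not anything from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class UrlConnectionGet {
    // Hypothetical helper: performs a GET and returns the body as text.
    static String get(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // assumed timeouts so a dead server cannot hang the caller
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            // Assumes the body is UTF-8; a real client would check the Content-Type header.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java UrlConnectionGet <url>");
            return;
        }
        System.out.println(get(args[0]));
    }
}
```

The library handles the request framing, the response headers and chunked bodies, which is exactly the work the hand-written socket code would otherwise have to do.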
