
Read from InputStream in multiple formats

I'm trying to write a class that reads HTTP requests and responses and parses them. Since the headers are ordinary text, it seemed easiest to read them using a BufferedReader and its readLine method. This obviously won't do for the body, which may be binary, so I want to switch over to reading raw bytes once the headers have been read.

Right now, I'm doing something like this:

InputStream input=socket.getInputStream();
BufferedReader reader=new BufferedReader(new InputStreamReader(input));
BufferedInputStream binstream=new BufferedInputStream(input);

The problem is that the BufferedReader reads ahead and gobbles up the binary data from the stream before I have a chance to get at it with binstream.
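The read-ahead can be reproduced without a socket. The following is only a minimal sketch: the class name `ReadAheadDemo`, the helper `bytesLeftAfterOneLine`, and the sample message are made up for illustration, and a `ByteArrayInputStream` stands in for the socket stream. It reads one line through a `BufferedReader` and then asks the underlying stream how many bytes are still unread.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadAheadDemo {
    /** Reads a single line via BufferedReader, then reports what is left in the raw stream. */
    static int bytesLeftAfterOneLine(byte[] message) throws IOException {
        InputStream input = new ByteArrayInputStream(message); // stand-in for socket.getInputStream()
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(input, StandardCharsets.US_ASCII));
        reader.readLine(); // asks for one line, but the reader fills its whole buffer
        return input.available();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical message: a status line, a blank line, then a short body.
        byte[] message = "HTTP/1.1 200 OK\r\n\r\nBODY".getBytes(StandardCharsets.US_ASCII);
        System.out.println("bytes left in raw stream: " + bytesLeftAfterOneLine(message));
    }
}
```

For a message this small the reader should drain the entire stream on the first call, so nothing is left for a second wrapper around the same `InputStream`.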

Is there a way to prevent it from reading beyond the newline on each call to readLine? Or is there a better way to read single lines of ASCII text followed by raw binary data?

There is already a class in Java for handling HTTP requests and responses. You should use that instead of trying to parse the response on your own. Parsing an HTTP response is more difficult than you might think, as there are different encoding methods that you have to deal with. It isn't really raw binary data in the response payload. The HttpURLConnection class will parse the headers for you and give you an InputStream for the payload.

http://download.oracle.com/javase/1.4.2/docs/api/java/net/HttpURLConnection.html
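As a rough illustration of this approach, the sketch below fetches a URL with `HttpURLConnection`, reads the status code and one header through the API, and then reads the payload as bytes. The class name `HttpUrlConnectionSketch` and the helpers `fetch` and `roundTrip` are hypothetical; the throwaway local server (the JDK's `com.sun.net.httpserver.HttpServer`) is only there so the example runs without an external host.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

public class HttpUrlConnectionSketch {
    /** Fetches the address and returns the payload; header parsing is left to the JDK. */
    static byte[] fetch(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            int status = connection.getResponseCode();                     // parsed status line
            String contentType = connection.getHeaderField("Content-Type"); // parsed header
            System.out.println(status + " " + contentType);
            try (InputStream payload = connection.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = payload.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            connection.disconnect();
        }
    }

    /** Serves the given body from a temporary local server and fetches it back. */
    static byte[] roundTrip(byte[] body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            exchange.getResponseHeaders().add("Content-Type", "application/octet-stream");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            return fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] received = roundTrip(new byte[] {1, 2, 3, (byte) 0xFF});
        System.out.println("payload bytes: " + received.length);
    }
}
```

Note that `getInputStream` throws for error responses, so a fuller client would also check `getErrorStream`; that is left out here to keep the sketch short.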

If you don't want to use a ready-made HTTP client/server implementation as Konstantin proposed, DataInputStream has a readLine method. It is deprecated because it doesn't do a proper character conversion (it is mostly a direct byte -> char cast), but I think for pure ASCII header lines you should be fine.

(You should put a BufferedInputStream under your DataInputStream, since readLine reads each byte individually.)
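A minimal sketch of this suggestion follows, with some assumptions: the class `HeaderThenBody`, its nested `Message` type, and the sample request in `main` are invented for illustration, and the body is simply read until the end of the stream. A real HTTP parser would instead honor Content-Length or chunked encoding. The point is that lines and raw bytes come from the same stream object, so no second wrapper can lose data to read-ahead.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HeaderThenBody {
    /** Header lines plus the raw bytes that follow the blank separator line. */
    static final class Message {
        final List<String> headerLines = new ArrayList<>();
        byte[] body = new byte[0];
    }

    @SuppressWarnings("deprecation") // readLine is deprecated; used here for ASCII-only header lines
    static Message read(InputStream raw) throws IOException {
        // BufferedInputStream underneath, because readLine pulls one byte at a time.
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        Message message = new Message();
        String line;
        // Header section ends at the first empty line (or at end of stream).
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            message.headerLines.add(line);
        }
        // Same stream object: whatever was buffered is still available as bytes.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            body.write(chunk, 0, n);
        }
        message.body = body.toByteArray();
        return message;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sample = new ByteArrayOutputStream();
        sample.write("GET / HTTP/1.1\r\nHost: example\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        sample.write(new byte[] {0x00, (byte) 0xFF, 0x0A}); // binary body, including a newline byte
        Message message = read(new ByteArrayInputStream(sample.toByteArray()));
        System.out.println(message.headerLines + ", body length " + message.body.length);
    }
}
```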

commons-httpclient would probably save you a lot of work.
