How to parse http header to get uploaded file and save it to disk

I am developing an HTTP web server in Java using sockets. It reads the POST request from an InputStream, and I process it by splitting a String on the multipart 'boundary' and on '\r\n'. That gives me all the headers and cookies in HashMaps and the contents of the file in a String, which I then save to a file on the server. This works fine when I upload a text file or a Java source file, but doc, pdf, and image files come out corrupted.

    PrintWriter out;
    try {
        out = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream(UploadPath + "\\" + FileName)));
        out.print(FileData);
        out.close();
    } catch (Exception e) {

    }

The code above saves the contents of 'FileData' under 'UploadPath' with the name 'FileName'.

For a jpg or doc file, the String FileData holds the binary contents of the uploaded file, which the code above saves. I compared the sizes of both files in bytes and they were equal, and while debugging the application I also matched the contents of the actual file against the FileData String.

I also checked the actual uploaded image file against the FileData String and both match byte by byte, but the uploaded image is totally corrupted.

After searching the internet for a whole day, I have not been able to find a solution. Please help.

I do not want to use Apache Commons, which is suggested on most pages.

If you want to see more code, I will post it.

As you are dealing with binary data, you should use byte and OutputStream instead of String and Writer: if you put bytes in a String, they are decoded into characters with some charset, and byte sequences that are not valid in that charset do not survive the round trip.
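The effect of decoding can be shown in a few lines. This is a minimal sketch, assuming UTF-8 as the charset (the question's code uses the platform default, which may differ); the class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringRoundTrip {
    /** Decodes bytes to a String and encodes them back, as storing a file in a String does. */
    static byte[] roundTripUtf8(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Typical first four bytes of a JPEG file; they do not form valid UTF-8.
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] after = roundTripUtf8(jpegStart);
        System.out.println("unchanged: " + Arrays.equals(jpegStart, after));

        // Plain ASCII text survives, which is why text uploads appear to work.
        byte[] text = "import java.io.*;".getBytes(StandardCharsets.US_ASCII);
        System.out.println("text unchanged: " + Arrays.equals(text, roundTripUtf8(text)));
    }
}
```

Invalid sequences are replaced during decoding, so the bytes written back are no longer the bytes that were uploaded.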

So once you have found the boundaries of the binary data in your request (represented as a byte array), copy the content byte-wise directly to an output stream.
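For the in-memory case, that could look roughly like the sketch below. The helper names `indexOf` and `copyRange` and the tiny one-part body with boundary `B` are assumptions made for the example, not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ByteRangeCopy {
    /** Returns the index of the first occurrence of pattern in data at or after from, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /** Writes data[start, end) to out without any charset conversion. */
    static void copyRange(byte[] data, int start, int end, OutputStream out) throws IOException {
        out.write(data, start, end - start);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical one-part body: part headers are empty, the boundary is "B".
        byte[] request = "--B\r\n\r\nPAYLOAD\r\n--B--\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] blankLine = "\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] delimiter = "\r\n--B".getBytes(StandardCharsets.ISO_8859_1);

        int start = indexOf(request, blankLine, 0) + blankLine.length; // first content byte
        int end = indexOf(request, delimiter, start);                  // one past the last content byte
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyRange(request, start, end, out);
        System.out.println(out.size() + " bytes copied");
    }
}
```

In a server, `out` would be a FileOutputStream; the ByteArrayOutputStream only keeps the example self-contained.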

This only works if your request is already completely in memory. For file uploads that is not always possible, because you can run out of memory with large files.

So the best way to implement a file upload is to read only the next byte from the stream: this is the difference between splitting and parsing. You actually need a real parser for multipart form data. Now things get complex, and this is the reason why everybody uses commons-fileupload: it is not that easy to detect the boundaries if your "look ahead" is just a few bytes.
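To give an idea of what the streaming approach involves, here is a simplified sketch that copies bytes to an output stream until a delimiter appears, holding back only as many bytes as the delimiter is long. It is an illustration under stated assumptions, not how commons-fileupload works: `copyUntil` is a hypothetical helper, the delimiter is assumed to be non-empty, and reading one byte at a time should be done on a buffered stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BoundaryCopy {
    /**
     * Copies bytes from in to out until delimiter is seen. The delimiter itself is
     * consumed but not written. Returns false if the stream ended first.
     */
    static boolean copyUntil(InputStream in, OutputStream out, byte[] delimiter) throws IOException {
        byte[] window = new byte[delimiter.length]; // the "look ahead": last bytes not yet written
        int filled = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (filled == window.length) {
                // The oldest held-back byte cannot start a match any more, so it is content.
                out.write(window[0]);
                System.arraycopy(window, 1, window, 0, window.length - 1);
                filled--;
            }
            window[filled++] = (byte) b;
            if (filled == window.length && Arrays.equals(window, delimiter)) {
                return true;
            }
        }
        out.write(window, 0, filled); // stream ended: the held-back bytes were content
        return false;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "file bytes\r\n--B\r\nnext part".getBytes(StandardCharsets.ISO_8859_1);
        byte[] delimiter = "\r\n--B".getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(body);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean found = copyUntil(in, out, delimiter);
        System.out.println("delimiter found: " + found + ", content bytes: " + out.size());
    }
}
```

Memory use stays constant regardless of the upload size, at the cost of comparing the window on every byte; a production parser would use a larger buffer and a smarter search.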

I had to write a clean-room implementation for legal reasons. If that is not your situation, look in the source of commons-fileupload. And have a look at the RFC.

Since you use Java 7, this is quite easy: use Files.copy().
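A minimal sketch of that call is shown below. It assumes the InputStream passed in yields only the file's bytes, because Files.copy() reads until end of stream and knows nothing about multipart boundaries; the `save` helper and the temp file are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SaveUpload {
    /** Writes every remaining byte of fileContent to target and returns the byte count. */
    static long save(InputStream fileContent, Path target) throws IOException {
        return Files.copy(fileContent, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        byte[] upload = {(byte) 0xFF, (byte) 0xD8, 0x00, 0x10}; // stand-in for binary file content
        Path target = Files.createTempFile("upload", ".bin");
        long written = save(new ByteArrayInputStream(upload), target);
        System.out.println(written + " bytes written to " + target);
        Files.delete(target);
    }
}
```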

Also, DO NOT store file contents as Strings; those will only ever be valid for text files. Use classical InputStreams / OutputStreams to read and write.

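You could read it using an array of bytes, like the following:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Reads the whole stream into memory as raw bytes (no charset decoding).
    static byte[] readAllBytes(InputStream is) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        int nRead;
        byte[] data = new byte[16384];

        // read() returns -1 once the end of the stream is reached
        while ((nRead = is.read(data, 0, data.length)) != -1) {
            buffer.write(data, 0, nRead);
        }

        buffer.flush();
        return buffer.toByteArray();
    }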

I solved my problem like this:

    while (inputRequest.available() > 0) {
        try {
            int t = inputRequest.read();
            ch = (char) t;
            // here I checked each byte of data
        } catch (IOException e) {
        }
    }

The problem was that the input stream contains the HTTP header fields along with the file content, which can be located anywhere in the stream. So I first stored the bytes in a temp String until I got '\r' and '\n' in the stream. In this way I got the boundary from the multipart/form-data HTTP header, and then I compared the temp String until I found the boundary and the other known header contents, after which I sent the input stream to a file output stream. In some cases the request contains other content after the file content, and it will definitely have an ending boundary, so I kept track of each byte I read and sent each byte individually to the file output stream. Here is a sample HTTP request:

   Host: localhost
   User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
   Accept-Language: en-US,en;q=0.5
   Accept-Encoding: gzip, deflate
   DNT: 1
   Referer: http://localhost/index.html
   Connection: keep-alive
   Content-Type: multipart/form-data; boundary=---------------------------274761981030199
   Content-Length: 1405

   -----------------------------274761981030199
   Content-Disposition: form-data; name="name1"

   pppppp
   -----------------------------274761981030199
   Content-Disposition: form-data; name="name2"

   rrrrrrrrr
   -----------------------------274761981030199
   Content-Disposition: form-data; name="name3"

   eeeeeeee
   -----------------------------274761981030199
   Content-Disposition: form-data; name="name4"

   2
   -----------------------------274761981030199
   Content-Disposition: form-data; name="name5"; filename="CgiPost.java"
   Content-Type: text/x-java-source

   import java.io.*;

   // This appears in Core Web Programming from
   // Prentice Hall Publishers, and may be freely used
   // or adapted. 1997 Marty Hall, hall@apl.jhu.edu.


   public class CgiPost extends CgiGet 
   {

   public static void main(String[] args) 
   {

   try 
   {

   DataInputStream in
    = new DataInputStream(System.in);

   String[] data = { in.readLine() };

   CgiPost app = new CgiPost("CgiPost", data, "POST");

   app.printFile();
       } catch(IOException ioe) {
         System.out.println
           ("IOException reading POST data: " + ioe);

   }
     }

     public CgiPost(String name, String[] args,
     String type) {
       super(name, args, type);
     }
   }

   -----------------------------274761981030199
   Content-Disposition: form-data; name="name6"

   pppppppppp
   -----------------------------274761981030199--
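In the sample above, each delimiter line in the body is the boundary value from the Content-Type header prefixed with "--", and the final one also ends with "--". A small sketch of extracting that value follows; `boundaryOf` is a hypothetical helper, and the parsing is simplified (it assumes the boundary value contains no ';').

```java
public class BoundaryParam {
    private static final String KEY = "boundary=";

    /** Returns the boundary parameter of a Content-Type value, or null if there is none. */
    static String boundaryOf(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.regionMatches(true, 0, KEY, 0, KEY.length())) {
                String value = p.substring(KEY.length()).trim();
                // The value may be wrapped in double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Content-Type value taken from the sample request.
        String contentType = "multipart/form-data; boundary=---------------------------274761981030199";
        String boundary = boundaryOf(contentType);
        System.out.println("part delimiter:  --" + boundary);
        System.out.println("final delimiter: --" + boundary + "--");
    }
}
```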

NOTE: In some cases your application code may reach inputRequest.available() before the browser has sent the request. In that case inputRequest.available() returns 0 and your while loop exits immediately. To avoid this, first read one byte using inputRequest.read(), which blocks until data arrives, and then run the loop; for an HTTP request you can work out that first byte from the ones that follow.
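An alternative that sidesteps available() for the header part is to rely on blocking reads and stop at the empty line that ends the headers. The following is a sketch under stated assumptions: `readLine` and `readHeaders` are hypothetical helpers, header bytes are decoded as ISO-8859-1, carriage returns are simply dropped, and a ByteArrayInputStream stands in for the socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class HeaderReader {
    /** Reads one line ending in LF; read() blocks until data arrives. Returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                break;
            }
            if (b != '\r') {
                line.write(b);
            }
        }
        return sawAny ? new String(line.toByteArray(), StandardCharsets.ISO_8859_1) : null;
    }

    /** Reads header lines up to and including the empty line; names are lower-cased. */
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        String raw = "POST /upload HTTP/1.1\r\nHost: localhost\r\nContent-Length: 5\r\n\r\nhello";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        String requestLine = readLine(in);
        Map<String, String> headers = readHeaders(in);
        System.out.println(requestLine + " -> " + headers);
        // The stream is now positioned at the first byte of the body.
    }
}
```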

If you use a counter, declare it as long instead of int, because with large streams an int counter can reach its limit and the transfer stops.

Pass the int value returned by int t = inputRequest.read() straight to fileoutputstream.write(t).

inputRequest.available() keeps decreasing as you read bytes from the input stream; it returns an estimate of the number of bytes that can be read from the stream without blocking.
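Because available() only reports what can be read without blocking, a loop bounded by the Content-Length header (1405 in the sample request) is one way to know when the body is complete. The sketch below uses a hypothetical `copyExactly` helper with a long counter and copies in chunks rather than single bytes; it is an illustration, not the code used in this answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedCopy {
    /** Copies up to length bytes from in to out; returns how many bytes were actually copied. */
    static long copyExactly(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buf = new byte[8192];
        long remaining = length; // long, so bodies larger than Integer.MAX_VALUE are counted correctly
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                break; // the peer closed the connection early
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return length - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        InputStream in = new ByteArrayInputStream(stream);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        long copied = copyExactly(in, body, 4); // pretend Content-Length was 4
        System.out.println(copied + " body bytes copied");
    }
}
```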

In this way you can upload large files without any corruption.

Leave a comment if you need more details about this.
