How to extract file data from an HTTP MIME-encoded message in Linux?

Question

I have a program that accepts files via HTTP POST and writes the entire POST request into a file. I want to write a script that deletes the HTTP headers and leaves only the binary file data. How can I do it?

The file content is below (the data between Content-Type: application/octet-stream and ------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3 is what I want):

POST /?user_name=vvvvvvvv&size=837&file_name=logo.gif& HTTP/1.1^M
Accept: text/*^M
Content-Type: multipart/form-data; boundary=----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
User-Agent: Shockwave Flash^M
Host: 192.168.0.198:9998^M
Content-Length: 1251^M
Connection: Keep-Alive^M
Cache-Control: no-cache^M
Cookie: cb_fullname=ddddddd; cb_user_name=cdc^M
^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filename"^M
^M
logo.gif^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filedata"; filename="logo.gif"^M
Content-Type: application/octet-stream^M
^M
GIF89an^@I^^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Upload"^M
^M
Submit Query^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3-

Answer 1

If you're using Python, email.parser.Parser will let you parse multipart MIME documents.
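A minimal sketch of that idea, assuming Python 3 and that the POST was saved byte-for-byte. Because the upload is binary, it uses email.parser.BytesParser (the bytes counterpart of Parser) instead of Parser. The function name extract_uploaded_files is hypothetical. The request line (POST ... HTTP/1.1) is dropped first because it is not a MIME header; the remaining HTTP headers supply the multipart Content-Type and its boundary.

```python
from email.parser import BytesParser


def extract_uploaded_files(raw_request: bytes) -> dict:
    """Return {filename: file_bytes} for each file part of a saved HTTP POST.

    Hypothetical helper. Assumes the request was stored exactly as received
    (CRLF line endings, no chunked transfer encoding).
    """
    # The request line ("POST /... HTTP/1.1") is not a MIME header, so drop it;
    # the remaining HTTP headers include the multipart Content-Type + boundary.
    _, _, rest = raw_request.partition(b"\r\n")
    msg = BytesParser().parsebytes(rest)

    files = {}
    if not msg.is_multipart():
        return files
    for part in msg.get_payload():
        # get_filename() reads filename="..." from Content-Disposition.
        filename = part.get_filename()
        if filename:
            # decode=True gives back the part's original bytes; the CRLF
            # before the next boundary is not included.
            files[filename] = part.get_payload(decode=True)
    return files
```

Usage might look like extract_uploaded_files(open("post.dat", "rb").read()), where post.dat is a placeholder for the file your program wrote. If you save each value to disk, using os.path.basename(name) as the output name keeps a client-supplied path from escaping the current directory.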

Answer 2

Do you want to do this while the file is being transferred, or afterwards?

Almost any scripting language should work. My AWK is a bit rusty, but...

awk '/^Content-Type: application\/octet-stream/,/^--------/'

That should print everything from the application/octet-stream line through the next ---------- line. A range pattern includes both of those end lines, though, so to drop them you need something a bit more complex:

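BEGIN { state = 0 }
{
    # A boundary line ends the file data.
    if ($0 ~ /^------------/) {
        state = 0
    }
    # state 2: inside the file data, so print the line.
    if (state == 2) {
        print $0
    }
    # state 1: skip the empty (CR-only) line that ends the part headers.
    if (state == 1 && $0 ~ /^\r?$/) {
        state = 2
    }
    if ($0 ~ /^Content-Type: application\/octet-stream/) {
        state = 1
    }
}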

The application\/octet-stream test comes after the print statement because you want to change state only after you have seen the application/octet-stream line, so that header line itself is never printed.

Of course, since this is Unix, you could also pipe the output of your program through awk and then save the result to a file.
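Because awk splits its input on newline bytes, which can occur anywhere inside a GIF, a byte-oriented version of the same start/stop idea may be safer. The sketch below is a hypothetical Python equivalent (slice_file_data is an illustrative name): it searches the raw bytes for the octet-stream header, skips to the empty line that ends the part headers, and stops at the CRLF in front of the next boundary line. The default end marker mirrors the awk pattern; passing b"\r\n--" plus the real boundary is stricter.

```python
def slice_file_data(raw: bytes,
                    start_header: bytes = b"Content-Type: application/octet-stream",
                    end_marker: bytes = b"\r\n------------") -> bytes:
    """Return the bytes between a part header and the next boundary line.

    Hypothetical helper mirroring the awk state machine, but working on raw
    bytes so newlines or NULs inside the file cannot confuse it.
    Raises ValueError if a marker is missing.
    """
    # Locate the header line that introduces the file part.
    header_pos = raw.index(start_header)
    # The part headers end at the first empty line (CRLF CRLF) after it.
    data_start = raw.index(b"\r\n\r\n", header_pos) + 4
    # The CRLF in front of the boundary line belongs to the boundary,
    # so the data ends just before it.
    data_end = raw.index(end_marker, data_start)
    return raw[data_start:data_end]
```

To use it in a pipe like the awk version, something like sys.stdout.buffer.write(slice_file_data(sys.stdin.buffer.read())) should work.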

Answer 3

This might be a crazy idea, but I would try using procmail to strip the headers.

Answer 4

Look at the MIME::Tools suite for Perl. It has a rich set of classes; I'm sure you could put something together in just a few lines.

Answer 5

This probably contains some typos, but bear with me anyway. First, determine the boundary (input is the file containing the data; pipe it in if necessary):

boundary=$(grep '^Content-Type: multipart/form-data; boundary=' input | sed 's/.*boundary=//' | tr -d '\r')
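The same step in Python, as a hedged sketch (boundary_from_headers is a hypothetical name). Working on the raw bytes avoids the trailing carriage return that a line-based pipeline picks up from the CRLF line ending, and it also accepts a quoted boundary, which the multipart syntax allows.

```python
import re


def boundary_from_headers(raw: bytes) -> bytes:
    """Return the multipart boundary declared in the request's Content-Type.

    Hypothetical helper; assumes CRLF line endings as in the saved request.
    """
    # HTTP headers end at the first empty line.
    head = raw.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n"):
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-type":
            # boundary=value or boundary="quoted value"
            m = re.search(rb'boundary=(?:"([^"]+)"|([^;\s]+))', value, re.IGNORECASE)
            if m:
                return m.group(1) or m.group(2)
    raise ValueError("no multipart boundary in the Content-Type header")
```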

Then filter the Filedata part:

fd='Content-Disposition: form-data; name="Filedata"'
sed -n "/$fd/,/$boundary/p" input

The last step is to filter out a few extra lines: the part headers up to and including the empty line, and the boundary itself. So change the last command to:

sed -n "/$fd/,/$boundary/p" input | sed '1,/^\r*$/d' | sed '$d'

sed -n "/$fd/,/$boundary/p" prints the lines from the Filedata header through the boundary (inclusive),
sed '1,/^\r*$/d' deletes everything up to and including the first empty line, which removes the part headers (the lines end in a CR here, hence \r*, a GNU sed extension), and
sed '$d' removes the last line (the boundary).
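The same filtering can be done on bytes in Python, which sidesteps the line-splitting problems. This is a sketch under the same assumptions (CRLF line endings; extract_form_field is a hypothetical name): it splits the body on the CRLF + "--" + boundary delimiter, picks the part whose Content-Disposition names the wanted field, and drops that part's headers up to the first empty line, mirroring the three sed steps.

```python
def extract_form_field(raw: bytes, boundary: bytes, field: bytes = b"Filedata") -> bytes:
    """Return the content of the multipart part whose name="<field>".

    Hypothetical helper; boundary is the value from the Content-Type header
    (without the leading "--").
    """
    # Skip the HTTP headers: the body starts after the first empty line.
    body = raw.split(b"\r\n\r\n", 1)[1]
    # Each delimiter is CRLF + "--" + boundary; the CRLF belongs to the
    # delimiter, not to the preceding part. Prepending CRLF lets the very
    # first delimiter split the same way.
    delimiter = b"\r\n--" + boundary
    for chunk in (b"\r\n" + body).split(delimiter):
        # Part headers end at the first empty line; the rest is content.
        headers, sep, content = chunk.partition(b"\r\n\r\n")
        if sep and b'; name="' + field + b'"' in headers:
            return content
    raise ValueError("no part named %r" % field)
```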

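After this, you wait for Dennis (see comments) to optimize it, and you get something like this (the \$ keeps the shell from expanding $d inside the double quotes):

sed "1,/$fd/d;/$boundary/,\$d" input | sed '1,/^\r*$/d'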

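Now that you've read this far, scratch all of this and do what Ignacio suggested. The reason: this probably won't work reliably here, because a GIF is binary data while sed processes text line by line (for example, the CRLF that precedes the boundary ends up at the end of the extracted file).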

Still, it was a good exercise! For sed lovers, here's an excellent page:

http://sed.sourceforge.net/sed1line.txt

Outstanding information.

5 answers

solution1 2 2010-11-21 01:59:59
solution2 2 ACCEPTED 2010-11-21 02:27:57
solution3 1 2010-11-21 02:00:48
solution4 1 2010-11-21 02:22:43
solution5 0 2010-11-21 02:14:51
