Python server cgi.FieldStorage parsing multipart/form-data

So I have been writing a simple web server in Python, and right now I'm trying to handle multipart/form-data POST requests. I can already handle application/x-www-form-urlencoded POST requests, but the same code doesn't work for multipart. If it looks like I'm misunderstanding anything, please call me out, even if it's something minor. Also, if you guys have any advice on making my code better, please let me know as well :) Thanks!

When a request comes in, I first parse it and split it into a dictionary of headers and a string for the body. I then use those to construct a FieldStorage form, which I can treat like a dictionary to pull the data out:

requestInfo = ''
while requestInfo[-4:] != '\r\n\r\n':
    requestInfo += conn.recv(1)

requestSplit = requestInfo.split('\r\n')[0].split(' ')
requestType = requestSplit[0]

url = urlparse.urlparse(requestSplit[1])
path = url[2] # Grab Path

if requestType == "POST":
    headers, body = parse_post(conn, requestInfo)

    print "!!!Request!!! " + requestInfo
    print "!!!Body!!! " + body 
    form = cgi.FieldStorage(headers = headers, fp = StringIO(body), environ = {'REQUEST_METHOD':'POST'}, keep_blank_values=1)
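Since advice on the code itself was welcome: the header loop above calls recv(1) once per byte. A common alternative is to read in larger blocks and split at the first blank line, keeping any bytes that arrived after it, because those already belong to the body. A minimal sketch, written for Python 3 (where recv returns bytes); `read_head` and the 4096-byte block size are illustrative choices, not from the original code:

```python
def read_head(sock, block_size=4096):
    """Read from sock until the blank line (CRLF CRLF) that ends the request head.

    Returns (head, leftover): head includes the terminating CRLF CRLF, and
    leftover holds any body bytes that arrived in the same recv() call.
    """
    buf = b""
    while True:
        end = buf.find(b"\r\n\r\n")
        if end != -1:
            end += 4  # keep the terminator with the head
            return buf[:end], buf[end:]
        block = sock.recv(block_size)
        if not block:  # peer closed the connection before the head was complete
            raise ConnectionError("connection closed before end of headers")
        buf += block
```

Whatever comes back as `leftover` has to be treated as the start of the body, so only the remaining Content-Length bytes are read afterwards; otherwise the first bytes of the body are lost.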

Here's my parse_post method:

def parse_post(conn, headers_string):
    headers = {}
    headers_list = headers_string.split('\r\n')

    for i in range(1,len(headers_list)-2):
        header = headers_list[i].split(': ', 1)
        headers[header[0]] = header[1]

    content_length = int(headers['Content-Length'])

    content = conn.recv(content_length)

    # Parse Content differently if it's a multipart request??

    return headers, content
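One more thing in parse_post: `conn.recv(content_length)` returns at most content_length bytes, not exactly that many, so a larger body (a file upload, say) can arrive split across several reads and end up truncated. A minimal sketch of reading the full body in a loop, again for Python 3 and with a hypothetical helper name; the `initial` parameter is an assumption covering body bytes that were already read together with the headers:

```python
def recv_exactly(sock, n, initial=b""):
    """Return the first n bytes of the body, reading more from sock as needed.

    socket.recv(bufsize) returns *up to* bufsize bytes, so a single call can
    come back short; keep reading until n bytes have been collected.
    """
    chunks = [initial]
    received = len(initial)
    while received < n:
        chunk = sock.recv(min(65536, n - received))
        if not chunk:  # peer closed before sending Content-Length bytes
            raise ConnectionError("connection closed before full body arrived")
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)[:n]
```

In parse_post this would replace the single recv call, e.g. `content = recv_exactly(conn, int(headers['Content-Length']))`.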

So for an x-www-form-urlencoded POST request, I can treat the FieldStorage form like a dictionary. For example, if I call:

firstname = form['firstname'].value
print firstname

It will work. However, if I instead send a multipart POST request, it ends up printing nothing.
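One plausible cause worth checking: parse_post stores header names exactly as the client sent them (e.g. `Content-Type`), but cgi.FieldStorage looks for the lowercase keys `'content-type'` and `'content-length'` in the headers mapping it is given. A plain dict lookup is case-sensitive, so that check misses, and for a POST with no recognised content type FieldStorage falls back to application/x-www-form-urlencoded. That default matches the urlencoded request, but the multipart body then gets parsed as if it were urlencoded, which does not produce a `firstname` field. A minimal sketch of header parsing that lowercases the names (HTTP header names are case-insensitive), written for Python 3; `parse_headers` is a hypothetical helper, not part of the original code:

```python
def parse_headers(head):
    """Split a raw HTTP head (bytes) into (request_line, headers).

    Header names are stored lowercased, the form cgi.FieldStorage
    checks for ('content-type', 'content-length').
    """
    lines = head.decode("latin-1").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line:
            continue  # the blank line(s) that terminate the head
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers
```

With headers built this way, `cgi.FieldStorage(headers=headers, ...)` on Python 2 (or on Python 3 before 3.13, which removed the cgi module) should see the multipart content type and its boundary.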

This is the body of the x-www-form-urlencoded request: firstname=TEST&lastname=rwar

This is the body of the multipart request:

--070f6a3146974d399d97c85dcf93ed44
Content-Disposition: form-data; name="lastname"; filename="lastname"

rwar
--070f6a3146974d399d97c85dcf93ed44
Content-Disposition: form-data; name="firstname"; filename="firstname"

TEST
--070f6a3146974d399d97c85dcf93ed44--

So here's the question, should I manually parse the body for the data in parse_post if it's a multipart request?

Or is there a method that I need/can use to parse the multipart body?

Or am I doing this wrong completely?

Thanks again! I know it's a long read, but I wanted to make sure my question was comprehensive.

So I solved my problem, but in a totally hacky way.

I ended up manually parsing the body of the request. Here's the code I wrote:

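# Hacky workaround: rewrite the multipart body as "name=value&name=value"
# so FieldStorage can parse it as if it were urlencoded. This only works
# when every value is a single line without "&", "=", "+" or "%" in it.
if "multipart/form-data" in headers["Content-Type"]:
    # Splitting on the blank line after each part's headers gives chunks like:
    #   [part 1 headers] [value 1, boundary, part 2 headers] ... [last value, closing boundary]
    content_list = content.split("\r\n\r\n")
    data_list = [""] * (len(content_list) - 1)  # one "name=value" string per field

    # The first chunk holds only the first part's headers: take the field name from it
    data_list[0] += content_list[0].split("name=")[1].split(";")[0].replace('"', '') + "="

    # Each middle chunk starts with the previous field's value and ends with the next field's name
    for i, c in enumerate(content_list[1:-1]):
        key = c.split("name=")[1].split(";")[0].replace('"', '')
        data_list[i + 1] += key + "="
        value = c.split("\r\n")
        data_list[i] += value[0]

    # The last chunk starts with the last field's value
    data_list[-1] += content_list[-1].split("\r\n")[0]

    content = "&".join(data_list)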
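The workaround handles the sample body, but building "name=value" pairs by string concatenation means a value containing "&", "=", "+" or "%" (all special in application/x-www-form-urlencoded) is mis-read, and splitting on "\r\n" cuts multi-line values short. If the multipart-to-urlencoded route is kept, one option is to collect (name, value) pairs first and let the standard library escape them. A minimal sketch in Python 3 using `urllib.parse.urlencode` (in Python 2 the same function is `urllib.urlencode`); the second field value is made up for illustration:

```python
from urllib.parse import parse_qs, urlencode

# (name, value) pairs as they might come out of a multipart body;
# the second value deliberately contains a character that is special
# in application/x-www-form-urlencoded.
pairs = [("lastname", "rwar"), ("firstname", "TEST & co")]

# urlencode() percent-escapes each name and value (spaces become "+")
body = urlencode(pairs)
print(body)  # lastname=rwar&firstname=TEST+%26+co

# Decoding gives back the original values, including the "&"
print(parse_qs(body))  # {'lastname': ['rwar'], 'firstname': ['TEST & co']}

# For contrast, plain string joining lets the "&" split the value apart
naive = "&".join(name + "=" + value for name, value in pairs)
print(parse_qs(naive)["firstname"])  # the value is cut off at the "&"
```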

If anybody can still solve my problem without having to manually parse the body, please let me know!
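On parsing the multipart body without doing it by hand: a multipart/form-data body is a MIME multipart message, so Python's email package can parse it once the request's Content-Type header is put in front of it. The Python docs name the email.message module (or the third-party multipart package) as the replacement for cgi.FieldStorage on POST requests; cgi was deprecated in 3.11 and removed in 3.13. A minimal sketch for Python 3, assuming the whole body has already been read into memory; `parse_multipart_form` is a hypothetical helper name, and the sample body reproduces the one from the question:

```python
from email.parser import BytesParser
from email.policy import default


def parse_multipart_form(content_type, body):
    """Parse a multipart/form-data body into {field name: raw bytes value}.

    Prepending the request's Content-Type header turns the HTTP body into a
    MIME message that BytesParser can read. Parts without a name are skipped.
    """
    raw = b"Content-Type: " + content_type.encode("latin-1") + b"\r\n\r\n" + body
    msg = BytesParser(policy=default).parsebytes(raw)
    if not msg.is_multipart():
        raise ValueError("not a multipart body")
    fields = {}
    for part in msg.iter_parts():
        name = part.get_param("name", header="content-disposition")
        if name is None:
            continue
        # decode=True returns bytes and undoes any Content-Transfer-Encoding
        fields[name] = part.get_payload(decode=True)
    return fields


boundary = "070f6a3146974d399d97c85dcf93ed44"
body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="lastname"; filename="lastname"\r\n'
    "\r\n"
    "rwar\r\n"
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="firstname"; filename="firstname"\r\n'
    "\r\n"
    "TEST\r\n"
    f"--{boundary}--\r\n"
).encode()

print(parse_multipart_form(f"multipart/form-data; boundary={boundary}", body))
# {'lastname': b'rwar', 'firstname': b'TEST'}
```

Values come back as bytes, so decode them (for example as UTF-8) where text is expected. Because the whole body is held in memory, this suits small forms better than large uploads.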

There's the streaming-form-data project, which provides a Python parser for multipart/form-data encoded data. It's intended for parsing data in chunks, but since no chunk size is enforced, you can also pass your entire input at once and it should do the job. It should be installable via pip install streaming_form_data.

Here's the source code - https://github.com/siddhantgoel/streaming-form-data

Documentation - https://streaming-form-data.readthedocs.io/en/latest/

Disclaimer: I'm the author. Of course, please create an issue in case you run into a bug. :)
