

Parsing HTTP Headers

I've found a new interest in building a small, efficient web server in C and have had some trouble parsing POST data out of the HTTP request. Would anyone have any advice on how to retrieve the name/value pairs from the "posted" data?

POST /test HTTP/1.1
Host: test-domain.com:7017
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://test-domain.com:7017/index.html
Cookie: __utma=43166241.217413299.1220726314.1221171690.1221200181.16; __utmz=43166241.1220726314.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)
Cache-Control: max-age=0
Content-Type: application/x-www-form-urlencoded
Content-Length: 25

field1=asfd&field2=a3f3f3
// ^-this

I see no tangible way to retrieve the bottom line as a whole and be sure it works every time. I'm not a fan of hard-coding anything.

You can find where the name/value pairs start by searching for two consecutive newlines, or more specifically \r\n\r\n; the body of the message starts right after that.
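A minimal sketch of that search, assuming the bytes received so far sit in a NUL-terminated buffer (a body containing NUL bytes would need memmem-style searching instead). The helper name find_body is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first byte of the message body, or NULL if the
 * blank line that ends the header section ("\r\n\r\n") has not been
 * received yet. request must be NUL-terminated. find_body is a
 * hypothetical helper name used only for illustration. */
const char *find_body(const char *request)
{
    const char *end = strstr(request, "\r\n\r\n");
    return end != NULL ? end + 4 : NULL;   /* skip the 4 terminator bytes */
}
```

If it returns NULL, keep reading from the socket, append to the buffer, and search again.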

Then you can simply split the body on &, and then split each of the resulting strings at = to get the name/value pairs.
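The two-level split could look roughly like this. It is a sketch under a few assumptions: parse_pairs is a hypothetical name, the body is a writable copy, and percent-escapes and '+' in application/x-www-form-urlencoded values are left undecoded.

```c
#include <string.h>

/* Split a form body into name/value pairs, in place.
 * body must be a writable, NUL-terminated copy: strtok() overwrites each '&'
 * with '\0'. strtok() cannot be nested, so '=' is found with strchr().
 * Percent-escapes such as %20 and '+' are NOT decoded here. */
int parse_pairs(char *body, char *names[], char *values[], int max)
{
    int count = 0;
    for (char *pair = strtok(body, "&"); pair != NULL && count < max;
         pair = strtok(NULL, "&")) {
        char *eq = strchr(pair, '=');
        if (eq != NULL) {
            *eq = '\0';                           /* end the name at '=' */
            values[count] = eq + 1;
        } else {
            values[count] = pair + strlen(pair);  /* no '=': empty value */
        }
        names[count] = pair;
        count++;
    }
    return count;
}
```

With the question's body, parse_pairs yields field1/asfd and field2/a3f3f3, and nothing about the number or names of fields is hard-coded.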

See the HTTP 1.1 RFC.

Once you have Content-Length from the header, you know how many bytes to read right after the blank line. If Content-Length is not in the header (for GET or POST alike), there is nothing to read after the blank line (CRLF), unless the request uses Transfer-Encoding: chunked, in which case the body arrives in length-prefixed chunks instead.
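A rough sketch of looking up Content-Length, assuming the header section is in a NUL-terminated buffer. Header field names are case-insensitive, so the comparison ignores case. The names content_length and starts_with_nocase are hypothetical, and chunked transfer coding is deliberately not handled.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive prefix test; HTTP field names are case-insensitive. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Scan the header lines (up to "\r\n\r\n") for Content-Length.
 * Returns the declared body length, 0 if the header is absent, or -1 if
 * the value is malformed. Sketch only: ignores Transfer-Encoding: chunked
 * and duplicate Content-Length headers. */
long content_length(const char *headers)
{
    const char *end = strstr(headers, "\r\n\r\n");
    const char *line = strstr(headers, "\r\n");   /* skip the request line */
    while (line != NULL && (end == NULL || line < end)) {
        line += 2;                                 /* start of next header */
        if (starts_with_nocase(line, "Content-Length:")) {
            char *stop;
            long n = strtol(line + 15, &stop, 10); /* strtol skips spaces */
            return (stop == line + 15 || n < 0) ? -1 : n;
        }
        line = strstr(line, "\r\n");
    }
    return 0;
}
```

After the blank line, the server would keep reading until it holds exactly that many body bytes, since one read() call may return only part of the body.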

You need to keep parsing the stream as headers until you see the blank line. The rest is the POST data.
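One way to sketch "parse headers until the blank line" is to walk the buffer one CRLF-terminated line at a time, splitting each header at its first colon. This assumes a NUL-terminated buffer; walk_headers is a hypothetical name, and it only prints each header rather than storing it.

```c
#include <stdio.h>
#include <string.h>

/* Walk a NUL-terminated request one CRLF-terminated line at a time.
 * Line 1 is the request line; every later line is "Name: value" until an
 * empty line, after which everything is the body (the POST data).
 * Returns a pointer to the body and stores the header count, or returns
 * NULL if the blank line has not arrived yet (read more and retry). */
const char *walk_headers(const char *buf, int *header_count)
{
    const char *line = buf;
    int count = 0;
    int is_request_line = 1;

    for (;;) {
        const char *eol = strstr(line, "\r\n");
        if (eol == NULL)
            return NULL;                     /* headers incomplete */
        if (eol == line) {                   /* empty line: end of headers */
            *header_count = count;
            return eol + 2;
        }
        if (is_request_line) {
            is_request_line = 0;             /* e.g. "POST /test HTTP/1.1" */
        } else {
            const char *colon = memchr(line, ':', (size_t)(eol - line));
            if (colon != NULL) {
                const char *value = colon + 1;
                while (value < eol && (*value == ' ' || *value == '\t'))
                    value++;                 /* skip optional whitespace */
                printf("%.*s -> %.*s\n", (int)(colon - line), line,
                       (int)(eol - value), value);
                count++;
            }
        }
        line = eol + 2;
    }
}
```

Splitting at the first colon matters for values such as "Host: test-domain.com:7017", which contain further colons.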

You need to write a little parser for the POST data. You can use C library routines such as index (or the standard strchr), strtok, and sscanf to do something quick and dirty. If you have room for it in your definition of "small", you could do something more elaborate with a regular expression library, or even with flex and bison.
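As an illustration of the regular-expression route, here is a sketch using the POSIX <regex.h> API (regcomp, regexec, regfree), which is available on POSIX systems but is not part of standard C. The pattern and the function name print_pairs_regex are assumptions for this example.

```c
#include <regex.h>
#include <stdio.h>

/* Extract name=value pairs with POSIX extended regular expressions.
 * Pattern: a name of one or more characters other than '&' and '=',
 * then '=', then a value of zero or more characters other than '&'.
 * Returns the number of pairs printed, or -1 if the pattern fails to compile. */
int print_pairs_regex(const char *body)
{
    regex_t re;
    regmatch_t m[3];                 /* m[0] whole match, m[1] name, m[2] value */
    int count = 0;

    if (regcomp(&re, "([^&=]+)=([^&]*)", REG_EXTENDED) != 0)
        return -1;

    const char *p = body;
    while (regexec(&re, p, 3, m, 0) == 0) {
        printf("%.*s = %.*s\n",
               (int)(m[1].rm_eo - m[1].rm_so), p + m[1].rm_so,
               (int)(m[2].rm_eo - m[2].rm_so), p + m[2].rm_so);
        count++;
        p += m[0].rm_eo;             /* resume searching after this match */
    }
    regfree(&re);
    return count;
}
```

Because the name group needs at least one character, every match is non-empty and the loop always advances.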

At least, I think this more or less answers your question.

IETF RFC notwithstanding, here is a more to-the-point answer. Assuming you realize that there is always an extra \r\n (an empty line) after the last header line, which in this request happens to be Content-Length, you should be able to isolate the body into a char* variable named data. This is where we start.

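#include <stdio.h>

int main(void)
{
    const char *data = "f1=asfd&f2=a3f3f3";
    char f1[100];
    char f2[100];
    /* "%s" stops only at whitespace, so it would swallow the whole string;
       the %[^&] scanset stops at '&'. Arrays decay to char*, so no '&'. */
    if (sscanf(data, "%99[^&]&%99s", f1, f2) != 2) // get the field tuples
        return 1;

    char f1_name[50] = "", f1_data[50] = "";
    sscanf(f1, "%49[^=]=%49s", f1_name, f1_data);  // split at '='

    char f2_name[50] = "", f2_data[50] = "";
    sscanf(f2, "%49[^=]=%49s", f2_name, f2_data);

    printf("%s = %s\n%s = %s\n", f1_name, f1_data, f2_name, f2_data);
    return 0;
}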
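Since the question asks to avoid hard-coding, here is a variant of the sscanf approach that handles any number of fields by parsing one pair per call and using %n to learn how many characters were consumed. It is a sketch: scan_pairs is a hypothetical name, the 49-character limits are arbitrary, and a pair with an empty value stops the loop because a %[...] scanset must match at least one character.

```c
#include <stdio.h>

/* Parse "name=value" pairs separated by '&' without fixing their number.
 * %n stores how many characters sscanf consumed, so the loop can advance.
 * Limitation: an empty value ("a=&b=2") makes %49[^&] fail and ends the
 * loop. Returns the number of pairs parsed. */
int scan_pairs(const char *data)
{
    char name[50], value[50];
    int used = 0, count = 0;

    while (sscanf(data, "%49[^=&]=%49[^&]%n", name, value, &used) == 2) {
        printf("%s -> %s\n", name, value);
        count++;
        data += used;
        if (*data != '&')
            break;                   /* end of input, or an overlong value */
        data++;                      /* skip the '&' separator */
    }
    return count;
}
```

For example, scan_pairs("f1=asfd&f2=a3f3f3") prints both pairs and returns 2.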

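Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.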

© 2020-2024 STACKOOM.COM