
Parsing HTTP GET - C

I'm trying to parse an HTTP GET request and have been doing so using strtok(); however, I am running into problems when trying to use strcpy().

I can parse the file path and file name fine, but can't seem to parse the remote host DNS name. Below is code that should tokenize a string, get the DNS name, and store it in a char[] called host.

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>

int main()
{
        int c = 0, c2 = 0;
        char *tk, *tk2, *tk3, *tk4;
        char buf[64], buf2[64], buf3[64], buf4[64];
        char host[1024], path[64], file[64];

        strcpy(buf, "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n");

        tk = strtok(buf, "\r\n");
        while(tk != NULL)
        {
                if(c == 1)
                {
                        tk2 = strtok(tk, " ");
                        while(tk2 != NULL)
                        {
                                if(c2 == 1)
                                {
                                        printf("%s\n", tk2);
                                        strcpy(host, tk2);
                                //      printf("%s\n", host);
                                }
                                ++c2;
                                tk2 = strtok(NULL, " ");
                        }
                }
                ++c;
                tk = strtok(NULL, "\r\n");
        }

        return 0;
}

Bear with me, as I'm a new C programmer and this code may be ugly. Every time I try running my program, I get a Segmentation fault (core dumped) error, and I believe it has something to do with strcpy(). I can print out the tokenized string just fine, but can't seem to copy it to a char[].

Sorry, but the strtok(3) function is not a good fit for parsing HTTP at all. Despite this, I'll try to explain what's happening in your code.

  1. The first time, you enter the loop with tk=="GET /~yourloginid/index.htm HTTP/1.1", and your buffer has been changed to "GET /~yourloginid/index.htm HTTP/1.1\0\nHost: ...". As c==0, you won't get into the if block, so c is incremented and tk=strtok(NULL, "\r\n"); is called again to get the second line.
  2. The second time, you enter the loop with tk=="Host: remote.cba.csuohio.edu\r\n...", because strtok(3) jumped over the first \0 in your string and skipped all \r and \n characters (strtok has now put a second \0 after that part, so tk=="Host: remote.cba.csuohio.edu\0\n..."). As c==1 this time, you get inside the if block and call strtok(tk, " ");. Because you passed a non-null first argument, this makes strtok(3) forget the extent of the string you were parsing and begin a new parse on "Host: remote.cba.csuohio.edu". It returns tk2=="Host:", putting a \0 after "Host:". The second time through the inner loop, tk2=="remote.cba.csuohio.edu", and you copy that value into host.
  3. When control returns to the main loop, tk=strtok(NULL, "\r\n"); returns NULL: the last call, tk2=strtok(NULL, " "); in the inner loop, already returned NULL, and strtok will keep returning NULL until you initialize it again by passing a non-null first argument. So the main loop ends there, and any lines after the Host: line are never seen.

strtok(3) modifies the string you pass as its first argument (it writes \0 characters into it). Further, it keeps a hidden static variable that records where the current parse stopped, so that it can continue on later calls and return NULL when parsing is finished. If you nest calls to strtok(3), you will get unexpected results: the inner call, by passing a non-null first argument, overwrites that internal state and the outer parse is lost. That is one reason your code fails.

Calling strtok(3) has numerous drawbacks: it cannot be used in nested loops, because it internally stores state related to what you are parsing. Its use is strongly discouraged. If you need nesting, switch to strtok_r(3) instead. That function has an extra parameter that lets you keep the strtok state externally, so you can have several strtok parses working in parallel.

Further, strtok will parse "GET_/~yourlogin..." just as happily as "GET___/~yourlogin..." (I have used underscores to represent spaces, to show multiple spaces between the method name and the URI), although the latter is not permitted by HTTP. Conversely, "Host:remote.cba.csuohio.edu" (with no space after the colon) is a valid header field, because whitespace after the colon is optional (although a single space is preferred), and splitting on spaces will not parse it correctly. Also, the Host: header field might not be the first header line, so you can miss it if you are not careful.

If you want to parse HTTP, the first reading I can recommend is RFC 2616, "HyperText Transfer Protocol - HTTP/1.1", which is the document implementors must comply with. Beware, it's a dense and large document. (RFC 2616 has since been obsoleted, first by RFCs 7230 through 7235 and more recently by RFC 9110 and RFC 9112.)

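It seems to me that the string you are copying into buf is longer than the 64 characters allocated in buf's definition. The literal is 70 characters long, so together with its terminating '\0' strcpy() writes 71 bytes into a 64-byte array, which is undefined behaviour and a likely cause of the segmentation fault.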

Trekator is right. Trekator是对的。

Replace this:

strcpy(buf, "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n");

with:

char buf[] = "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n";

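It is good practice to use a bounded copy such as strncpy if there is a possibility of overrunning the destination buffer. Note, however, that strncpy does not NUL-terminate the destination when the source is too long, so you must add the terminator yourself.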

Some advice:

(1) You will find your life a whole lot easier if you use variable names that tell you more than 'tk2' and such.

(2) You can use strstr() to find the CR-LF without modifying the original string, and it doesn't have the nesting issues that strtok() does.

(3) In general you can't count on Host: always being the 2nd line, so a more general approach of searching for it is more robust.

(4) Making sure things fit into their destination is "safe programming 101".

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <strings.h>    /* strncasecmp() is POSIX and declared here */

int main()
{
        int     lnum    = 0;
        char buf[] = "GET /~yourloginid/index.htm HTTP/1.1\r\nHost: remote.cba.csuohio.edu\r\n\r\n";
        char *  line;
        char *  crnl;
        char *  colon;
        char *  arg;
        char    host[1024]      = "(not found)";

        for (line = buf;
             (crnl = strstr(line, "\r\n")) != NULL;
            line = crnl + 2) {
                ++lnum;
                if (lnum == 1) {
                        /* "GET" line is always first */
                        continue;
                }
                if (crnl == line) {             /* empty line marks end */
                        break;
                }
                if (((colon = strchr(line, ':')) == NULL) || (colon > crnl)) {
                        fprintf(stderr, "no colon in header line?\n");
                        break;
                }
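                /* match the whole field name, not just a prefix of "Host" */
                if (colon - line == 4 && strncasecmp(line, "Host", 4) == 0) {
                        /* skip optional whitespace, but never past the CR-LF */
                        for (arg = colon + 1; arg < crnl && isspace((unsigned char)*arg); ++arg) {}
                        if ((size_t)(crnl - arg) >= sizeof(host)) {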
                                fprintf(stderr, "hostname too big\n");
                        } else {
                                strncpy(host, arg, crnl - arg);
                                host[crnl - arg] = '\0';
                        }
                }
        }
        printf("host was '%s'\n", host);
        return 0;
}

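Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.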

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM