C HTTP Client Strip Headers from Response

So, I'm writing this simple HTTP client in C and I seem to be stuck on this problem: how do I strip the HTTP headers from the response? After all, if I get a binary file I can't just write the headers out to my output file. I can't seem to go in once the data is already written to a file, because Linux screams when you try to even view the first few lines of a binary file, even if you know they're just text HTTP headers.

Now, here's the rub (well, I suppose the whole thing is a rub). Sometimes the whole header doesn't even come in on the first response packet, so I can't even guarantee that we'll have the whole header in our first iteration (that is, iteration of receiving an HTTP response; we're using recv() here), which means I need to somehow... well, I don't even know. I can't seem to mess with the data once it's already written to disk, so I need to deal with it as it's coming in, but we can't be sure how it's going to come in, and even if we were sure, strtok() is a nightmare to use.

I guess I'm just hoping someone out there has a better idea. Here's the relevant code. This is really stripped down; I'm going for an MCVE, of course. Also, you can just assume that socket_file_descriptor is already instantiated and get_request contains the text of our GET request. Here it is:

FILE* fp = fopen("output", "wb"); // Open the file for writing
char buf[MAXDATASIZE]; // The buffer
ssize_t numbytes; // For the size of the response (signed, because recv() returns -1 on error)

/*
 * Do all the socket programming stuff to get the socket file descriptor that we need
 * ...
 * ...
*/

send(socket_file_descriptor, get_request, strlen(get_request), 0); // Send the HTTP GET request

while ((numbytes = recv(socket_file_descriptor, buf, MAXDATASIZE - 1, 0)) > 0) {
    /* I either need to do something here, to deal with getting rid of the headers before writing to file */
    fwrite(buf, 1, numbytes, fp); // Write to file
    memset(buf, 0, MAXDATASIZE); // This just resets the buffer to make room for the next packet
}
close(socket_file_descriptor);
fclose(fp);
/* Or I need to do something here, to strip the file of its headers after it's been written to disk */

So, I thought about doing something like this. The only thing we know for sure is that the header is going to end in \r\n\r\n (two consecutive CRLF pairs, that is, an empty line). So we can use that. This doesn't really work, but hopefully you can figure out where I'm trying to go with it (comments from above removed):

FILE* fp = fopen("output", "wb");
char buf[MAXDATASIZE];
ssize_t numbytes; // recv() returns -1 on error, so this must be signed
int header_found = 0; // Add a flag, here

/* ...
 * ...
*/

send(socket_file_descriptor, get_request, strlen(get_request), 0);

while ((numbytes = recv(socket_file_descriptor, buf, MAXDATASIZE - 1, 0)) > 0) {
    if (header_found == 1) { // So this won't happen our first pass through
        fwrite(buf, 1, numbytes, fp);
        memset(buf, 0, MAXDATASIZE);
    }
    else { // This will happen our first pass through, maybe our second or third, the header doesn't always come in in full on the first packet
        /* And this is where I'm stuck.
         * I'm thinking about using strtok() to parse through the lines, but....
         * well I just can't figure it out. I'm hoping someone can at least point
         * me in the right direction.
         *
         * The point here would be to somehow determine when we've seen two carriage returns
         * in a row and then mark header_found as 1. But even if we DID manage to find the
         * two carriage returns, we still need to write the remaining data from this packet to 
         * the file before moving on to the next iteration, but WITHOUT including the
         * header information.
        */
    }
}
close(socket_file_descriptor);
fclose(fp);

I've been staring at this code for three days straight and am slowly losing my mind, so I really appreciate any insight anyone is able to provide. To generalize the problem, I guess this really comes down to me just not understanding how to do text parsing in C.

The second self-answer is better than the first one, but it still could be made much simpler:

const char* pattern = "\r\n\r\n";
const char* patp = pattern;
while ((numbytes = recv(socket_file_descriptor, buf, MAXDATASIZE - 1, 0)) > 0) {
    for (int i = 0; i < numbytes; i++) {
        if (*patp == 0) {
            fwrite(buf + i, 1, numbytes - i, fp);
            break;
        }
        else if (buf[i] == *patp) ++patp;
        else patp = pattern;
    }
    /* This memset isn't really necessary */
    memset(buf, 0, MAXDATASIZE);
}

That looks like a general solution, but it's not really: there are values for pattern for which it might fail to see a terminator under particular circumstances. This particular pattern is not problematic in practice, because a well-formed HTTP header should not contain a bare \r that is not followed by \n. You might want to think about what sort of pattern would cause a problem before taking a look at the more general solution.
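For readers curious what a more defensive matcher might look like, here is a minimal sketch (it is not the linked general solution, which is not reproduced here). It keeps the match state between recv() calls and, on a mismatch, checks whether the offending byte is itself a \r that could begin a new match, so an input such as \r\r\n\r\n is still detected. The names header_scanner and scan_chunk are made up for illustration, and the fallback rule is specific to the pattern \r\n\r\n.

```c
#include <stddef.h>

/* Hypothetical helper type: how many bytes of "\r\n\r\n" have matched so far.
 * The value must survive between recv() calls, so it lives outside the loop. */
typedef struct {
    int matched; /* 0..4 */
} header_scanner;

/* Feed one received chunk to the scanner.
 * Returns the offset, within this chunk, of the first body byte once the
 * terminator has been seen (this may equal len if the terminator ends exactly
 * at the chunk boundary), or -1 if the header has not ended yet.
 * Once the header has ended, every later chunk is all body, so 0 is returned. */
static long scan_chunk(header_scanner *s, const char *buf, size_t len)
{
    static const char pattern[] = "\r\n\r\n";

    if (s->matched == 4)
        return 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == pattern[s->matched]) {
            s->matched++;
        } else if (buf[i] == '\r') {
            /* A mismatching '\r' can still be the start of a new match. */
            s->matched = 1;
        } else {
            s->matched = 0;
        }
        if (s->matched == 4)
            return (long)(i + 1);
    }
    return -1;
}
```

In the receive loop, one would call scan_chunk(&s, buf, numbytes) and, whenever it returns an offset off that is not -1, write numbytes - off bytes starting at buf + off to the file.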

So, I know this is not the most elegant way to go about this, but... I did get it. For anyone who finds this question and is curious about at least an answer, here it is:

int count = 0;
int firstr_found = 0;
int firstn_found = 0;
int secondr_found = 0;
int secondn_found = 0;
FILE* fp = fopen("output", "wb");
char buf[MAXDATASIZE];
ssize_t numbytes; // recv() returns -1 on error, so this must be signed
int header_found = 0;

/* ...
 * ...
*/

send(socket_file_descriptor, get_request, strlen(get_request), 0);

while ((numbytes = recv(socket_file_descriptor, buf, MAXDATASIZE - 1, 0)) > 0) {
    if (header_found == 1) {
        fwrite(buf, 1, numbytes, fp);
    }
    else {
        // These buf[i]'s are going to return as integers (ASCII)
        // \r is 13 and \n is 10, so we're looking for 13 10 13 10
        // This also needs to be agnostic of which packet # we're on; sometimes the header is split up.
        for (int i = 0; i < numbytes; i++) {
            if (firstr_found == 1 && firstn_found == 1 && secondr_found == 1 && secondn_found == 1) { // WE FOUND IT!
                header_found = 1;
                // We want to skip the parts of the buffer we've already looked at, that's header, and our numbytes will be decreased by that many
                fwrite(buf + i, 1, numbytes - i, fp);
                break;
            }
            
            if (buf[i] == 13 && firstr_found == 0) { // We found our first \r, mark it and move on to next iteration
                firstr_found = 1;
                continue; 
            }
            if (buf[i] == 10 && firstr_found == 1 && firstn_found == 0) { // We found our first \n, mark it and move on
                firstn_found = 1;
                continue; 
            }
            else if (buf[i] != 13 && buf[i] != 10) { // Think about the second r, it'll ignore the first if, but fail on the second if, but we don't want to jump into this else block
                firstr_found = 0;
                firstn_found = 0;
                continue;
            }
            if (buf[i] == 13 && firstr_found == 1 && firstn_found == 1 && secondr_found == 0) {
                secondr_found = 1;
                continue;
            }
            else if (buf[i] != 10) {
                firstr_found = 0;
                firstn_found = 0;
                secondr_found = 0;
                continue;
            }
            if(buf[i] ==  10 && firstr_found == 1 && firstn_found == 1 && secondr_found == 1 && secondn_found == 0) {
                secondn_found = 1;
                continue;
            }
        }
    }
    memset(buf, 0, MAXDATASIZE);
    count++;
}
close(socket_file_descriptor);
fclose(fp);

Adding another answer because, well, I suppose I think I'm clever. Thanks to @tadman for the idea of a counter. Look here (I'm going to shave off a lot of the bloat and just show the while loop; if you've looked at my other code blocks you should be able to see what I mean here)...

/* ...
 * ...
*/
int consec_success = 0;
while ((numbytes = recv(socket_file_descriptor, buf, MAXDATASIZE - 1, 0)) > 0) {
    if (header_found == 1) {
        fwrite(buf, 1, numbytes, fp);
    }
    else {
        for (int i = 0; i < numbytes; i++) {
            if (consec_success == 4) {
                header_found = 1;
                fwrite(buf + i, 1, numbytes - i, fp);
                break;
            }
            
            if (buf[i] == 13 && consec_success % 2 == 0) {
                consec_success++;
            }
            else if (buf[i] == 10 && consec_success % 2 == 1) {
                consec_success++;
            }
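            else {
                // A mismatching \r can still start a new match, so fall back to 1 instead of 0
                consec_success = (buf[i] == 13) ? 1 : 0;
            }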
        }
    }
    memset(buf, 0, MAXDATASIZE);
}
/* ...
 * ...
*/
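A different way to approach the same problem, sketched here as an alternative rather than as what either answer does: accumulate the incoming bytes in a header buffer until the terminator shows up, then search the accumulated bytes. This sketch assumes the headers fit in a fixed-size buffer (HEADER_CAP is an arbitrary choice), and the names header_buf, find_body_offset and header_buf_feed are hypothetical. It uses memcmp() rather than strstr() so that NUL bytes in a binary body cannot cut the search short.

```c
#include <stddef.h>
#include <string.h>

#define HEADER_CAP 8192 /* assumed upper bound on header size for this sketch */

/* Hypothetical accumulator for the bytes received so far. */
typedef struct {
    char   data[HEADER_CAP];
    size_t used;
} header_buf;

/* Returns the offset of the first body byte in buf[0..len),
 * or -1 if "\r\n\r\n" has not been received yet. */
static long find_body_offset(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}

/* Appends one chunk, then searches everything accumulated so far.
 * Returns the body offset within hb->data, -1 if more data is needed,
 * or -2 if the chunk does not fit in the buffer. */
static long header_buf_feed(header_buf *hb, const char *chunk, size_t len)
{
    if (len > HEADER_CAP - hb->used)
        return -2;
    memcpy(hb->data + hb->used, chunk, len);
    hb->used += len;
    return find_body_offset(hb->data, hb->used);
}
```

Once header_buf_feed() returns an offset off that is 0 or greater, the hb->used - off bytes starting at hb->data + off are the beginning of the body and can be written to the file; every later recv() chunk can then be written directly. Rescanning from the start on every call is wasteful but simple, which seems acceptable for headers of a few kilobytes.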
