Fetching a web page using C socket programming

I am trying to build a proxy server in C. My problem is as follows.

I have a function fetch_response() which connects to example.com and queries the server with an HTTP GET request.

int fetch_response() {
    int sockfd, portno, n;
    struct sockaddr_in serv_addr;
    struct hostent *server;

    char buffer[4096];
    const char *host = "example.com";

    /* Create a TCP socket */
    portno = 80;
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");

    /* Resolve the host name */
    server = gethostbyname(host);
    if (server == NULL) {
        fprintf(stderr, "ERROR, no such host\n");
        exit(1);
    }

    /* Fill in the server address and connect */
    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    bcopy((char *) server->h_addr,
          (char *) &serv_addr.sin_addr.s_addr,
          server->h_length);
    serv_addr.sin_port = htons(portno);
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR connecting");

    /* Send the request */
    const char *request = "GET / HTTP/1.0\r\nHost: example.com\r\nConnection: close\r\n\r\n";
    n = write(sockfd, request, strlen(request));
    if (n < 0)
        error("ERROR writing to socket");

    /* Read the response and print it */
    bzero(buffer, 4096);
    n = read(sockfd, buffer, 4095);
    if (n < 0)
        error("ERROR reading from socket");
    printf("%d\n", (int) strlen(buffer));
    printf("%s\n", buffer);
    close(sockfd);
    return 0;
}

It runs fine when tested on its own, for example:

int main() {
    fetch_response();
    return 0;
}

However, in my proxy server I am trying to handle multiple client requests, so my main() function looks like this:

while(1) {
   new_socket = accept(params);
   if(new_socket < 0) error("Error on Connect");
   pid = fork();
   if(pid < 0) error("Error on fork");
   if(pid == 0) {
      fetch_response();
      exit(0);
   }
   else close(new_socket);
}

In this case I run into a problem: I receive only the first 1328 bytes of the requested page, no matter how large my buffer is. I have tested this with different domains and the result is the same. For example, for example.com the expected result is:

<html>
<head></head>
<body><h1> Example Domain </h1>
      < Some remaining body here >
</body>
</html>

But instead I get:

<html>
<head></head>
<body><h1> Example Domain </h1>

I cannot understand why this is happening. Please help.

Thanks!

PS: This is not the actual code of the proxy server. For debugging, I commented everything else out and tested the code as shown above.

You need to enclose your reading code in a loop, something like this:

while (1) {
    /* Zero the buffer so each chunk is NUL-terminated for printf */
    bzero(buffer, 4096);
    n = recv(sockfd, buffer, 4095, 0);
    if (n < 0) {
        error("ERROR reading from socket");
        break;
    }
    if (n == 0) {
        /* far end has closed socket */
        break;
    }
    // printf("%d\n", (int)strlen(buffer));
    printf("%d\n", n);        /* number of bytes received in this chunk */
    printf("%s\n", buffer);
}

This keeps reading from the socket until the far end closes it. Each call to recv returns the number of bytes it placed in the buffer, which can be fewer than the buffer holds, because TCP hands over whatever data has arrived so far rather than the whole response. When recv returns 0, the far end has closed the socket and there is nothing more to read.
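
If you would rather collect the whole response in memory than print it chunk by chunk, the same loop can be wrapped in a helper. This is only a sketch: the name read_all and the buffer-doubling strategy are my own assumptions, not part of the code above. It uses read() instead of recv() so that it works on any file descriptor, and it retries when the call is interrupted by a signal.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not from the original code): read from fd until
 * end-of-file and return everything in one malloc'd buffer.
 * On success stores the byte count in *out_len; the caller frees the buffer.
 * Returns NULL on allocation or read failure. */
char *read_all(int fd, size_t *out_len) {
    size_t cap = 4096;   /* current buffer capacity */
    size_t len = 0;      /* bytes stored so far */
    char *data = malloc(cap);
    if (data == NULL)
        return NULL;

    for (;;) {
        /* Grow the buffer when it is full */
        if (len == cap) {
            size_t new_cap = cap * 2;
            char *tmp = realloc(data, new_cap);
            if (tmp == NULL) {
                free(data);
                return NULL;
            }
            data = tmp;
            cap = new_cap;
        }

        /* read() may return fewer bytes than requested; keep going */
        ssize_t n = read(fd, data + len, cap - len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, retry */
            free(data);
            return NULL;
        }
        if (n == 0)
            break;                 /* far end has closed the connection */
        len += (size_t) n;
    }

    *out_len = len;
    return data;
}
```

In fetch_response() this would take the place of the single read() call: call read_all(sockfd, &len), write the result with fwrite(page, 1, len, stdout), then free(page). Using the returned length instead of strlen also keeps the output intact if the body contains NUL bytes.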
