C language: get the HTML source of a page

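I am trying to use C to fetch the HTML of this page: http://pastebin.com/raw/7y7MWssc . So far, I am trying to connect to pastebin with a socket on port 80 and then send an HTTP request to get the HTML of that pastebin page.

I know what I have so far is probably way off, but here it is: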

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

int main()
{
    /*Define socket variables */
    char host[1024] = "pastebin.com";
    char url[1024] = "/raw/7y7MWssc";
    char request[2000];
    struct hostent *server;
    struct sockaddr_in serverAddr;
    int portno = 80;

    printf("Trying to get source of pastebin.com/raw/7y7MWssc ...\n");

    /* Create socket */
    int tcpSocket = socket(AF_INET, SOCK_STREAM, 0);
    if(tcpSocket < 0) {
        printf("ERROR opening socket\n");
    } else {
        printf("Socket opened successfully.\n");
    }

    server = gethostbyname(host);
    serverAddr.sin_port = htons(portno);
    if(connect(tcpSocket, (struct sockaddr *) &serverAddr, sizeof(serverAddr)) < 0) {
        printf("Can't connect\n");
    } else {
        printf("Connected successfully\n");
    }

    bzero(request, 2000);
    sprintf(request, "Get %s HTTP/1.1\r\n Host: %s\r\n \r\n \r\n", url, host);
    printf("\n%s", request);

    if(send(tcpSocket, request, strlen(request), 0) < 0) {
        printf("Error with send()");
    } else {
        printf("Successfully sent html fetch request");
    }
    printf("test\n");

}

The code above makes sense up to a point, but now I'm stuck. How can I get the web source from http://pastebin.com/raw/7y7MWssc?

Fixed. I needed to add

serverAddr.sin_family = AF_INET;

and bzero serverAddr. My HTTP request was also wrong: it had an extra \r\n and stray spaces, as @immibis pointed out.

Corrected:

sprintf(request, "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", url, host);
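For illustration, the corrected request can be built with a bounds-checked helper. This is only a sketch: `build_get_request` is a hypothetical name, not something from the original post, and the `Connection: close` header is an added assumption that asks the server to close the connection once the response has been sent. Note the uppercase `GET` (HTTP method names are case-sensitive) and the absence of spaces before `Host:` and the final blank line.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not in the original post): writes a minimal
 * HTTP/1.1 GET request into buf.
 * Returns the request length, or -1 if it does not fit in buf. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n" /* assumption: one request per connection */
                     "\r\n",
                     path, host);

    /* snprintf returns the length it wanted to write; treat truncation as an error */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

It could be called as `build_get_request(request, sizeof(request), host, url)`, with the returned length passed to send() in place of strlen(request).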

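You are getting the pointer returned by gethostbyname(), but you are not doing anything with it.

You need to fill in the sockaddr_in with the address, address family, and port.

This works... but now you need to worry about reading the response...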

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

int main()
{
    /*Define socket variables */
    char host[1024] = "pastebin.com";
    char url[1024] = "/raw/7y7MWssc";
    char request[2000];
    struct hostent *server;
    struct sockaddr_in serverAddr;
    short portno = 80;

    printf("Trying to get source of pastebin.com/raw/7y7MWssc ...\n");

    /* Create socket */
    int tcpSocket = socket(AF_INET, SOCK_STREAM, 0);
    if(tcpSocket < 0) {
        printf("ERROR opening socket\n");
        exit(-1);
    } else {
        printf("Socket opened successfully.\n");
    }

    if ((server = gethostbyname(host)) == NULL) {
        fprintf(stderr, "gethostbyname(): error\n");
        exit(-1);
    }

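    /* Zero the struct first, then fill in the address, family, and port */
    memset(&serverAddr, 0, sizeof(serverAddr));
    memcpy(&serverAddr.sin_addr, server->h_addr_list[0], server->h_length);
    serverAddr.sin_family = AF_INET;
    serverAddr.sin_port = htons(portno);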

    if(connect(tcpSocket, (struct sockaddr *) &serverAddr, sizeof(serverAddr)) < 0) {
        printf("Can't connect\n");
        exit(-1);
    } else {
        printf("Connected successfully\n");
    }

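    /* Uppercase GET, no stray spaces, and exactly one blank line ends the headers */
    memset(request, 0, sizeof(request));
    snprintf(request, sizeof(request), "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", url, host);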
    printf("\n%s", request);

    if(send(tcpSocket, request, strlen(request), 0) < 0) {
        printf("Error with send()\n");
        exit(-1);
    } else {
        printf("Successfully sent html fetch request\n");
    }

    return 0;
}
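As for reading the response, one possible approach is to call recv() in a loop until it returns 0, which means the server has closed the connection. The sketch below assumes the request asked for that (for example with a `Connection: close` header); with a keep-alive connection the loop would block until the server times out. `read_response` is a hypothetical helper name, not part of the original answer.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: reads from the socket fd until the peer closes
 * the connection and copies everything it receives to out.
 * Returns the total number of bytes read, or -1 on error. */
long read_response(int fd, FILE *out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    /* recv() returns > 0 for data, 0 on orderly shutdown, -1 on error */
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0) {
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

After a successful send(), this could be called as `read_response(tcpSocket, stdout);`. The output starts with the status line and headers; the page itself begins after the first blank line. Many sites answer plain HTTP on port 80 with a redirect to HTTPS, in which case what comes back is a 3xx response rather than the page.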
