Winsock recv giving gibberish mixed with useful html

I'm trying to get the HTML source of a webpage, www.chemguide.co.uk (its pages aren't miles long), using Winsock in C++. Most of the data that comes through is good, but at certain points in the output a certain character is repeated, I think in groups of 8 (it looks like |¦ on the console and like Ì here), and there are some other strange characters as well.

Also, some of the document seems to be printed after the end of the page (the closing </html> tag). Here's the code:

// Portprog.cpp : Defines the entry point for the console application.
//


#include "stdafx.h"
#include <winsock2.h>
#include <sys/types.h>
#include <stdio.h>
#include <iostream>
#include <string>
#include <fstream>


#pragma comment(lib, "ws2_32.lib") //Winsock library

int getHTML(std::string *result)
{
    WSADATA wsa;
    SOCKET s;
    SOCKADDR_IN server;
    using std::string;
    using std::cout;
    using std::endl;

    cout << "Initialising Winsock...";
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
    {
        cout << "Failed. Error Code: " << WSAGetLastError();
        return 1;
    }
    cout << "Winsock initialised." << endl;

    if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
    {
        cout << "Could not create socket: " << WSAGetLastError() << endl;
        return 1;
    }
    cout << "Socket created." << endl;

    server.sin_addr.s_addr = inet_addr("217.27.240.124");
    server.sin_family = AF_INET;
    server.sin_port = htons(80); //host to network endian short

    //Connect to remote server
    if (connect(s, (SOCKADDR *)&server, sizeof(server)) < 0)
    {
        cout << "Connection failed." << endl;
        return 1;
    }
    cout << "Connected." << endl;

    //Send some data
    string srequest = "GET / HTTP/1.1\r\n";
    srequest += "Host: chemguide.co.uk\r\n";
    srequest += "Connection: close\r\n";
    srequest += "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n";
    srequest += "\r\n";

    char crequest[10000];
    int requestSize = srequest.length() + 1;
    strncpy_s(crequest, srequest.c_str(), requestSize);

    if (send(s, crequest, requestSize, 0) < 0)
    {
        cout << "Data could not be sent." << endl;
        return 1;
    }
    cout << "Data sent." << endl;

    //Receive a reply from the server
    std::string server_reply = "";
    int recv_length;
    char buffer[1000];
    int i = 0;
    do
    {
        i = recv_length = recv(s, buffer, sizeof(buffer), 0);
        server_reply += buffer;
    } while (i != 0);
    cout << "Reply received." << endl;

    *result = server_reply;

    closesocket(s);
    WSACleanup();

    return 0;
}

int main(int argc, char *argv[])
{
    std::string response = "";
    getHTML(&response);

    std::cout << response << std::endl;
    std::ofstream file("output.txt");
    file << response;
    file.close();

    return 0;
}

And here's the output:

HTTP/1.1 200 OK

Date: Mon, 03 Aug 2015 00:22:17 GMT

Server: Apache/2.2.11

Last-Modified: Mon, 13 Apr 2015 11:56:25 GMT

ETag: "99190a-1ec2-51399cdaacc40"

Accept-Ranges: bytes

Content-Length: 7874

Connection: close

Content-Type: text/html




<html>
<head>
<title>chemguide:  helping you to understand Chemistry - Main Menu</title>

<meta name="description"
content="Main menu of a site aimed to help advanced level chemistry students to understand chemistry" />
<meta name="keywords"
content="chemistry, A'level, a level, a'level, a-level, advanced level, advanced, help, understand, understanding, guide, guidebook" />


</head>

<body bgcolor="#ffffcc" link="blue" vlink="teal" alink="red">
<a name="top"></a>
<center>
<table align="center" border="0" width="480" cellspacing="10">

<tr>
<td colspan="2" bgcolor="#ccffcc" height="50" align="center" valign="middle">
<font color="#006600" size="7" face="Helvetica, Arial"><b>chemguide</b></font></td>
</tr>


<tr>
<td colspan="2">
<font colorÌÌÌÌÌÌÌÌè="#006600" size="6" face="Helvetica, Arial"><p align="center"><b>Helping you to understand Chemistry</b></p></font>

<font color="#000000" size="5" face="Helvetica, Arial">
<p align="center"><b>MAIN MENU</b></p>
</font>

<pre>

</pre>
<table align="center" cellpadding="10" border="1">
<tr valign="top"><td bgcolor="#cccccc"> <font color="#ff0000" face="Helvetica, Arial" size="2"><b>New!  </b></a></font><font color="#000000" face="Helvetica, Arial" size="2">stry" />
<meta name="keywords"
content="chemistry, A'level, a level, a'level, a-level, advanced level, advanced, help, understand, understanding, guide, guidebook" />


</head>

<body bgcolor="#ffffcc" link="blue" vlink="teal" alink="red">
<a name="top"></a>
<center>
<table align="center" border="0" width="480" cellspacing="10">

<tr>
<td colspan="2" bgcolor="#ccffcc" height="50" align="center" valign="middle">
<font color="#006600" size="7" face="Helvetica, Arial"><b>chemguide</b></font></td>
</tr>


<tr>
<td colspan="2">
<font colorÌÌÌÌÌÌÌÌÌI have just come across a really good site of short chemistry revision videos.  You can find more about it at the top of the <a href="links.html#top"></font>links</a> page.</td></tr>
</table>
<pre>

</pre>
<table align="center" cellpadding="10" border="1">


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="keywordsearch.html#top"><b>Keyword searching</b></a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">I have removed the Google search box because it was giving problems.  Follow this link to find out how you can still search Chemguide using keywords.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="igcse/index.html"><b>Edexcel Chemistry book</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for my Edexcel International GCSE Chemistry book. This will soon be retitled as Edexcel International GCSE Chemistry, Edexcel Certificate inÌÌÌÌÌÌÌÌè Chemistry.</b></font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="http://www.chemguideforcie.co.uk/index.html"><b>CIE syllabus support</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for CIE (Cambridge International) A level students and teachers.</b></font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="atommze="2">I have removed the Google search box because it was giving problems.  Follow this link to find out how you can still search Chemguide using keywords.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="igcse/index.html"><b>Edexcel Chemistry book</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for my Edexcel International GCSE Chemistry book. This will soon be retitled as Edexcel International GCSE Chemistry, Edexcel Certificate inÌÌÌÌÌÌÌÌÌenu.html#top">Atomic Structure and Bonding</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers basic atomic properties (electronic structures, ionisation energies, electron affinities, atomic and ionic radii, and the atomic hydrogen emission spectrum), bonding (including intermolecular bonding) and structures (ionic, molecular, giant covalent and metallic).</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="inorgmenu.html#top">Inorganic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes essential ideas about redox reactions, and covers the trends in Period 3 and Groups 1, 2, 4 and 7 of the Periodic Table.  Plus: lengthy sections on the chemistry of some important complex ions, and of common transition metals.  Extraction and uses of aluminium, copper, iron, titanium and tungsten.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" sÌÌÌÌÌÌÌÌèize="2"><a href="physmenu.html#top">Physical Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers simple kinetic theory, ideal and real gases, chemical energetics, rates of reaction including catalysis, an introduction to chemical equilibria, redox equilibria, acid-base equilibria (pH, buffer solutions, indicators, etc), solubility products, and phase equilibria (including Raoult's Law and the use of various phase diagetica, Arial" size="2"><a href="inorgmenu.html#top">Inorganic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes essential ideas about redox reactions, and covers the trends in Period 3 and Groups 1, 2, 4 and 7 of the Periodic Table.  Plus: lengthy sections on the chemistry of some important complex ions, and of common transition metals.  Extraction and uses of aluminium, copper, iron, titanium and tungsten.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" sÌÌÌÌÌÌÌÌÌrams).</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="analysismenu.html#top">Instrumental analysis</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Explains how you can analyse substances using machines - mass spectrometry,  infra-red spectroscopy, NMR, UV-visible absorption spectrometry and chromatography.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="orgmenu.html#top">Basic Organic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes help on bonding, naming and isomerism, and a discussion of organic acids and bases.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="orgpropsmenu.html#top">Properties of organic compounds</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers the physical and chemical properties of compounds on UK A ÌÌÌÌÌÌÌÌèlevel chemistry syllabuses, and includes a limited amount of biochemistry.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="mechmenu.html#top">Organic Reaction Mechanisms</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers all the mechanisms required by the current UK A level chemistry syllabuses.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="about.html#top">About this site</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes a contact address if you have found any difficulties with the site.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="qandclist.html#top">Questions and comments</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A selection of questions that I have been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌèts.  There are also a number of chemistry questions that I have been asked and which I haven't been able to find good answers for!</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="book.html#top">Chemistry Calculations</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A description of the author's book on calculations at UK A level chemistry standard.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="suggestions.html#top">Textbook suggestions</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Suggestions for textbooks and revision guides covering the UK AS and A level chemistry syllabuses, with links to Amazon.co.uk if you want to follow them up.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌ̘es.html#top">Download syllabuses</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">For UK students and international students using UK exams (e.g. Cambridge International).  Download a copy of your current syllabus from your examiners.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="links.html">Links</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A random collection of links to sites that I have found interesting or useful.  You will find it is a fairly quirky collection - that's deliberate.</font></td></tr>
</table>

<pre>

</pre>
<hr />

<p><font color="#000000" size="2" face="Helvetica, Arial"> &copy; Jim Clark 2009 (last modified September 2013)</font></p>
</td>
</tr>

</table></center>
</BODY>
</HTML>
tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌ6es.html#top">Download syllabuses</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">For UK students and international students using UK exams (e.g. Cambridge International).  Download a copy of your current syllabus from your examiners.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="links.html">Links</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A random collection of links to sites that I have found interesting or useful.  You will find it is a fairly quirky collection - that's deliberate.</font></td></tr>
</table>

<pre>

</pre>
<hr />

<p><font color="#000000" size="2" face="Helvetica, Arial"> &copy; Jim Clark 2009 (last modified September 2013)</font></p>
</td>
</tr>

</table></center>
</BODY>
</HTML>
tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌ

I'm using Visual Studio 2013. Here's my stdafx.h file:

// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently
//

#pragma once

#define _WINSOCK_DEPRECATED_NO_WARNINGS
//#define _CRT_SECURE_NO_WARNINGS

#include "targetver.h"

#include <stdio.h>
#include <tchar.h>



// TODO: reference additional headers your program requires here

The problem is that you treat the data you read as strings, but you seem to forget that C-style strings in C++ are terminated by the special character '\0'. recv does not add that terminator for you; it only writes the bytes it received.

So you need to read one character less than the buffer size, and terminate the data you read as a string by adding the terminator character right after it:

if (i >= 0)
    buffer[i] = '\0';

The reason you're getting gibberish is that when you append the buffer to the string server_reply, the += operator looks for this terminator to find the end of the string to append. If the terminator is missing, the += operator will just continue until it finds a byte corresponding to the terminator character, which might even be beyond the limits of buffer. Not terminating a string leads to undefined behavior.
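An alternative to terminating the buffer is to tell the string how many bytes to copy, using the count that recv returned. The following is a minimal sketch of that idea, not code from the question: appendChunk and demoPartialBuffer are hypothetical helper names. The 0xCC filler stands in for stale buffer contents; it was chosen because the Ì characters in the output above are consistent with that byte, which MSVC debug builds use to fill uninitialized stack memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Append exactly `count` bytes from `buffer` to `out`.
// No '\0' terminator is needed because the length is passed explicitly.
void appendChunk(std::string& out, const char* buffer, int count)
{
    assert(count >= 0);
    out.append(buffer, static_cast<std::size_t>(count));
}

// Mimics a partially filled receive buffer: only the first `count` bytes
// hold received data, the rest is stale filler with no terminator anywhere.
std::string demoPartialBuffer()
{
    char buffer[16];
    std::memset(buffer, 0xCC, sizeof(buffer)); // stale bytes (assumed filler)
    const char received[] = "<html>";
    const int count = 6;                       // what recv would have returned
    std::memcpy(buffer, received, count);

    std::string reply;
    appendChunk(reply, buffer, count);         // copies 6 bytes, nothing more
    return reply;
}
```

In the question's loop this would correspond to replacing `server_reply += buffer;` with `server_reply.append(buffer, recv_length);` once recv_length is known to be positive. Unlike the terminator approach, this also preserves any '\0' bytes that appear in the received data.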


Also, you don't check for errors when receiving. What do you think will happen if recv returns SOCKET_ERROR (which is not equal to zero)? You will end up with an infinite loop.
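Putting both points together, a receive loop could look roughly like the sketch below. It is written against a hypothetical callable, recvFn, rather than Winsock directly so that it stays portable; recvFn is assumed to follow recv's return convention (positive byte count, 0 when the peer closes the connection, a negative value on error). The name receiveAll is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Reads until the peer closes the connection or an error occurs.
// `recvFn(buf, len)` is a hypothetical stand-in that must behave like recv:
// > 0 bytes read, 0 on orderly close, < 0 on error.
// Returns true on orderly close, false on error.
template <typename RecvFn>
bool receiveAll(RecvFn recvFn, std::string& reply)
{
    char buffer[1000];
    for (;;)
    {
        // Ask for one byte less than the buffer holds, leaving room
        // for the terminator.
        const int n = recvFn(buffer, static_cast<int>(sizeof(buffer) - 1));
        if (n < 0)
            return false;   // error: stop instead of looping forever
        if (n == 0)
            return true;    // connection closed: everything was received
        assert(static_cast<std::size_t>(n) < sizeof(buffer));
        buffer[n] = '\0';   // terminate before treating it as a C string
        reply += buffer;
    }
}
```

In the question's program the call might be `receiveAll([s](char* buf, int len) { return recv(s, buf, len, 0); }, server_reply);`. Since SOCKET_ERROR is a negative value, the `n < 0` branch ends the loop on failure, and WSAGetLastError() can then be consulted for the cause.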
