
C++: dealing with multiple strings in the recv buffer for an IRC bot

I am trying to write a simple IRC bot in C++. I have previously done this in Python, but I am struggling with string handling in C++, especially Unicode strings.

So far I can connect to the IRC server and read the buffer, BUT the buffer can contain multiple lines, and it also contains a lot of null data. There is also a possibility of having wide characters or a single message line overflowing the buffer.

I want to read the buffer and then process it line by line, one '\n'-terminated line at a time.

#include "stdafx.h"
#include <stdio.h>
#include <string>
#include <iostream>

#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#pragma comment(lib,"ws2_32.lib")
#else
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#endif

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char \
*pass = "pass",
*bot_owner = "name",
*nick = "name",
*serv = "irc.twitch.tv",
*chan = "#name";

using namespace std;


int main()
{

            int ret;
            char buf[512] = "";
#ifdef _WIN32
            SOCKET sock;
            struct WSAData* wd = (struct WSAData*)malloc(sizeof(struct WSAData));
            ret = WSAStartup(MAKEWORD(2, 0), wd);
            free(wd);
            if (ret) { puts("Error loading Windows Socket API"); return 1; }
#else
            int sock;
#endif
            struct addrinfo hints, *ai;
            memset(&hints, 0, sizeof(struct addrinfo));
            hints.ai_family = AF_UNSPEC;
            hints.ai_socktype = SOCK_STREAM;
            hints.ai_protocol = IPPROTO_TCP;
            if (ret = getaddrinfo(serv, "6667", &hints, &ai)) {
                //puts(gai_strerror(ret)); // this doesn't compile
                return 1;
            }
            sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
            if (ret = connect(sock, ai->ai_addr, ai->ai_addrlen)) {
                //puts(gai_strerror(ret)); // this doesn't compile
                return 1;
            }
            freeaddrinfo(ai);
            sprintf_s(buf, "PASS %s\r\n", pass);
            send(sock, buf, strlen(buf), 0);
            sprintf_s(buf, "USER %s\r\n", nick);
            send(sock, buf, strlen(buf), 0);
            sprintf_s(buf, "NICK %s\r\n", nick);
            send(sock, buf, strlen(buf), 0);
            int bytesRecieved;
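            // Leave room for a terminating '\0' so buf can be treated as a C string.
            while ((bytesRecieved = recv(sock, buf, (int)sizeof(buf) - 1, 0)) > 0) {
                buf[bytesRecieved] = '\0';

                std::cout << "\nbytesRecieved : " << bytesRecieved << "\n";
                std::cout << "DATA : " << buf;

                if (!strncmp(buf, "PING ", 5)) {
                    // Turn "PING ..." into "PONG ..." and echo it back.
                    // This assumes the buffer holds only the PING line.
                    buf[1] = 'O';
                    send(sock, buf, bytesRecieved, 0);
                }
                if (buf[0] != ':') continue;
                // strchr returns NULL when there is no space, so check before using it.
                const char *space = strchr(buf, ' ');
                if (space && !strncmp(space + 1, "001", 3)) {
                    sprintf_s(buf, "JOIN %s\r\n", chan);
                    send(sock, buf, strlen(buf), 0);
                }
            }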
#ifdef _WIN32
            closesocket(sock);
            WSACleanup();
#else
            close(sock);
#endif

    return 0;
}

What is the best way to split the recv buffer into several strings when it contains many lines separated by '\n', and then iterate over them? How can I deal with a line that is split across the end of one buffer and the beginning of the next? And how do I deal with UTF-8 characters, given that Twitch IRC carries messages in many different languages?

Many thanks, my C++ skills are quite basic and I am mostly trying to convert this bot from a simple one I wrote in python which has lots of nice easy ways of dealing with strings. If you can explain things as if you are dealing with an idiot, I'd appreciate that.

---- edit ----

I think I need to do something like :

        string stringbuilder;
        for (int i = 0; i < bytesRecieved; i++) {

            stringbuilder.push_back(buf[i]);

        }

That is, iterate through the char buffer and build up separate strings by reading until the '\n' char, then start the next one, putting each finished string into a vector(?) of strings, and finally iterate over that vector. I don't know how to do this in C++ though; any ideas? I've tried the Boost library suggested below, but it always ends up creating a string at the end with a lot of nonsense chars in it.

I would check out boost::tokenizer for splitting the string into multiple substrings, based on a delimiter, that you can iterate over. You'll need to store the data in a std::string to pass it to the tokenizer. Example:

#include <string>
#include <boost/tokenizer.hpp>

using sep = boost::char_separator<char>;
using tokenizer = boost::tokenizer<sep>;
constexpr auto separators = "\n";
const auto socket_string = std::string(/*values from socket go here*/);
const auto tokens = tokenizer(socket_string, sep(separators));
/*
 * this loop will iterate over all the lines received from the socket,
 * one line at a time
 */
for (const auto& token : tokens)
{
    /* token represents a single line of input */
}
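
The "nonsense chars" at the end are most likely stale bytes past what recv actually wrote, which get picked up when the string is built from the whole buffer. A minimal sketch using only the standard library instead of Boost, assuming you pass the byte count returned by recv; the name split_lines is made up for illustration:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split the first `len` bytes of `data` into lines on '\n'.
// A trailing '\r' is stripped, since IRC lines end in "\r\n".
std::vector<std::string> split_lines(const char* data, std::size_t len) {
    // Passing the length explicitly keeps bytes past `len` out of the string.
    std::istringstream input(std::string(data, len));
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(input, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

Calling split_lines(buf, bytesRecieved) inside the recv loop would give one string per line. Note that this version also returns a trailing partial line as if it were complete.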

When it comes to strings being split over multiple buffers, you need some way to detect that. Where I work, when we send messages over a socket we prefix each message with an integer giving the number of bytes in the message, so we can check the size of the received string to know whether we're done. Without a protocol like that, you'll have to decide how to parse the data and work out whether you've received everything yet, or just keep it dumb and simple and parse each buffer as a new string. In your case, perhaps a string read off the buffer that does not end in '\n' is simply not finished yet? That's probably what I would check for, but I don't know all your constraints.
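
The "does it end in '\n' yet?" check can be sketched roughly like this: keep whatever follows the last '\n' and prepend it to the next chunk. The class name LineBuffer and its feed method are hypothetical, not part of any library:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Accumulates raw bytes from successive recv() calls and returns only
// complete lines; an unfinished tail stays buffered until its '\n' arrives.
class LineBuffer {
public:
    std::vector<std::string> feed(const char* data, std::size_t len) {
        pending_.append(data, len);
        std::vector<std::string> lines;
        std::size_t start = 0;
        std::size_t end;
        while ((end = pending_.find('\n', start)) != std::string::npos) {
            std::string line = pending_.substr(start, end - start);
            // IRC lines end in "\r\n", so drop the '\r' as well.
            if (!line.empty() && line.back() == '\r') line.pop_back();
            lines.push_back(line);
            start = end + 1;
        }
        // Keep only the bytes after the last complete line.
        pending_.erase(0, start);
        return lines;
    }

private:
    std::string pending_;
};
```

In the recv loop you would create one LineBuffer before the loop and call feed(buf, bytesRecieved) on every iteration, then process the returned lines.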

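How you deal with UTF-8 characters depends on what you do with them. std::string is just a sequence of bytes with no encoding attached, so on any platform it can hold the UTF-8 that the server sends, and splitting on '\n' is safe because that byte never appears inside a multi-byte UTF-8 sequence. On Windows you might need to convert to std::wstring before handing the text to wide-character APIs or printing it to the console.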
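
As a small illustration of treating UTF-8 as plain bytes, here is a sketch that counts characters (code points) in a std::string without any conversion. It assumes the input is valid UTF-8, and the function name utf8_length is made up:

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 encoded std::string by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

s.size() still reports bytes, which is what send and recv care about; a count like this only matters if you need to reason about characters.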

Also, I'd suggest reading up on idiomatic C++ . Your code is about 90% Pure C.

In the end I solved the issue by iterating over the buf char array and pushing each char onto the end of a new string. When I encounter a '\n' char, I add that new string to a vector and reset the string with the clear() function.

This continues up to the number of bytes returned by recv, which is how many bytes of the char array are valid.

The vector is then iterated over in a for loop.

        std::vector <string> vs;
        string newString;
        for (int i = 0; i < bytesRecieved; i++) {
            newString.push_back(buf[i]);
            if (buf[i] == '\n') {
                vs.push_back(newString);
                newString.clear();
            }

        }

        for (const auto &item_vs : vs) {
            // This is where the recv buffer lines are iterated over
            cout << "Value : ";
            cout << item_vs;
        }
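
Inside that second loop each line can then be handled on its own, for example to answer the server's PING. A rough sketch, assuming the trailing "\r\n" has already been stripped from the line; make_pong is a hypothetical helper name:

```cpp
#include <string>

// Build the reply for a server keep-alive line, or return an empty string
// when the line is not a PING. `line` is expected without its trailing "\r\n".
std::string make_pong(const std::string& line) {
    const std::string prefix = "PING ";
    if (line.compare(0, prefix.size(), prefix) != 0) return "";
    // Echo back whatever followed "PING " and terminate the reply properly.
    return "PONG " + line.substr(prefix.size()) + "\r\n";
}
```

If the returned string is not empty, it can be passed to send with its data() and size().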

