Splitting a String in C++ (using cin)

I'm doing this UVa problem, which takes the following input:

This is fun-
ny!  Mr.P and I've never seen
this ice-cream flavour
before.Crazy eh?
#
This is fun-
ny!  Mr.P and I've never seen
this ice-cream flavour
before.Crazy eh?
#

and produces this output:

1 1
2 3
3 2
4 3
5 3
6 1
7 1
8 1

1 1
2 3
3 2
4 3
5 3
6 1
7 1
8 1

In the input, # separates the cases. I'm supposed to get the length of each word and count the frequency of each distinct length (as you can see in the output, a word of length 1 occurs once, length 2 occurs three times, length 3 occurs twice, and so on).

My problem is this: when reading with cin, before.Crazy is counted as one word, since there is no space dividing them. It should then be as simple as splitting the string on certain punctuation ( {".",",","!","?"} for example)... but C++ seems to have no simple way to split a string.

So, my question: how can I split the string and pass each resulting piece to my function that handles the rest of the problem?

Here's my code:

int main()
{
    string input="";
    while(cin.peek()!=-1)
    {   
        while(cin >> input && input!="#")
        {
            lengthFrequency(input);
            cout << input << " " << input.length() << endl;
        }

        if(cin.peek()!=-1) cout << endl;
        lengthFrequencies.clear();
    }
    return 0;
}

lengthFrequencies is a map<int,int>; lengthFrequency is the function that updates it.

You can redefine what a stream considers to be a whitespace character by using a std::locale with a custom std::ctype<char> facet. Here is corresponding code which doesn't quite do the assignment but demonstrates how to use the facet:

#include <algorithm>
#include <iostream>
#include <locale>
#include <string>

struct ctype
    : std::ctype<char>
{
    typedef std::ctype<char> base;
    static base::mask const* make_table(char const* spaces,
                                        base::mask* table)
    {
        base::mask const* classic(base::classic_table());
        std::copy(classic, classic + base::table_size, table);
        for (; *spaces; ++spaces) {
            table[int(*spaces)] |= base::space;
        }
        return table;
    }
    ctype(char const* spaces)
        : base(make_table(spaces, table))
    {
    }
    base::mask table[base::table_size];
};

int main()
{
    std::cin.imbue(std::locale(std::locale(), new ctype(".,!?")));
    for (std::string s; std::cin >> s; ) {
        std::cout << "s='" << s << "'\n";
    }
}
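As a hedged variation on the facet above, the sketch below applies the same idea to a std::istringstream so it can be tried on a fixed string. The names mask_table, delimiter_ctype and splitOn are made up for illustration. The table lives in a separate base class only so that it is constructed before the std::ctype<char> base that points at it; the member is called masks because std::ctype<char> already has a member function named table().

```cpp
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Holds the classification table. Base classes are initialized in
// declaration order, so this exists before std::ctype<char> is constructed.
struct mask_table {
    std::ctype_base::mask masks[std::ctype<char>::table_size];
};

// Hypothetical facet: the classic "C" table plus extra "space" characters.
struct delimiter_ctype : private mask_table, std::ctype<char> {
    explicit delimiter_ctype(const char* spaces)
        : mask_table(), std::ctype<char>(fill(masks, spaces))
    {
    }

private:
    static const mask* fill(mask* out, const char* spaces)
    {
        const mask* classic = classic_table();
        std::copy(classic, classic + table_size, out);
        for (; *spaces; ++spaces) {
            // Index through unsigned char so negative char values are safe.
            out[static_cast<unsigned char>(*spaces)] |= space;
        }
        return out;
    }
};

// Hypothetical helper: splits `text` on whitespace and on every character
// listed in `spaces`.
std::vector<std::string> splitOn(const std::string& text, const char* spaces)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it.
    in.imbue(std::locale(std::locale::classic(), new delimiter_ctype(spaces)));
    std::vector<std::string> words;
    for (std::string word; in >> word; ) {
        words.push_back(word);
    }
    return words;
}
```

Calling splitOn("before.Crazy eh?", ".,!?") should yield the three tokens before, Crazy and eh, which could then be passed one by one to the length-counting function.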

Before counting the frequencies, you could parse the input string and replace all the {".",",","!","?"} characters with spaces (or whatever separator character you want to use). Then your existing code should work.

You may want to handle some characters differently. For example, in the case of before.Crazy you would replace the '.' with a space, but for something like 'ny! ' you would remove the '!' altogether because it is already followed by a space.
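A minimal sketch of the replace-then-tokenize idea, assuming the delimiters are exactly ".,!?" and using a made-up helper name, countWordLengths. It does not deal with hyphens or apostrophes, which the problem treats specially.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: turns every delimiter in `text` into a space, then
// counts how many whitespace-separated tokens there are of each length.
std::map<int, int> countWordLengths(std::string text,
                                    const std::string& delims = ".,!?")
{
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (delims.find(text[i]) != std::string::npos) {
            text[i] = ' ';
        }
    }

    std::map<int, int> frequencies;
    std::istringstream words(text);
    for (std::string word; words >> word; ) {
        ++frequencies[static_cast<int>(word.size())];
    }
    return frequencies;
}
```

Because operator>> skips any run of whitespace, replacing the '!' in 'ny! ' with a space gives the same tokens as removing it, so in this sketch the two cases do not need separate handling.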

How about this (using the STL, comparators and functors)?

NOTE: All assumptions and explanations are in the source code itself.

#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <sstream>
#include <algorithm>
#include <cctype>
#include <utility>
#include <string.h>

bool compare (const std::pair<int, int>& l, const std::pair<int, int>& r) {
    return l.first < r.first;
}

//functor/unary predicate:
struct CompareFirst {
    CompareFirst(int val) : val_(val) {}
    bool operator()(const std::pair<int, int>& p) const {
        return (val_ == p.first);
    }
private:
    int val_;
};

int main() {
    char delims[] = ".,!?";
    char noise[] ="-'";

    //I'm assuming you've read the text from some file, and that information has been stored in a string. Or, the information is a string (like below):
    std::string input = "This is fun-\nny,  Mr.P and I've never seen\nthis ice-cream flavour\nbefore.Crazy eh?\n#\nThis is fun-\nny!  Mr.P and I've never seen\nthis ice-cream flavour\nbefore.Crazy eh?\n#\n";

    std::istringstream iss(input);
    std::string temp;

    //first split the string by #
    while(std::getline(iss, temp, '#')) {

        //find every hyphen that ends a line and remove the newline after it:
        std::string::size_type begin = 0;

        while(std::string::npos != (begin = temp.find('-', begin))) {
            //look at the character after the current hyphen and erase it if it is a newline
            if (begin + 1 < temp.size() && temp[begin+1] == '\n') {
                temp.erase(begin+1, 1);
            }
            ++begin;
        }

        //now erase all the `noise` characters ("-'"), since this punctuation counts as zero length
        for (int i = 0; i < strlen(noise); ++i) {
            //this removes every hyphen and apostrophe
            temp.erase(std::remove(temp.begin(), temp.end(), noise[i]), temp.end());//the newlines after line-ending hyphens were already erased above
        }//at this point, only the delimiter substitution remains

        //now remove the remaining delimiter characters by replacing them with spaces
        for (int i = 0; i < strlen(delims); ++i) {
            std::replace(temp.begin(), temp.end(), delims[i], ' ');
        }

        std::vector<std::pair<int, int> > occurences;

        //initialize another input stringstream to make use of the whitespace
        std::istringstream ss(temp);

        //now use the whitespace to tokenize
        while (ss >> temp) {

            //try to find the token's size in the occurences
            std::vector<std::pair<int, int> >::iterator it = std::find_if(occurences.begin(), occurences.end(), CompareFirst(temp.size()));

            //if found, increment count by 1
            if (it != occurences.end()) {
                it->second += 1;//increment the count
            }
            //this is the first time it has been created. Store value, and a count of 1
            else {
                occurences.push_back(std::make_pair<int, int>(temp.size(), 1));
            }
        }

        //now sort and output:
        std::stable_sort(occurences.begin(), occurences.end(), compare);

        for (int i = 0; i < occurences.size(); ++i) {
            std::cout << occurences[i].first << " " << occurences[i].second << "\n";
        }
        std::cout << "\n";
    }

    return 0;
}

91 lines, and all vanilla C++98.

A rough outline of what I did is:

  1. Since hyphens occur across two lines, find all hyphens and remove any newline that follows them.
  2. There are characters that don't add to the length of a word, such as the hyphen in legitimately hyphenated words and the apostrophe. Find these and erase them, as it makes tokenizing easier.
  3. All the remaining delimiters can now be found and replaced with whitespace. Why? Because we can use the whitespace to our advantage by using streams (whose default action is to skip whitespace).
  4. Create a stream and tokenize the text on whitespace, as described in the previous step.
  5. Store the lengths of the tokens and their occurrences.
  6. Sort the lengths of the tokens, and then output each token length and its corresponding number of occurrences.
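Steps 1 and 2 can also be pulled out into a small function of their own, which might make it easier to reuse them with a different tokenizer. This is only a sketch; the name stripNoise is invented here, and it searches for the two-character sequence "-\n" rather than inspecting the character after each hyphen.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper covering steps 1 and 2 of the outline.
std::string stripNoise(std::string text)
{
    // Step 1: drop the newline that directly follows a line-ending hyphen.
    std::string::size_type pos = 0;
    while ((pos = text.find("-\n", pos)) != std::string::npos) {
        text.erase(pos + 1, 1);
        ++pos;
    }

    // Step 2: erase the characters that do not count toward word length.
    const std::string noise = "-'";
    for (std::string::size_type i = 0; i < noise.size(); ++i) {
        text.erase(std::remove(text.begin(), text.end(), noise[i]),
                   text.end());
    }
    return text;
}
```

The result still contains the ".,!?" delimiters, so steps 3 to 6 would run on its output unchanged.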

REFERENCES:

https://stackoverflow.com/a/5815875/866930

https://stackoverflow.com/a/12008126/866930
