
Counting the same string/word in a text file in C++

I'm trying to count occurrences of the same string/word in a text file in C++.

This is my text file
one two three two
test testing 123
1 2 3

This is my main program

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main(int argc, const char** argv)
{
    int counter = 0;
    int ncounter = 0;
    string str;
    ifstream input(argv[1]);

    while (getline(input, str)) 
    {
        if(str.find("two") != string::npos){counter++;}
        if(str.find('\n') != string::npos){ncounter++;}

        cout << str << endl; //To show the content of the file
    }

    cout << endl;
    cout << "String Counter: " << counter << endl;
    cout << "'\\n' Counter: " << ncounter << endl;

    return 0;
}

I'm using the .find() function to find the string. When I search for a non-existent word, it doesn't count anything. When I search for the word "two", it counts, but only once.

How come it didn't count 2 times?

And for the line break (\n), it can't count any. Why is that?

Because the two occurrences of "two" are on the same line, and you test each line only once for the substring: find() reports the first match, so your code increments the counter at most once per line.
You can't find the '\n' because getline reads the line up to, but not including, the '\n'.

Why not use a std::multiset to store the words? It would do the counting for you, and reading the file into it can be done in one statement:

#include <iostream>
#include <fstream>
#include <string>
#include <set>
#include <iterator>
#include <cstdlib> // for system()

int main(int argc, const char** argv)
{
    // Open the file
    std::ifstream input(argv[1]);

    // Read all the words into a set
    std::multiset<std::string> wordsList = 
        std::multiset<std::string>( std::istream_iterator<std::string>(input),
                                    std::istream_iterator<std::string>());

    // Iterate over every word
    for(auto word = wordsList.begin(); word != wordsList.end(); word=wordsList.upper_bound(*word))
        std::cout << *word << ": " << wordsList.count(*word) << std::endl;

    // Done
    system("pause");
    return 0;
}

Note the last part of the for statement: word=wordsList.upper_bound(*word). It ensures each distinct value in the set is output only once. Technically you could switch it to simply word++ (in which case it would be better to shorten the loop to for(auto word : wordsList)), but each word would then be printed once for every time it occurs.

It will also list the words themselves, so you no longer need to print them inside your current while loop.

Your best bet is to read each line, then tokenize it along the whitespace so you can examine each word individually.

I suspect we're talking about a homework assignment here, so my best answer is to direct you to the C++ reference for std::strtok: http://en.cppreference.com/w/cpp/string/byte/strtok
