How to write an algorithm that finds out on which lines specific words appear (I am using std::map)

I am writing code that counts how many times each word appears in a text (that part is done), but I can't find a way to determine which lines each word appears on.

I don't know where to start.


#include "header.h"
int main()
{
     //read the input, keeping track of each word and how often we see it
     std::ifstream in("tekstas.txt"); // input file

    std::string input;
    std::map<std::string, int> counters; // store each word and an associated counter
    std::vector<char>CharVect; // vector that stores symbols i want to replace
    formuojuChar(CharVect); // pushbacking that vector with symbols
     for (unsigned int i = 0; !in.eof(); i++)
    {
        std::getline(in, input);
        std::transform(input.begin(), input.end(), input.begin(), ::tolower); // lowering letters so for example "Mom"= "mom"
        Replace(input,CharVect, ' '); // replace symbols with space
        std::stringstream read(input);
        std::string word;
        while (read >> word)
        {
            ++counters[word];
        }
     }
     std::ofstream out("isvestis.txt");
    std::cout<<"Words that appear more than once in the text: "<<std::endl;
    std::cout<<std::endl;
     for (std::map<std::string, int>::const_iterator it = counters.begin();it != counters.end(); ++it)
        {
            if((it->second)>1)
            {

                std::cout<<"'" <<it->first<<"' " <<"appears "<<it->second <<" times in lines: " ;
                /*
                 ANY IDEAS ?
                 */
                std::cout<<std::endl;

            }
        }
        return 0;

}


I expect the output to show which lines of the .txt file each word appears on. Thanks.

This looks like a learning exercise that you want to do on your own, and I have a policy of not writing code for those.

However, one thing you could do is count the number of newlines you've encountered (which tells you the line you're on) and, whenever you see the text you're searching for, insert the current line number into a std::set<unsigned> or std::vector<unsigned>.

You would want to do this in a single loop, perhaps reading in a line at a time. Whenever you encounter the search term, update both the word counter and the set of line numbers.
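For readers who want to see the shape of that single loop, here is a minimal sketch for one search term. The names `TermHits` and `find_term` are made up for illustration, and it assumes words are separated by whitespace and compared exactly (no lowercasing or punctuation handling).

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical result type: total occurrences plus the distinct lines they were on.
struct TermHits {
    std::size_t count = 0;
    std::set<unsigned> lines;
};

// Reads `in` one line at a time and records every whitespace-delimited
// token that is exactly equal to `term`.
TermHits find_term(std::istream& in, const std::string& term)
{
    TermHits hits;
    std::string line;
    unsigned line_number = 0;
    while (std::getline(in, line)) {
        ++line_number;  // each successful getline means one more line
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            if (word == term) {
                ++hits.count;                    // update the counter ...
                hits.lines.insert(line_number);  // ... and the line set together
            }
        }
    }
    return hits;
}
```

Using a std::set here means a word that appears twice on the same line is counted twice but its line number is stored once.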

The major issue with your approach is that you're using a second loop to gather the information on the words, and by that point you've lost all the information about which line(s) the words are on.

Instead of trying to figure out what line you're on in the second loop, note that you have all the necessary information about the current line in the first loop. All you need is a variable that keeps track of the line number. You are using std::getline (incorrectly, I may add): each time you call that function you move to the next line, so you know implicitly what line you're on in the first loop.


First, you need to fix your read loop so that it reads lines correctly from a file:

std::string line;
while (std::getline(in, line))
{
//...
}
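To illustrate why the loop condition matters, here is a small sketch comparing the two patterns. The function names are invented for this example. The `!in.eof()` version only notices the end of input after a read has already failed, so on input that ends with a newline it runs the loop body one extra time with an empty line.

```cpp
#include <istream>
#include <sstream>
#include <string>

// The pattern from the question: test eof() before reading.
int iterations_with_eof_test(std::istream& in)
{
    int iterations = 0;
    std::string line;
    while (!in.eof()) {
        std::getline(in, line);  // may fail; the result is never checked
        ++iterations;            // so a failed read still counts as a "line"
    }
    return iterations;
}

// The corrected pattern: the read itself is the loop condition.
int iterations_with_getline_test(std::istream& in)
{
    int iterations = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++iterations;  // only reached when a line was actually read
    }
    return iterations;
}
```

For the two-line input "one\ntwo\n", the first function reports three iterations and the second reports two.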

Second, inside the while loop, you can determine all the information you need: the word, the word count, and the line(s) where the word is found. You don't need two loops to do this.

Instead of a std::map<std::string, int>, which only knows about the word count, you could create a map that holds all the information: the word count and the line(s) the word is found on. Here is a map type that can hold this information:

std::map<std::string, std::pair<int, std::set<int>>>

The "second" of each map entry is a pair that holds the count and a std::set containing all of the line numbers where the word is found. The reason for the std::set is to guarantee that duplicate line numbers will not be stored.

Putting this all together, here is a sample program using this type:

#include <map>
#include <set>
#include <string>
#include <sstream>
#include <iostream>

// pair and map type
using WordInfo = std::pair<int, std::set<int>>;
using WordMap = std::map<std::string, WordInfo>;

int main()
{
    // our map
    WordMap wm;
    std::string line;

    // the line count
    int line_number = 1;
    while (std::getline(std::cin, line))
    {
        // line parser
        std::istringstream strm(line);
        std::string word;
        while ( strm >> word)
        {
            // we call map::insert, not `[ ]` to insert into a map
            auto pr = wm.insert({word, {0,std::set<int>()}});

            // the return value of map::insert gives us a pair, where the first is 
            // an iterator to the item in the map
            auto& mapIter = pr.first;

            // increment the word count   
            ++(mapIter->second.first);

            // insert the line number into the set
            mapIter->second.second.insert(line_number);
        }

        // increment the line counter
        ++line_number;
    }

    // output results
    for (auto& m : wm )
    {
        std::cout << "The word  \"" << m.first << "\" appears " << m.second.first << " times on the following lines:\n";
        for ( auto& m2 : m.second.second)
            std::cout << m2 << " ";
        std::cout << "\n\n";
    }
}

So what was done here?

1) The line each word is on is known in the read loop. All that is needed is to increment the line count for each line that is read in.

2) We use std::map::insert to insert an entry into the map, and not std::map::operator[]. The reason is that map::insert will not insert an entry if the entry already exists, and it will insert a brand new entry if the entry doesn't exist. Regardless of which happens, std::map::insert returns an iterator to the item in the map.

We need the iterator returned to us for later processing. In the subsequent lines, we just increment the count and update the std::set.
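As a rough illustration of that behaviour, the sketch below wraps the insert-then-update step in a helper. `record_occurrence` is a hypothetical name, not part of the program above. std::map::insert returns a std::pair whose first member is the iterator and whose second member is a bool that is true only when a new entry was created.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

using WordInfo = std::pair<int, std::set<int>>;
using WordMap = std::map<std::string, WordInfo>;

// Hypothetical helper: records one occurrence of `word` on `line_number`.
// Returns true if the word was not in the map before this call.
bool record_occurrence(WordMap& wm, const std::string& word, int line_number)
{
    // If `word` is already present, insert leaves the existing entry untouched;
    // otherwise it adds an entry with a zero count and an empty set.
    auto result = wm.insert(WordMap::value_type(word, WordInfo()));

    WordInfo& info = result.first->second;  // entry found or just created
    ++info.first;                           // bump the word count
    info.second.insert(line_number);        // remember the line (duplicates ignored)
    return result.second;
}
```

Calling it three times with "mom" on lines 1, 1 and 3 leaves a single entry with a count of 3 and the line set {1, 3}.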

Here is a live example.


Note: I have no idea what all of the replacement in your original program is doing, so I skipped over all of that and concentrated solely on the task of determining the words and the line(s) where the words are situated.
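If you want to put the lowercasing and symbol replacement back, one possible sketch is a helper applied to each line before it is split into words. `normalize_line` is a hypothetical name, and treating every std::ispunct character as a separator is an assumption, since the contents of the original CharVect are not shown.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lowercases letters and turns punctuation into spaces,
// so that "Mom," and "mom" end up as the same word after splitting.
std::string normalize_line(std::string line)
{
    std::transform(line.begin(), line.end(), line.begin(),
                   [](unsigned char c) -> char {
                       // unsigned char avoids undefined behaviour in the
                       // <cctype> functions for negative char values
                       if (std::ispunct(c)) return ' ';
                       return static_cast<char>(std::tolower(c));
                   });
    return line;
}
```

In the sample program this would be called on `line` right after std::getline succeeds. Note that it also splits contractions, so "don't" becomes the two words "don" and "t".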
