
Need help with C++: using maps to keep track of words in an input file

Let's say I have a text file with:

today is today but
tomorrow is today tomorrow

Then, using maps, how can I keep track of the words that are repeated, and on which lines they repeat? So far I read each string in the file into a temp variable and store it in the following way:

    map<string,int> storage;

    int count = 1; // for the first line of the file

    if(infile.is_open()){
        while( !infile.eof() ){
            getline(infile, line);
            istringstream my_string(line);
            while(my_string.good()){
                string temp;
                my_string >> temp;

                storage[temp] = count;
            }
            count++; // so that every string read in the next line will be recorded as that line.
        }
    }
    map<string,int>::iterator m;
    for(m = storage.begin(); m != storage.end(); m++){
        out<<m->first<<": "<<"line "<<m->second<<endl;
    }

Right now the output is just:

but: line 1
is: line 2
today: line 2
tomorrow: line 2

But instead it should print out the following (with no repeated strings):

today : line 1 occurred 2 times, line 2 occurred 1 time.
is: line 1 occurred 1 time, line 2 occurred 1 time.
but: line 1 occurred 1 time.
tomorrow: line 2 occurred 2 times.

Note: the order of the strings does not matter.

Any help would be appreciated. Thanks.

A map stores (key, value) pairs with unique keys. That means that if you assign to the same key more than once, only the last value you assigned will be stored.

It sounds like, instead of storing the line as the value, you want to store another map of lines -> occurrences.

So you could make your map like this:

typedef int LineNumber;
typedef int WordHits;
typedef map< LineNumber, WordHits> LineHitsMap;
typedef map< string, LineHitsMap > WordHitsMap;
WordHitsMap storage;

Then to insert:

WordHitsMap::iterator wordIt = storage.find(temp);
if(wordIt != storage.end())
{
    LineHitsMap::iterator lineIt = (*wordIt).second.find(count);
    if(lineIt != (*wordIt).second.end())
    {
        (*lineIt).second++;
    }
    else
    {
        (*wordIt).second[count] = 1;
    }
}
else
{
    LineHitsMap lineHitsMap;
    lineHitsMap[count] = 1;
    storage[temp] = lineHitsMap;
}
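
Because std::map::operator[] value-initializes a missing entry (an empty inner map, or 0 for an int), the find/insert logic above can probably be collapsed into a single statement. The following is a minimal sketch under that assumption, reusing the typedefs above; count_words and format_report are hypothetical helper names that do not appear in the original post.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef int LineNumber;
typedef int WordHits;
typedef std::map<LineNumber, WordHits> LineHitsMap;
typedef std::map<std::string, LineHitsMap> WordHitsMap;

// Hypothetical helper: count how often each word occurs on each line of `in`.
WordHitsMap count_words(std::istream& in) {
    WordHitsMap storage;
    std::string line;
    LineNumber count = 1;  // the first line of the input is line 1
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string temp;
        while (words >> temp) {
            // operator[] creates the inner map and a zero counter on first use.
            ++storage[temp][count];
        }
        ++count;
    }
    return storage;
}

// Hypothetical helper: one report line per word, in the format the question asks for.
std::string format_report(const WordHitsMap& storage) {
    std::ostringstream out;
    for (WordHitsMap::const_iterator w = storage.begin(); w != storage.end(); ++w) {
        out << w->first << ": ";
        const LineHitsMap& hits = w->second;
        for (LineHitsMap::const_iterator h = hits.begin(); h != hits.end(); ++h) {
            if (h != hits.begin()) out << ", ";
            out << "line " << h->first << " occurred " << h->second
                << (h->second == 1 ? " time" : " times");
        }
        out << ".\n";
    }
    return out.str();
}
```

Passing an open std::ifstream (or a std::istringstream for testing) to count_words and writing format_report(...) to the output stream should give one line per word, sorted alphabetically by the map rather than in order of first appearance.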

You're trying to get two items of information out of the collection when you only store one item of information in it.

The easiest way to extend your current implementation is to store a struct instead of an int.

So instead of:

storage[temp] = count;

you'd do:

storage[temp].linenumber = count;
storage[temp].wordcount++;

where the map is defined as:

struct worddata { int linenumber; int wordcount; };
std::map<string, worddata> storage;

Print the results using:

out << m->first << ": " << "line " << m->second.linenumber << " count: " << m->second.wordcount << endl;

Edit: use a typedef for the definitions, e.g.:

typedef std::map<std::string, worddata> MYMAP;
MYMAP storage;

and then declare the iterator as MYMAP::iterator iter;
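
Put together, the struct approach might look like the sketch below. It relies on std::map::operator[] value-initializing a new worddata, so both members start at 0. Note that it records only the last line a word was seen on plus its total count, so on its own it does not produce the per-line breakdown the question asks for. The function name tally_words is hypothetical.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

struct worddata {
    int linenumber;  // last line on which the word was seen
    int wordcount;   // total occurrences across all lines
};
typedef std::map<std::string, worddata> MYMAP;

// Hypothetical helper: tally every whitespace-separated word read from `in`.
MYMAP tally_words(std::istream& in) {
    MYMAP storage;
    std::string line;
    int count = 1;  // the first line of the input is line 1
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string temp;
        while (words >> temp) {
            // A missing key is inserted with linenumber == 0 and wordcount == 0.
            worddata& entry = storage[temp];
            entry.linenumber = count;
            entry.wordcount++;
        }
        ++count;
    }
    return storage;
}
```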

Your storage data type is insufficient to store all the information you want to report. You could get there by using a vector for count storage, but you'd have to do a lot of book-keeping to make sure you actually insert a 0 when a word is not encountered on a line and create the vector with the right size when a new word is encountered. Not a trivial task.

You could switch your count part to a map of numbers, the first being the line and the second being the count. That would reduce the complexity of your code but wouldn't exactly be the most efficient method.

At any rate, you can't do what you need to do with just a single std::map<string, int>.

Edit: I just thought of an alternative version that would be easier to generate but harder to report with: std::vector< std::map<std::string, unsigned int> >. For each new line in the file you'd generate a new map<string,int> and push it onto the vector. You could also keep a helper set<string> containing all the words that appear in the file, to use in your reporting.

That's probably how I'd do it anyway, except I'd encapsulate all that crap in a class so that I'd just do something like:

my_counter.word_appearance(word,line_no);
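
A rough sketch of what such a class could look like, assuming 1-based line numbers. Only the word_appearance call comes from the snippet above; the class name word_counter and the members hits, words and line_count are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

class word_counter {
public:
    // Record one appearance of `word` on line `line_no` (1-based, an assumption).
    void word_appearance(const std::string& word, std::size_t line_no) {
        if (line_no == 0) return;  // this sketch simply ignores invalid line numbers
        if (per_line_.size() < line_no) per_line_.resize(line_no);
        ++per_line_[line_no - 1][word];
        words_.insert(word);
    }

    // Occurrences of `word` on `line_no`; 0 if the word never appeared there.
    unsigned int hits(const std::string& word, std::size_t line_no) const {
        if (line_no == 0 || line_no > per_line_.size()) return 0u;
        const std::map<std::string, unsigned int>& m = per_line_[line_no - 1];
        std::map<std::string, unsigned int>::const_iterator it = m.find(word);
        return it == m.end() ? 0u : it->second;
    }

    // Every distinct word seen so far, for driving the report.
    const std::set<std::string>& words() const { return words_; }

    // Highest line number recorded so far.
    std::size_t line_count() const { return per_line_.size(); }

private:
    std::vector<std::map<std::string, unsigned int> > per_line_;  // one map per line
    std::set<std::string> words_;
};
```

Reporting then becomes a loop over words() with an inner loop over line numbers 1 to line_count(), skipping the lines where hits returns 0.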

Apart from anything else, your loops are all wrong. You should never loop on the eof or good flags, but on the success of the read operation itself. You want something like:

while( getline(in, line) ){
    istringstream my_string(line);
    string temp;
    while( my_string >> temp ){
        // do something with temp
    }
}
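
The reason is that the eof flag is only set after a read has already tried to go past the end of the input, so a loop that tests the flag before reading ends up processing one failed read. A small self-contained sketch of the read-driven pattern follows; read_words is a hypothetical helper name, and it returns the words of each line so the behavior is easy to check.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the words of each line, looping on read success.
std::vector<std::vector<std::string> > read_words(std::istream& in) {
    std::vector<std::vector<std::string> > lines;
    std::string line;
    while (std::getline(in, line)) {   // stops when no further line can be read
        std::istringstream my_string(line);
        std::vector<std::string> words;
        std::string temp;
        while (my_string >> temp) {    // stops when no further word can be extracted
            words.push_back(temp);
        }
        lines.push_back(words);
    }
    return lines;
}
```

With this shape, trailing whitespace on a line or a final newline at the end of the file should not produce a phantom empty word or an extra line.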
