
c++ - Finding a word's frequency in a text file line by line

I need to read a file, ask the user for a word, and then display the number of occurrences of that word line by line. I also need to do this with char arrays. Here is an example of the expected output:

Line 2: 1 occurrence(s)
Line 4: 2 occurrence(s)
Line 7: 1 occurrence(s)

As you can see, I divided the line length by the length of searchString, which gives the maximum number of times searchString could possibly occur in the line. I need to display the actual number of occurrences, but my code shows the result of this division instead. Can you help me with this?

#include <iostream>
#include <string>
#include <fstream>
#include <istream>
#include <cstring> // for strcpy and strstr

using namespace std;
int number_of_lines = 1;

void numberoflines();

unsigned int GetFileLength(std::string FileName)
{
    std::ifstream InFile(FileName.c_str());
    unsigned int FileLength = 0;
    while (InFile.get() != EOF) FileLength++;
    InFile.close();
    cout<<"Numbers of character in your file : "<<FileLength<<endl;
    return FileLength;
}


int main()
{
    string searchString, fileName, line;
    int a;
    string *b;
    char *c,*d;
    int wordCount = 0, count = 0,count1=0;
    cout << "Enter file name : " << endl;
    cin >> fileName;
    GetFileLength(fileName);
    cout << "Enter a word for searching procces : " << endl;
    cin >> searchString;



    ifstream in (fileName.c_str(), ios::in);
    d= new char[searchString.length()+1];

    strcpy(d,searchString.c_str());

    a=GetFileLength(fileName);
    b= new string [a];


    if(in.is_open()){
        while(!in.eof()){
            getline(in,line);
            c= new char[line.length()+1];
            count++;


            strcpy(c,line.c_str());


            count1=0;
            for (int i = 0; i < line.length()/searchString.length(); i++)
            {

                char *output = NULL;
                output = strstr (c,d);
                if(output) {
                    count1++;
                }
                else count1--;
            }
            if(count1>0){cout<<"Line "<<number_of_lines<<": "<<count1<<" occurrence(s) "<<endl;}
            number_of_lines++;
            if (count==10)
            {
                break;
            }
        }

        numberoflines();
    }


    return 0;
}

void numberoflines(){
    number_of_lines--;
    cout<<"number of lines in text file: " << number_of_lines << endl;
}


This loop:

        for (int i = 0; i < line.length()/searchString.length(); i++)
        {
            char *output = NULL;
            output = strstr (c,d);
            if(output) {
                count1++;
            }
            else count1--;
        }

is not counting all the matches of the string in the line, because c and d are the same each time you call strstr(), so every call finds the same first match. When you repeat the search, you have to start from somewhere after the previous match.

There's also no reason to subtract from count1 when you don't find a match. You should just exit the loop when that happens. And there's little point in using a for loop, because you're not doing anything with i; just use a while loop.

        char *start = c;
        size_t searchlen = searchString.length();
        while (true)
        {
            char *output = strstr (start,d);
            if(output) {
                count1++;
                start = output + searchlen;
            } else {
                break;
            }
        }
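
Wrapped into a function, the corrected loop might look like the sketch below. The name count_occurrences is hypothetical and does not appear in the question. The guard against an empty search string is an added assumption: strstr returns its first argument for an empty needle, so without the guard the loop would never advance. Like the loop above, it counts non-overlapping matches.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts non-overlapping occurrences of `needle` in `line`.
// Both arguments must be NUL-terminated character arrays.
std::size_t count_occurrences(const char *line, const char *needle)
{
    const std::size_t needle_len = std::strlen(needle);
    if (needle_len == 0) {
        return 0; // assumption: an empty search string counts as "no matches"
    }

    std::size_t count = 0;
    const char *start = line;
    while (const char *match = std::strstr(start, needle)) {
        ++count;
        start = match + needle_len; // resume just past the previous match
    }
    return count;
}
```

For example, count_occurrences("the cat and the hat", "the") gives 2, while count_occurrences("aaaa", "aa") gives 2 rather than 3 because matches are not allowed to overlap. Note that this matches substrings, so "the" is also found inside "there".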

You don't need to read the entire file into an array or std::string. I recommend you keep this program simple before optimizing.

As noted in your question, you are required to use character arrays and to read line by line.

Look up the istream::getline function, as it will be very useful.

Let's declare a maximum line length of 1024.

Here's the file-reading part:

#define MAX_LINE_LENGTH (1024)
char text_buffer[MAX_LINE_LENGTH]; // Look, no "new" operator. :-)
//...
while (my_text_file.getline(text_buffer, MAX_LINE_LENGTH, '\n'))
{
 //... TBD
}

The above code fragment reads a line of text into the variable text_buffer.

Because you are using character arrays, please read through the "str" functions in your favorite reference, such as strstr; or you may have to write your own.

The next step would be to extract a "word" from the text line.

In order to extract a word, we need to know where it starts and where it ends. So, the text line will need to be searched. See the isalpha function, as it will be useful.

Here's a loop for finding the beginning and end of a word:

unsigned int word_start_position = 0; // start at beginning of the line.
unsigned int word_end_position = 0;
const unsigned int length = strlen(text_buffer); // Calculate only once.
while (word_start_position < length)
{
  // Find the start of a word.
  while (!isalpha(text_buffer[word_start_position]))
  {
    ++word_start_position;
  }

  // Find end of the word.
  word_end_position = word_start_position;
  while (isalpha(text_buffer[word_end_position]))
  {
    ++word_end_position;
  }
}

There are some logic issues remaining in the above code fragments for the OP to resolve.
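
One possible way to address those issues, offered as a sketch rather than as the intended solution: bound both inner loops by the line length so they cannot run past the terminator, move the start position to the end of each word so the outer loop terminates, and cast to unsigned char before calling isalpha (passing a negative char value is undefined behavior). The function name count_words is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts the alphabetic "words" in a NUL-terminated line.
std::size_t count_words(const char *text)
{
    const std::size_t length = std::strlen(text); // calculate only once
    std::size_t words = 0;
    std::size_t pos = 0;

    while (pos < length) {
        // Find the start of a word, stopping at the end of the line.
        while (pos < length && !std::isalpha(static_cast<unsigned char>(text[pos]))) {
            ++pos;
        }
        if (pos == length) {
            break; // only separators remained
        }

        // Find the end of the word.
        std::size_t word_end = pos;
        while (word_end < length && std::isalpha(static_cast<unsigned char>(text[word_end]))) {
            ++word_end;
        }

        // The word occupies [pos, word_end); here it is only counted.
        ++words;
        pos = word_end; // continue after this word
    }
    return words;
}
```

Because only letters count as word characters here, "b-c" is treated as two words and a token such as "42" is skipped entirely.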

The next part would be to add code that uses the start and end positions of the word to copy its characters into another variable. This variable would then be used as the key in a map (also called an associative array or dictionary) that stores the number of occurrences.

In other words, search the container for the word. If the word exists, increment the associated occurrence variable. If it doesn't exist, add the word to the container with an occurrence count of 1.
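
The lookup-then-increment logic described above could be sketched with std::map. Its operator[] inserts a zero-initialized count for a missing key, so a single increment covers both the "exists" and "doesn't exist" cases. The name count_word_frequencies and the use of std::string keys are assumptions made for brevity; a strict character-array solution would need its own container.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Hypothetical helper: adds every alphabetic word found in `text` to `counts`.
void count_word_frequencies(const char *text,
                            std::map<std::string, unsigned int> &counts)
{
    const std::size_t length = std::strlen(text);
    std::size_t pos = 0;

    while (pos < length) {
        // Skip anything that is not a letter.
        while (pos < length && !std::isalpha(static_cast<unsigned char>(text[pos]))) {
            ++pos;
        }
        // Scan to the end of the word (end == pos if no word was found).
        std::size_t end = pos;
        while (end < length && std::isalpha(static_cast<unsigned char>(text[end]))) {
            ++end;
        }
        if (end > pos) {
            // Copy the characters in [pos, end) into the key.
            const std::string word(text + pos, end - pos);
            ++counts[word]; // inserts the word with 0 first if it is new
        }
        pos = end;
    }
}
```

Calling this once per line with the same map accumulates totals for the whole file. The comparison is case-sensitive, so "The" and "the" are counted as different words unless the key is lowercased first.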

Good luck!
