
Finding a word's frequency in a text file line by line in C++

I need to read a file, ask the user for a word, and then display the number of occurrences of that word line by line. I also need to do this with char arrays. Here is an example of the output I want:

Line 2: 1 occurrence(s)
Line 4: 2 occurrence(s)
Line 7: 1 occurrence(s)

As you can see, I divided the line length by the length of searchString, which gives the maximum number of times searchString could occur in the line. I need to display the actual number of occurrences, but my code prints the result of this division instead. Can you help me with this?

#include <iostream>
#include <string>
#include <fstream>
#include <istream>
#include <cstring> // needed for strcpy and strstr

using namespace std;
int number_of_lines = 1;

void numberoflines();

unsigned int GetFileLength(std::string FileName)
{
    std::ifstream InFile(FileName.c_str());
    unsigned int FileLength = 0;
    while (InFile.get() != EOF) FileLength++;
    InFile.close();
    cout<<"Numbers of character in your file : "<<FileLength<<endl;
    return FileLength;
}


int main()
{
    string searchString, fileName, line;
    int a;
    string *b;
    char *c,*d;
    int wordCount = 0, count = 0,count1=0;
    cout << "Enter file name : " << endl;
    cin >> fileName;
    GetFileLength(fileName);
    cout << "Enter a word for searching procces : " << endl;
    cin >> searchString;



    ifstream in (fileName.c_str(), ios::in);
    d= new char[searchString.length()+1];

    strcpy(d,searchString.c_str());

    a=GetFileLength(fileName);
    b= new string [a];


    if(in.is_open()){
        while(!in.eof()){
            getline(in,line);
            c= new char[line.length()+1];
            count++;


            strcpy(c,line.c_str());


            count1=0;
            for (int i = 0; i < line.length()/searchString.length(); i++)
            {

                char *output = NULL;
                output = strstr (c,d);
                if(output) {
                    count1++;
                }
                else count1--;
            }
            if(count1>0){cout<<"Line "<<number_of_lines<<": "<<count1<<" occurrence(s) "<<endl;}
            number_of_lines++;
            if (count==10)
            {
                break;
            }
        }

        numberoflines();
    }


    return 0;
}

void numberoflines(){
    number_of_lines--;
    cout<<"number of lines in text file: " << number_of_lines << endl;
}


This loop:

        for (int i = 0; i < line.length()/searchString.length(); i++)
        {
            char *output = NULL;
            output = strstr (c,d);
            if(output) {
                count1++;
            }
            else count1--;
        }

does not count all the matches of the string in the line, because c and d are the same on every call to strstr(), so each call finds the same first match. When you repeat the search, you have to start from somewhere after the previous match.

There's also no reason to subtract from count1 when you don't find a match. You should just exit the loop when that happens. And there's little point in using a for loop, because you're not doing anything with i; just use a while loop.

        char *start = c;
        size_t searchlen = searchString.length();
        while (true)
        {
            char *output = strstr (start,d);
            if(output) {
                count1++;
                start = output + searchlen;
            } else {
                break;
            }
        }
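The same idea can be wrapped in a small function so it can be tried on its own. This is only a sketch: the name count_occurrences is hypothetical and not part of the original code, and it counts non-overlapping substring matches, so "the" is also found inside "other".

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the question): counts non-overlapping
// occurrences of `word` inside the C string `line`.
std::size_t count_occurrences(const char* line, const char* word)
{
    const std::size_t word_len = std::strlen(word);
    if (word_len == 0) {
        return 0; // an empty search string would otherwise match forever
    }

    std::size_t count = 0;
    const char* start = line;
    // strstr returns a pointer to the next match, or a null pointer if none.
    while (const char* match = std::strstr(start, word)) {
        ++count;
        start = match + word_len; // resume the search after this match
    }
    return count;
}
```

In the question's loop this would be called as count1 = count_occurrences(c, d); for each line.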

You don't need to read the entire file into an array or std::string. I recommend you keep this program simple before optimizing.

As noted in your question, you are required to use character arrays and to read line by line.

Look up the istream::getline function, as it will be very useful.

Let's declare a maximum line length of 1024.

Here's the reading file part:

#define MAX_LINE_LENGTH (1024)
char text_buffer[MAX_LINE_LENGTH]; // Look, no "new" operator. :-)
//...
while (my_text_file.getline(text_buffer, MAX_LINE_LENGTH, '\n'))
{
 //... TBD
}

The above code fragment reads one line of text at a time into the variable text_buffer.

Because you are using character arrays, please read through the "str" functions (such as strstr) in your favorite reference; otherwise you may have to write your own.

The next step would be to extract a "word" from the text line.

In order to extract a word, we need to know where it starts and where it ends. So, the text line will need to be searched. See the isalpha function, as it will be useful.

Here's a loop for finding the beginning and ending of a word:

unsigned int word_start_position = 0; // start at beginning of the line.
unsigned int word_end_position = 0;
const unsigned int length = strlen(text_buffer); // Calculate only once.
while (word_start_position < length)
{
  // Find the start of a word.
  while (!isalpha(text_buffer[word_start_position]))
  {
    ++word_start_position;
  }

  // Find end of the word.
  word_end_position = word_start_position;
  while (isalpha(text_buffer[word_end_position]))
  {
    ++word_end_position;
  }
}

There are some logic issues remaining in the above code fragments for the OP to resolve.
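For readers who want to compare against one possible resolution, here is a sketch that adds bounds checks to both inner loops and advances past each word once it has been found. The function name find_word_spans and the use of std::vector to return the results are assumptions for illustration only.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical helper: returns the [start, end) positions of every run of
// alphabetic characters in `text`.
std::vector<std::pair<std::size_t, std::size_t>> find_word_spans(const char* text)
{
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    const std::size_t length = std::strlen(text); // calculate only once
    std::size_t pos = 0;

    while (pos < length) {
        // Find the start of a word, without running past the end of the line.
        while (pos < length &&
               !std::isalpha(static_cast<unsigned char>(text[pos]))) {
            ++pos;
        }
        if (pos == length) {
            break; // only non-letters were left
        }

        // Find the end of the word.
        std::size_t end = pos;
        while (end < length &&
               std::isalpha(static_cast<unsigned char>(text[end]))) {
            ++end;
        }

        spans.emplace_back(pos, end);
        pos = end; // continue the scan after this word
    }
    return spans;
}
```

The cast to unsigned char is there because passing a negative char value to isalpha is undefined behavior.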

The next part would be to add code that uses the start and end position of the word to copy the characters in the word to another variable. This variable would then be used in a map or associative array or dictionary which contains the number of occurrences.

In other terms, search the container for the word. If the word exists, increment the associated occurrence variable. If it doesn't exist, add the word to the container with an occurrence of 1.
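A rough sketch of that counting step, assuming std::map is the container and that words are runs of alphabetic characters compared case-sensitively. The function name count_words is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Hypothetical helper: counts how often each alphabetic word occurs in `text`.
// Comparison is case-sensitive, so "The" and "the" are counted separately.
std::map<std::string, unsigned int> count_words(const char* text)
{
    std::map<std::string, unsigned int> occurrences;
    const std::size_t length = std::strlen(text);
    std::size_t pos = 0;

    while (pos < length) {
        // Skip anything that is not a letter.
        while (pos < length &&
               !std::isalpha(static_cast<unsigned char>(text[pos]))) {
            ++pos;
        }
        if (pos == length) {
            break;
        }

        // Advance to the end of the word.
        std::size_t end = pos;
        while (end < length &&
               std::isalpha(static_cast<unsigned char>(text[end]))) {
            ++end;
        }

        // operator[] inserts the word with a count of 0 if it is new,
        // so a single increment covers both the "exists" and "add" cases.
        ++occurrences[std::string(text + pos, end - pos)];
        pos = end;
    }
    return occurrences;
}
```

Calling this once per line and printing the entry for the searched word would give per-line counts of whole words rather than substrings.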

Good Luck!

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
