Counting occurrences of a specific substring in a string in C

I have a string that holds the entire contents of a fairly large (500 MB) text file. This is how I read the file:

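    #include <stdio.h>
    #include <stdlib.h>
    /* Return the size of the file in bytes and rewind to the start. */
    long fsize(FILE *fp) {
        fseek(fp, 0, SEEK_END);
        long bytes = ftell(fp);
        rewind(fp);
        return bytes;
    }

    int main(void) {
        FILE *fp = fopen("file.txt", "r");
        if (fp == NULL) return 1;
        long size = fsize(fp);
        char *fcontent = malloc(size + 1);   /* +1 for the terminating zero byte */
        if (fcontent == NULL) return 1;
        size_t nread = fread(fcontent, 1, size, fp);
        fcontent[nread] = '\0';              /* make it a valid C string */
        fclose(fp);
        /* ... search fcontent here ... */
        free(fcontent);
        return 0;
    }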

fcontent points to a string in the following format:

matrix
trivial
bigbash
tropical
swalloed
.
.
.

Now I need to count the number of occurrences of a substring, say 'ba', in fcontent. Each line of the text file contains a single word, and the substring search should be limited to that word only. How do I select matrix, trivial, bigbash ... one word at a time from fcontent?

Here's an algorithm for you:

  1. Have a current pointer. Initialize it to point to the beginning of the string.
  2. Search from the current pointer for the first end of line character.
  3. If you run off the end of the string, stop, you are done.
  4. Convert the end of line character to a zero byte.
  5. Process the string beginning at the current pointer.
  6. Set the current pointer to point to the end of line character you replaced with a zero byte.
  7. Restore the end of line character at the current pointer so that you don't damage the string (unless you don't care).
  8. Keep incrementing the current pointer until it points to something other than an end of line character. If you hit a zero byte, stop, you are done.
  9. Go to step 2.

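Your file consists of one word per line. You read the entire file into memory and then try to split the resulting string at line breaks.

A far easier approach is to read the file line by line using getline() (a POSIX function; fgets() is the standard C alternative).

Then use strstr() to search for your substring in each word. Since strstr() returns only the first match, call it again starting just past each match to count every occurrence.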

http://www.cplusplus.com/reference/string/string/getline/?kw=getline
http://www.cplusplus.com/reference/cstring/strstr/
