C - strtok(): split the string on '\n' but keep the delimiter

I have the following problem with my C program. Part of its functionality is to read some text, split it into sentences, and then write those sentences to a file.

I used strtok() to split the chunk of text into sentences (a sentence ends when \n occurs). However, the result is wrong when the chunk contains a sentence that consists of nothing but the \n character, like this:

//////////////////////////////

Hello, this is some sample text
This is the second sentence

The sentence above is just a new line
This is the last sentence.

/////////////////////////////

The output in the file is as follows:

0 Hello, this is some sample text
1 This is the second sentence
2 The sentence above is just a new line
3 This is the last sentence.

////////////////////////////////////////////////////

While it should be:

0 Hello, this is some sample text
1 This is the second sentence
2
3 The sentence above is just a new line
4 This is the last sentence.

////////////////////////////////////

The file holding the strings should function as a log file. That is why I have to split the chunk of text into sentences at \n and put an integer in front of each sentence before writing it to the file.

This is the code related to this functionality :

int counter = 0; // Used for counting
const char s[2] = "\n"; // Used for tokenization

// ............

char *token;
token = strtok(input, s);
while (token != NULL) {
    fprintf(logs, "%d ", counter);
    fprintf(logs, "%s\n", token); // Add the newline back here, since tokenization removes it
    counter++;
    token = strtok(NULL, s);
}

// .........

Is there a way to add a special case for an "empty sentence" (a sentence that is just a \n character) so that it is handled properly?

Perhaps another function would work instead of strtok()?

You should probably use strstr or strchr as the comment suggests. If your application requires strtok for some reason, though, you could save the position of the end of each sentence and use pointer arithmetic to detect that several newlines (\n) occurred in a row.

Rough, untested example code:

int counter = 0; // Used for counting
const char *expected; // Where the next token starts if no blank lines intervene

// ............
expected = input;
char *token = strtok(input, "\n");
while (token != NULL) {
    // Every '\n' that strtok skipped before this token is one empty line.
    for (; expected < token; expected++) {
        fprintf(logs, "%d \n", counter++);
    }
    fprintf(logs, "%d %s\n", counter++, token);

    // Step past the token and the single delimiter that terminated it.
    expected = token + strlen(token) + 1;
    token = strtok(NULL, "\n");
}
// Note: strtok never reports blank lines that follow the last sentence.

// .........

EDIT: added example with strchr

Using strchr is just as easy, if not easier, especially since you only have one delimiter. The code below takes your sentences and splits them at each newline. It only prints them, but you could easily extend it for your purposes.

#include <stdio.h>
#include <string.h>

const char *sentences = "Hello, this is some sample text\n"
                        "This is the second sentence\n"
                        "\n"
                        "The sentence above is just a new line\n"
                        "This is the last sentence.\n";

void parse(const char *input) {
  const char *start = input;
  const char *end;
  unsigned count = 0;

  // Each iteration prints one line, including its trailing '\n'.
  while ((end = strchr(start, '\n')) != NULL) {
      printf("%u %.*s", count++, (int)(end - start + 1), start);
      start = end + 1;
  }

  // Print whatever remains if the input does not end with '\n'.
  if (*start != '\0') {
      printf("%u %s\n", count, start);
  }
}

int main(void) {
  parse(sentences);
  return 0;
}

If you are reading your input from a file, you can open a stream with fopen() and read it line by line with getline(). Note that getline() is a POSIX function, not part of standard C.
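A minimal sketch of the getline() approach, assuming a POSIX system. The helper name log_lines_getline and its parameters are made up for illustration; the stream arguments stand in for whatever you opened with fopen(). Because getline() returns a blank line as the string "\n", empty sentences keep their own index.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* asks the headers to declare getline() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: copies every line of `in` to `logs`, prefixed with a
   running index. Returns the number of lines written. */
int log_lines_getline(FILE *in, FILE *logs) {
    char *line = NULL;  /* buffer allocated and grown by getline() */
    size_t cap = 0;
    ssize_t len;
    int counter = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* The line still carries its '\n', so blank lines are preserved. */
        fprintf(logs, "%d %s", counter++, line);

        /* The last line of a file may lack a trailing newline; add one. */
        if (len == 0 || line[len - 1] != '\n') {
            fputc('\n', logs);
        }
    }

    free(line);
    return counter;
}
```

Calling log_lines_getline(in, logs) with two open streams writes one numbered entry per input line, including the blank ones.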

Otherwise, you can write a function that counts the number of \n characters, allocates an array of char* , and fills it line by line.

EDIT: if you don't want to write it yourself, you can easily find an existing implementation with a little searching.
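One possible shape for such a function is sketched below. The name split_lines and its signature are assumptions made for illustration, not something from a library. It modifies the input buffer in place and keeps empty lines as empty strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `text` in place at every '\n' and returns a
   malloc'ed array of pointers to the individual lines. Empty lines become
   "" entries. The number of lines is stored in *count.
   Returns NULL if the allocation fails. The caller frees the array. */
char **split_lines(char *text, size_t *count) {
    /* Count newlines; there is at most one more line than newlines. */
    size_t n = 1;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            n++;
        }
    }

    char **lines = malloc(n * sizeof *lines);
    if (lines == NULL) {
        return NULL;
    }

    size_t i = 0;
    char *start = text;
    char *end;
    while ((end = strchr(start, '\n')) != NULL) {
        *end = '\0';         /* terminate this line */
        lines[i++] = start;  /* an empty line yields "" here */
        start = end + 1;
    }

    /* Text after the last newline, if the input did not end with '\n'. */
    if (*start != '\0') {
        lines[i++] = start;
    }

    *count = i;
    return lines;
}
```

After the call, lines[k] can be written to the log with its index k, and a single free(lines) releases the array, since the strings themselves live in the original buffer.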

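You included the newline \n in the delimiter set for strtok . strtok treats a run of consecutive delimiters as a single delimiter, so it never returns an empty token.

If the input string was read successfully (for example, one line at a time) and the first call to strtok returns NULL , then it was a blank line, which you can then process.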

token = strtok(input,s);
if(token == NULL) {
    fprintf(logs, "%d\n", counter);
    counter++;
}
while(token != NULL){                   // the `while` serves as `else`
    fprintf(logs, "%d ", counter);
    fprintf(logs, "%s\n" , token);
    counter++;
    token = strtok(NULL, s);
}
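The check above only catches blank lines if the input arrives one line at a time. A minimal sketch of that setup follows, assuming the text is read with fgets() and that no line is longer than the buffer. The helper name log_lines_fgets and the buffer size are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads `in` one line at a time and writes each line to
   `logs` with a running index. Assumes every line fits in the buffer; longer
   lines would be split into several entries. Returns the number of entries. */
int log_lines_fgets(FILE *in, FILE *logs) {
    char input[1024];
    int counter = 0;

    while (fgets(input, sizeof input, in) != NULL) {
        /* For a line that is only "\n", strtok finds no token. */
        char *token = strtok(input, "\n");
        if (token == NULL) {
            fprintf(logs, "%d\n", counter++);            /* blank line */
        } else {
            fprintf(logs, "%d %s\n", counter++, token);  /* regular line */
        }
    }
    return counter;
}
```

Since each buffer holds exactly one line, a single strtok call per line is enough and no loop over tokens is needed.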
