Is there anything like “Beginning of file” in C?

I need to make a very small and basic "text processor" or "translator". Its input is a text file containing arbitrary text, and I have to find Roman numerals and replace them with another word. For example, given this text:

Hello I am a text sayIIng stVffI

The output should be:

Hello REPLACED_WORD am a text sayIIng stVffI

In other words, I need to replace only the words that are numerals, not words that merely contain numerals. For example, sayIIng contains II, but it is not just II, so it must not be "translated".

I'm doing this by using fgets() to read the file, strtok() to separate the words of each line, and strpbrk() to find every occurrence of a numeral letter (here 'I') in the line. My problem is that if a valid Roman numeral is the first word of the text file, it is not recognized, because my condition is that the numeral must be preceded by a space, a \n, a \t or a null character, and apparently none of those comes before the numeral at the start of the file.

These are the functions that check that a numeral is separated from the rest of the text:

int validate_before(char *token){
    char *tk = token-1;   /* the character just before the token */
    if(tk[0] == ' ' || tk[0] == '\t' || tk[0] == '\n' || tk[0] == '\0'){
        return 1;
    }
    return 0;
}

int validate_after(char *token){
    char *tk = token+1;   /* the character just after a one-letter token */
    if(tk[0] == ' ' || tk[0] == '\t' || tk[0] == '\n' || tk[0] == '\0'){
        return 1;
    }
    return 0;
}

Then I have a function that reads the file and tries to find each occurrence (I'm still trying to figure out how to rewrite the whole text with the words replaced, but first things first):

while(fgets(buffer, 1024, archivo_entrada)){
    aux = strtok(buffer, "\n");
    while(aux != NULL){
        char *primera = strpbrk(aux, "I");   /* first 'I' in the line */

        printf("%s @@ \n", aux);
        while(primera != NULL){
            if(validate_before(primera) == 1 && validate_after(primera) == 1){
                printf("--FOUND AN \"I\"--\n");
            }
            primera = strpbrk(primera+1, "I");   /* next 'I' */
        }
        aux = strtok(NULL, "\n");
    }
}

Thanks.

First, you cannot work in place. A file is a sequence of bytes that can only be appended to or truncated at its end; you can overwrite bytes with the same number of bytes, but there is no elementary operation that replaces a sequence of bytes inside a file with another sequence of a different size.

So you need to read an input file and write an output file (perhaps renaming the output over the input with rename(2) if you want a program that replaces a file).
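A minimal sketch of the read-then-write approach, assuming a hypothetical helper rewrite_via_temp() and a caller-chosen temporary name tmp_path. The actual numeral replacement is left as a placeholder (lines are copied unchanged). On POSIX systems rename() replaces an existing target atomically, provided both names are on the same file system; ISO C leaves the behavior with an existing target implementation-defined (it fails on Windows, for instance).

```c
#include <stdio.h>

/* Copy in_path to tmp_path line by line, then rename tmp_path over in_path.
 * tmp_path is a hypothetical name chosen by the caller, ideally in the same
 * directory as in_path so that rename() can replace it. */
int rewrite_via_temp(const char *in_path, const char *tmp_path)
{
    FILE *in = fopen(in_path, "r");
    if (in == NULL)
        return -1;
    FILE *out = fopen(tmp_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char line[1024];
    while (fgets(line, sizeof line, in) != NULL) {
        /* placeholder: replace Roman numerals in `line` here */
        fputs(line, out);
    }

    int read_error = ferror(in);
    fclose(in);
    if (fclose(out) != 0 || read_error) {
        remove(tmp_path);   /* do not leave a partial output behind */
        return -1;
    }
    return rename(tmp_path, in_path) == 0 ? 0 : -1;
}
```

Writing to a separate file first means the original is left untouched if anything fails before the final rename().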

You probably want to process the input file line by line, either with fgets (if you accept a fixed line buffer, which limits the longest line you can handle) or with getline(3), at least on POSIX systems (it accepts arbitrarily long lines, as long as they fit in memory).
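A sketch of the getline(3) loop, assuming a POSIX system. The helper longest_line() is hypothetical and only reports the longest line length to show that no fixed limit applies; the _POSIX_C_SOURCE define exposes getline() under strict compiler modes.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>

/* Read every line of `in` with getline(), which grows `line` with realloc as
 * needed. Returns the length of the longest line (including its '\n'),
 * or -1 if a read error occurred. */
long longest_line(FILE *in)
{
    char *line = NULL;   /* getline allocates and reuses this buffer */
    size_t cap = 0;      /* current capacity of `line` */
    ssize_t len;
    long longest = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* process `line` here: len bytes, possibly ending in '\n' */
        if (len > longest)
            longest = (long)len;
    }
    free(line);          /* one free for the whole loop */
    return ferror(in) ? -1 : longest;
}
```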

To parse the line, you could do it by hand (without strtok), coding a simple finite state automaton (if you wanted to, you could generate that automaton with flex, but that is overkill). Alternatively, consider using regex(3) (probably also overkill in your case).
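A hand-written two-state automaton (between words / inside a word) might look like this sketch. The names replace_numerals() and is_roman_letter() are hypothetical. Assumptions: words are delimited only by whitespace (so "IV," with a comma is not replaced), and any word made only of I, V, X, L, C, D, M counts as a numeral; numeral syntax is not validated, so "IIII" or "MID" would be replaced too.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum scan_state { BETWEEN_WORDS, IN_WORD };

/* Nonzero if c is one of the Roman numeral letters. */
static int is_roman_letter(int c)
{
    return c != '\0' && strchr("IVXLCDM", c) != NULL;
}

/* Copy `line` to `out`, replacing every whitespace-delimited word made only
 * of Roman numeral letters with `repl`. Whitespace is copied unchanged. */
void replace_numerals(const char *line, const char *repl, FILE *out)
{
    enum scan_state state = BETWEEN_WORDS;
    const char *word = NULL;  /* start of the current word */
    int all_roman = 0;        /* current word has only numeral letters so far */

    for (const char *p = line; ; p++) {
        int c = (unsigned char)*p;
        int boundary = (c == '\0' || isspace(c));

        if (state == BETWEEN_WORDS) {
            if (c == '\0')
                break;
            if (boundary) {
                fputc(c, out);
            } else {
                state = IN_WORD;   /* also works when p == line */
                word = p;
                all_roman = is_roman_letter(c);
            }
        } else {                   /* IN_WORD */
            if (boundary) {
                /* the word ends here: emit the replacement or the word */
                if (all_roman)
                    fputs(repl, out);
                else
                    fwrite(word, 1, (size_t)(p - word), out);
                if (c == '\0')
                    break;
                fputc(c, out);
                state = BETWEEN_WORDS;
            } else if (!is_roman_letter(c)) {
                all_roman = 0;
            }
        }
    }
}
```

Because the first word starts in the BETWEEN_WORDS state, a numeral at the very start of a line (or file) needs no special case.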

By the way, the beginning-of-file condition (not beginning of line) is simply ftell(file) == 0L, using ftell(3).
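A small sketch of that check, with at_beginning_of_file() as a hypothetical helper name. ftell() returns -1L when the position cannot be determined (for example on a pipe), so on such streams the check is simply false.

```c
#include <stdio.h>

/* Nonzero if the file position of `f` is at offset 0, i.e. nothing has been
 * read yet on a freshly opened (or rewound) seekable stream. */
int at_beginning_of_file(FILE *f)
{
    return ftell(f) == 0L;
}
```

Calling it just before an fgets() tells you whether the line about to be read is the first line of the file.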

There is no beginning-of-line condition or character. A token is at the beginning of the line exactly when aux == buffer (with aux being the current character pointer and buffer being the line buffer).
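Applied to the question's code, validate_before() could take the start of the line as an extra argument, so it never reads token[-1] when the token is the first character of the buffer. This two-argument signature is an assumption (the question's version takes only the token); a caller would pass buffer (or aux) as line_start. isspace() also accepts '\r', '\v' and '\f', which is slightly broader than the original check.

```c
#include <ctype.h>

/* Returns 1 if `token` starts a word: either it is the first character of
 * the line, or the character before it is whitespace or a '\0' left
 * behind by strtok(). */
int validate_before(const char *line_start, const char *token)
{
    if (token == line_start)
        return 1;   /* beginning of line: there is no previous character */
    unsigned char prev = (unsigned char)token[-1];
    return prev == '\0' || isspace(prev);
}
```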

Notice that strtok(3) is stateful. If you insist on using it, you must start it by passing it a non-NULL buffer on the first call (at the start of the line) and pass it NULL on the subsequent calls.
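A sketch of that calling pattern, splitting on whitespace instead of "\n" so that each token is a whole word. The helper count_numeral_words() is hypothetical, and it treats any token made only of I, V, X, L, C, D, M as a numeral without validating numeral syntax. strtok() writes '\0' over the delimiters, so `line` must be a writable array (not a string literal), and the original spacing is lost, which matters if you want to reproduce the text in the output.

```c
#include <string.h>

/* Count the words in `line` that consist only of Roman numeral letters.
 * strtok() gets `line` on the first call and NULL on every later call
 * for the same line. */
int count_numeral_words(char *line)
{
    const char *delims = " \t\n";
    int count = 0;

    for (char *tok = strtok(line, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        /* tokens are never empty, so this means "every char is a numeral" */
        if (strspn(tok, "IVXLCDM") == strlen(tok))
            count++;
    }
    return count;
}
```

Since each token is a complete word, "sayIIng" is rejected as a whole, and a numeral at the start of the line is found like any other word.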
