Read file after EOF

Is it possible to read a file after its EOF?

I am reading a file that could contain an EOF character before its end, or several EOF characters. The file is a simple txt file, and I can get the number of characters using fsize, but it looks like getc returns EOF (or -1) from the first EOF to the end of the file.

int c = 0;
char x;
FILE *file = fopen("MyTextFile.txt", "r");
off_t size = fsize("MyTextFile.txt");

while (c < size) {
    x = getc(file);
    if (x != -1)
        printf("%c ", x);
    else
        printf("\nFOUND EOF!\n");
    c++;
}
fclose(file);

Unfortunately, even though I'm sure the file content continues after the EOF, I cannot read the rest.

SOLVED: Reading with "rb" instead of "r" and declaring x as int allowed me to read the whole file, including multiple EOFs. I'm not sure whether it's a trick or something that's allowed, but it works.

Logically, there is no data after EOF (end of file).

Note that EOF is not a character; it's a special value that getc() returns, instead of a character value, once an end-of-file or error condition has been encountered.

You haven't said so in the question, but my guess is that you have a Windows text file with one or more embedded Ctrl-Z (0x1a) characters. That's the only thing I can think of that's consistent with your description.

In Windows, a Ctrl-Z character in a text file is treated as the end of the file. (This goes back to earlier systems where the end of the data was not clearly marked, because the file system only recorded the number of blocks.) Ctrl-Z is not an EOF character; it's a character value that, on Windows, triggers an end-of-file condition and causes getc() to return EOF.

Basically you have a malformed text file, and you should probably just fix it and/or fix whatever generated it. But if you really need to read data from it, I suggest opening it in binary mode rather than text mode. You'll then see each CR/LF end-of-line marker as two characters ('\r', '\n', rather than just '\n'), and Ctrl-Z (0x1a) is just another byte value. Since you're not really treating the file as text (the "text" ends at the first Ctrl-Z), it makes sense to read it in binary mode.
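
As a minimal sketch of the binary-mode approach, assuming only standard C: the helper below reads every byte of a file and counts how many of them are Ctrl-Z (0x1a). The name count_bytes_binary and its return convention are made up for illustration; they are not part of the question's code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads `path` in binary mode and returns the number of
   bytes read, or -1 if the file cannot be opened or a read error occurs.
   *ctrl_z receives the number of 0x1a bytes seen. */
long count_bytes_binary(const char *path, long *ctrl_z)
{
    assert(path != NULL && ctrl_z != NULL);

    FILE *file = fopen(path, "rb");   /* "rb": no Ctrl-Z or CR/LF translation */
    if (file == NULL)
        return -1;

    long total = 0;
    int c;                            /* int, so EOF stays distinct from any byte */
    *ctrl_z = 0;
    while ((c = getc(file)) != EOF) {
        if (c == 0x1a)
            ++*ctrl_z;                /* Ctrl-Z is ordinary data in binary mode */
        ++total;
    }

    int failed = ferror(file);        /* EOF can also mean a read error */
    fclose(file);
    return failed ? -1 : total;
}
```

Called as count_bytes_binary("MyTextFile.txt", &n), it should report the full file length even when the file contains embedded Ctrl-Z bytes, because the loop ends only when getc() actually returns EOF.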

There are probably tricks you can play to read past the Ctrl-Z in text mode; for example, clearerr() is likely to work. But doing that goes beyond what the C standard guarantees, which may or may not be a problem for you.

Also, you should definitely use the symbol EOF, not the "magic number" -1. It's not even guaranteed that EOF == -1, and using the symbol EOF will make your code much clearer.

Finally, thanks to Mark Plotnick for pointing out in a comment something I should have noticed myself. getc() returns an int result; you're assigning it to a char object. x needs to be of type int, not char. This is necessary so you can distinguish between the value of EOF and the value of any actual character.
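
To make the int-versus-char point concrete, here is a small sketch. The helper first_false_eof is hypothetical: it reads a file correctly with an int, and for each byte checks whether the same value stored in a char would have compared equal to EOF. On implementations where plain char is signed and EOF is -1, that happens for the byte 0xFF; where plain char is unsigned, no byte matches, and a char-typed loop would instead never see EOF at all. The result of narrowing 255 to a signed char is implementation-defined, so treat this as an illustration rather than a portable test.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the index of the first byte in `path` that a
   char-typed variable would mistake for EOF, -1 if none was found before
   reading stopped, or -2 if the file cannot be opened. */
long first_false_eof(const char *path)
{
    assert(path != NULL);

    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return -2;

    long index = 0;
    long hit = -1;
    int c;                          /* wide enough for every byte plus EOF */
    while ((c = getc(file)) != EOF) {
        char narrowed = (char)c;    /* what `char x = getc(file)` would store */
        if (narrowed == EOF) {      /* e.g. byte 0xFF with signed char, EOF == -1 */
            hit = index;
            break;
        }
        ++index;
    }

    fclose(file);
    return hit;
}
```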

Your code is incomplete so it's hard to say what the problem is, but I would suggest:

  1. Make sure you are opening the file in binary mode "rb"
  2. Make sure x is of type int

Chapter and verse:

7.21 Input/output <stdio.h>

7.21.1 Introduction
...
3 The macros are...

EOF

which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;

EOF isn't a character in the file itself; it's a value returned by the input function to indicate that there is no more input available on the stream; you can't read past it, because there's nothing to read.
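
Because EOF is only a return value, the stream itself has to be asked why reading stopped. The sketch below assumes an already opened stream; the read_status type and read_all function are illustrative names, not standard ones. It reads until getc() returns EOF and then uses ferror() to tell a read error apart from a genuine end of file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result type for this sketch. */
typedef enum { READ_ENDED_AT_EOF, READ_FAILED } read_status;

/* Reads the rest of `file`, storing the number of bytes read in *count,
   and reports whether reading stopped at end-of-file or on an error. */
read_status read_all(FILE *file, long *count)
{
    assert(file != NULL && count != NULL);

    int c;
    *count = 0;
    while ((c = getc(file)) != EOF)
        ++*count;

    /* getc() returns EOF for both conditions; ferror() distinguishes them. */
    return ferror(file) ? READ_FAILED : READ_ENDED_AT_EOF;
}
```

After read_all() returns READ_ENDED_AT_EOF, feof(file) is nonzero and a further getc(file) returns EOF again, since nothing is left on the stream to read.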
