
C reading (from stdin) stops at 0x1a character

I'm currently implementing the Burrows-Wheeler transform (and its inverse) for raw data such as JPG files. Testing on ordinary data like text files causes no problems, but when reading a JPG file, for example, reading stops at the character 0x1a, also known as the substitute character. I've searched the internet for a solution that doesn't require OS-dependent code, without success. I was thinking of reading stdin in binary mode, but I guess that isn't easy. Is there a simple way to solve this problem?

Code:

buffer = (unsigned char*) calloc(block_size+1,sizeof(unsigned char));
length = fread((unsigned char*) buffer, 1, block_size, stdin);
if(length == 0){
    // file is empty
}else{
    b_length = length;
    while(length == b_length){
        buffer[block_size] = '\0';
        encodeBlock(buffer,length);
        length = fread((unsigned char*) buffer, 1, block_size, stdin);      
    }
    if(length != 0){            
        buffer[length] = '\0';
        encodeBlock(buffer,length);
    }
}
free(buffer);

As you've noticed, you're reading from stdin in text mode, and the read stops when it hits the SUB character (substitute, aka CTRL+Z, aka DOS end-of-file).

On Windows, you have to change the mode to binary with _setmode:

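/* _WIN32 is predefined by Windows compilers; plain WIN32 is not guaranteed. */
#if defined(_WIN32)
#include <io.h>     /* _setmode */
#include <fcntl.h>  /* _O_BINARY */
#endif /* defined(_WIN32) */

/* ... */

#if defined(_WIN32)
/* Call this before reading anything from stdin. */
_setmode(_fileno(stdin), _O_BINARY);
#endif /* defined(_WIN32) */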

On platforms other than Windows, you won't run into this distinction between modes.
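To keep the platform check out of the calling code, the snippet above can be wrapped in a small helper. This is only a sketch: the name `set_binary_mode` is made up for illustration, and it assumes a Windows C runtime that provides `_setmode`, `_fileno`, and `_O_BINARY` (as Microsoft's does).

```c
#include <assert.h>
#include <stdio.h>

#if defined(_WIN32)
#include <fcntl.h>  /* _O_BINARY */
#include <io.h>     /* _setmode */
#endif

/* Hypothetical helper: switch an open stream to binary mode.
 * Returns 0 on success and -1 on failure.
 * On non-Windows platforms it does nothing and reports success,
 * because text and binary streams behave the same there. */
int set_binary_mode(FILE *stream)
{
    assert(stream != NULL);
#if defined(_WIN32)
    return _setmode(_fileno(stream), _O_BINARY) == -1 ? -1 : 0;
#else
    (void)stream;   /* unused on this platform */
    return 0;
#endif
}
```

Calling `set_binary_mode(stdin)` once at the start of the program, before the first `fread`, should be enough for the loop in the question to read past 0x1a bytes.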

You cannot do this without an OS dependency. The C language specification says (7.19.3):

At program startup, three text streams are predefined...

stdin is a text stream. Depending on your OS, there may be ways to change the mode of an existing stream or to access the low-level stream data, but you say that you do not want any OS-specific code.

You must open the file as a binary file.

Use something similar to:

fopen("file", "rb");
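As a rough illustration of the "rb" approach, the sketch below opens a named file in binary mode and counts every byte in it, 0x1a included. The function name `count_file_bytes` and the 4096-byte buffer are arbitrary choices for the example, and it only helps when the input is a file you can open by name, not when it arrives on stdin.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the file at `path` in binary mode and
 * return the number of bytes read, or -1 if the file cannot be
 * opened or a read error occurs. */
long count_file_bytes(const char *path)
{
    unsigned char buf[4096];
    long total = 0;
    size_t n;
    int failed;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");          /* "b" disables text-mode translation */
    if (fp == NULL)
        return -1;

    /* fread returns 0 at end of file or on error; ferror tells them apart */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;

    failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : total;
}
```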

You can use _setmode to switch stdin to binary mode.

There is also freopen (see the related SO question).
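A minimal sketch of the freopen route follows, with caveats. C99 lets you pass a null pointer as the filename to ask for a mode change on an already-open stream, but which changes are permitted is implementation-defined, so the call may fail. Microsoft's documentation treats a null filename as an invalid parameter, so on Windows this is not a substitute for _setmode. The wrapper name `reopen_binary` is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: try to switch an already-open input stream to
 * binary mode using freopen with a null filename.
 * Returns 0 on success, -1 if the implementation refuses the change.
 * After a failure the stream has been closed and must not be used. */
int reopen_binary(FILE *stream)
{
    assert(stream != NULL);
    return freopen(NULL, "rb", stream) != NULL ? 0 : -1;
}
```

Because a failed call leaves the stream closed, check the return value of `reopen_binary(stdin)` and stop with an error rather than carrying on reading.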

Use read() to read in the data.
Since you are interested in getting data from stdin, use

fd = fcntl(STDIN_FILENO, F_DUPFD, 0);

to obtain a file descriptor that refers to stdin.

More info here.
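Here is a sketch of what a read()-based block reader could look like on a POSIX system. The helper name `read_block` is hypothetical. It takes the descriptor as a parameter, so `STDIN_FILENO` can be passed directly without duplicating it first. Note that this is OS-specific code. As far as I know, the Windows equivalent `_read` still honors text mode, so it would not avoid the _setmode call there.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read */

/* Hypothetical helper: read up to `cap` bytes from `fd` into `buf`,
 * retrying after short reads and after EINTR.
 * Returns the number of bytes read (less than `cap` only at end of
 * input), or -1 on a read error. */
ssize_t read_block(int fd, unsigned char *buf, size_t cap)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < cap) {
        ssize_t n = read(fd, buf + got, cap - got);
        if (n == 0)
            break;                  /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

In the question's loop, `read_block(STDIN_FILENO, buffer, block_size)` would take the place of each `fread` call, with a negative return value handled as an error.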

The issue stems from the fact that Windows treats 0x1a (CTRL+Z) as EOF in text mode. As Earlz pointed out, opening the file in binary mode fixes this on Windows and works on Linux too.
