Regarding strings in printf

From this post https://stackoverflow.com/a/22059317/5859944

FILE *fileptr;
char *buffer;
long filelen;

fileptr = fopen("myfile.txt", "rb");  // Open the file in binary mode
fseek(fileptr, 0, SEEK_END);          // Jump to the end of the file
filelen = ftell(fileptr);             // Get the current byte offset in the file
rewind(fileptr);                      // Jump back to the beginning of the file

buffer = (char *)malloc((filelen+1)*sizeof(char)); // Enough memory for file + \0
fread(buffer, filelen, 1, fileptr); // Read in the entire file
fclose(fileptr); // Close the file

This reads a file as a byte array.

Since the buffer seems to be nothing but a string, one could add a

buffer[filelen] = '\0';

and

printf("%s" , buffer);

should print the entire contents of the file as if it were a string. It does so for simple text files, but not for binary ones.

For binary files, one has to write a function such as the following:

void traverse(char *string ,size_t size){
  for(size_t i=0;i<size;i++)
    printf("%c",string[i]);
}

and print each character one by one, which puts gibberish on the screen.

  • Why doesn't printf treat buffer as a character string in the case of binary files?
  • Why does printf in the function traverse print gibberish instead of characters?

For the second bullet point, I know that it may be due to signed char, but even if the string is stored as unsigned char, the results are still the same.

Why doesn't printf treat buffer as a character string in the case of binary files?

printf("%s",buffer) does assume that buffer contains a C string. The problem is, that's not what buffer actually contains. Your buffer actually contains the bytes from a binary file.

A C string is terminated by the first zero byte (aka '\0', aka ASCII NUL), and an arbitrary non-text file (aka "binary file") could contain zero bytes anywhere within itself. The printf "%s" format will stop as soon as it sees the first zero byte.
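
A minimal sketch of that behaviour, assuming a made-up buffer with a zero byte in the middle. The helper name format_as_string is hypothetical. snprintf applies the same "%s" rules as printf, so it can capture what would have been printed:

```c
#include <stdio.h>

/* Hypothetical helper: formats buf with "%s" into out, the same way printf
   would print it, and returns the number of characters "%s" produced. */
int format_as_string(const char *buf, char *out, size_t out_size)
{
    return snprintf(out, out_size, "%s", buf);
}
```

Given the seven bytes 'P', 'N', 'G', '\0', 'x', 'y', '\0', the call returns 3 and out holds "PNG". The bytes after the first zero are never examined.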

Why does printf in the function traverse print gibberish instead of characters?

That "gibberish" is characters. But the characters don't mean anything because the file that you are trying to interpret as a sequence of characters was not meant to be interpreted in that way.

A text file is a sequence of bytes that are meant to be rendered according to some character encoding, and which usually is meant to convey some kind of human-readable information. An arbitrary "binary" file probably contains a sequence of bytes that are not supposed to represent human-readable text. They represent something else, that some computer program understands.

printf itself just writes those bytes to the output. Whatever displays them (typically your terminal) attempts to render them according to some character encoding (often either UTF-8 or US-ASCII), but (a) it won't always be able to do so, and (b) even when it is able to do so, the character sequence won't mean anything.
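
To look at such bytes without asking anything to interpret them as text, one common approach is to print their numeric values. A minimal sketch, assuming hexadecimal output is acceptable. The function name bytes_to_hex and its parameters are hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: writes the bytes of data into out as two-digit
   hexadecimal values separated by spaces, stopping early if out is too
   small. Returns the number of characters written (excluding the '\0'). */
size_t bytes_to_hex(const unsigned char *data, size_t size,
                    char *out, size_t out_size)
{
    size_t pos = 0;
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    /* Each step needs room for up to three characters plus the terminator. */
    for (size_t i = 0; i < size && pos + 3 < out_size; i++) {
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                i ? " %02x" : "%02x", (unsigned)data[i]);
    }
    return pos;
}
```

For the five bytes 0x89, 'P', 'N', 'G', 0x00 this produces "89 50 4e 47 00": every byte is visible, including the zero byte and the one with no ASCII meaning.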

The first problem here is that you are mixing up two purposes in how you read the file. To read binary data from a file and subsequently print that binary data, you should use unsigned char and unsigned char * to store the data.

This is because, to print an accurate representation of that data, you need a numeric conversion specifier such as "%u" in the printf format string. Only some byte values correspond to printable ASCII characters, so if you want to see all of the binary data in your file, some of it will not be visible if you use "%c".

A binary version of your function would be:

void traverse(unsigned char *string ,size_t size){
  for(size_t i=0;i<size;i++)
    printf("%u ", (unsigned)string[i]);  /* the space keeps adjacent values apart */
}
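
On the signed-versus-unsigned point raised in the question, here is a small sketch (the helper name byte_value is hypothetical). Whether plain char is signed is implementation-defined, so a byte such as 0x89 may show up as a negative number unless it is converted to unsigned char first. The conversion changes the number you see with "%u" or "%d". It does not change the byte that "%c" writes, which is why switching types made no visible difference there.

```c
/* Hypothetical helper: the value of one byte as a number in 0..255,
   regardless of whether plain char is signed on this platform. */
unsigned int byte_value(char c)
{
    return (unsigned char)c;
}
```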

The second issue is the claim that this

should print the entire contents of a file as if it were a string. And it does so in simple text files but not in binary ones.

This is not true. The "%s" conversion writes bytes as if they were printable characters and terminates when it reaches a null character, '\0'. You cannot reliably use %s to print binary data.
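
If the goal is to send every byte of the buffer to the output, zero bytes included, one option is fwrite, which takes an explicit length instead of looking for a terminator. A minimal sketch, where write_bytes is a hypothetical wrapper:

```c
#include <stdio.h>

/* Hypothetical wrapper: writes exactly size bytes from buf to out and
   returns how many bytes were actually written. */
size_t write_bytes(FILE *out, const void *buf, size_t size)
{
    return fwrite(buf, 1, size, out);
}
```

A call such as write_bytes(stdout, buffer, filelen) would emit the whole file. The terminal may still render it as gibberish, since the bytes themselves are unchanged.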
