
How do you read a large file with a small buffer in assembly with system calls? Does read append a \0?

Calling the read syscall on a file larger than my buffer fills the buffer with only the first part of the file. Calling it again seems to have no effect: I still get only the first part of the file. Say the file is 1 GB and the buffer is 1024 bytes; then we would only ever access the first 1024 bytes of the large file. Is there any way to access the rest of the file without increasing the buffer size?

I couldn't find any flag talking about this when you open the file on this website: https://linuxhint.com/list_of_linux_syscalls/#open-flags (unless I misunderstood the descriptions).

I initially thought that the kernel would fill the buffer with the second 1024 bytes when I made the syscall a second time (as it does in C, IIRC). In my actual case the text file is only ~1300 B and the buffer is 512 B, so resizing the buffer would not be a problem, but I want to know how this is handled in general.

Is there some other syscall to break the file into pieces or to turn it into some kind of stream-like object? I know there's a split command in the shell. How do C and my OS deal with files like this? C lets you consume a file one bite at a time; is it really using a very large buffer underneath? It feels wasteful to be forced to copy the full file into a separate buffer, and I would be surprised if there were no alternative.

EDIT: Sorry, it turns out there was no problem with any syscall. I expected a null byte or some other special character to mark the end of the file, and I used that to decide when to stop refilling and printing my buffer. There is no such character. On the last read, the syscall only overwrites the buffer up to the end of the file's data and leaves the rest of the buffer unchanged, so when I printed the whole buffer it looked like the output was looping, and the end appeared unfinished. In reality the file had been read completely, followed by some repeated text left over from the previous buffer refill. The book I was reading (Programming from the Ground Up) said the syscall would also add a \0 at the end so I could check for that. It was about 32-bit assembly, so the syscall might have changed. [Edit 2: Sorry, it turns out I misread the book; see the answer.] Now I use the return value of the syscall, which is the number of bytes it read into the buffer, both to decide when to stop and to print without repeating parts of the previous buffer.
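A minimal C sketch of that return-value approach, offered as an illustration rather than the original assembly. The names `copy_fd` and `BUF_SIZE` are hypothetical, and 512 simply mirrors the buffer size mentioned above. Each `read` call continues from the file offset where the previous one stopped, a return value of 0 means end of file, and only the first `n` bytes of the buffer are valid after each call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define BUF_SIZE 512  /* hypothetical size, deliberately smaller than the file */

/* Hypothetical helper: copies everything from in_fd to out_fd through a
   small fixed buffer. Returns the total bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    char buf[BUF_SIZE];
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);

    for (;;) {
        /* The kernel advances the file offset, so each call returns
           the next chunk of the file. */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            break;                  /* 0 bytes read: end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }

        /* Only buf[0..n-1] is fresh; anything after that is stale data
           from an earlier read. write may also be partial, so loop. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

In assembly the same count comes back in the return register (`%eax` for 32-bit Linux, `%rax` for 64-bit), so the loop structure carries over directly: stop when it is 0, and pass it as the length to `write`.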

tl;dr - misunderstood a syscall

What happened was that I first misread the following passage about reading lines from Programming from the Ground Up and accidentally replaced "line" with "file" in my head:

For an example, let's say that you want to read in a single line of text from a file but you do not know how long that line is. You would then simply read a large number of bytes/characters from the file into a buffer, look for the end-of-line character, and copy all of the characters to that end-of-line character to another location. If you didn't find an end-of-line character, you would allocate another buffer and continue reading. You would probably wind up with some characters left over in your buffer in this case, which you would use as the starting point when you next need data from the file.

In reality, a few paragraphs earlier it stated:

The write system call will give back the number of bytes written in %eax or an error code.

It doesn't mention anything about null bytes. If I had read the program, I would also have realised my mistake. I think I would have noticed as well if I had made my buffer larger than the file.

As for what happened in my code: I expected a null byte or some other special character to mark the end of the file, and I used that to decide when to stop refilling and printing my buffer. On the last read, the syscall only overwrote the buffer up to the end of the file's data and left the rest of the buffer unchanged. So when I printed the buffer the program never stopped, and at the end of each buffer write the output looked unfinished. In reality the file had been read completely, followed by some repeated text left over from the previous buffer refill.

Technically, I now realise the buffer is only partially refilled once, on the last read that returns data. After that, read returns 0 and doesn't change the buffer at all, so I was just rewriting that last buffer until I stopped the program.
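If a terminator is wanted anyway, it has to be placed by hand, because `read` never appends one. A small C sketch under that assumption: `read_chunk_z` is a hypothetical helper (not something from the book or the kernel API) that reads at most `cap - 1` bytes and then writes a `\0` at the position given by the return value, so stale bytes from an earlier refill are cut off.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: one read() into buf, followed by a manual NUL
   terminator. Returns the number of bytes read (0 at end of file),
   or -1 on error. buf must have room for at least 2 bytes. */
ssize_t read_chunk_z(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap >= 2);

    /* Leave one byte free for the terminator we add ourselves. */
    ssize_t n = read(fd, buf, cap - 1);
    if (n < 0)
        return -1;

    /* read() touched only buf[0..n-1]; terminate right after that so
       leftovers from a previous call are never printed. */
    buf[n] = '\0';
    return n;
}
```

With an 8-byte buffer and an input of `hello world`, successive calls would yield `hello w`, then `orld`, then an empty string with a return value of 0, which is the signal to stop.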
