
unistd.h read() function: How to read a file line by line?

What I need to do is use the read function from unistd.h to read a file line by line. I have this at the moment:

n = read(fd, str, size);

However, this reads up to size bytes, or to the end of the file, whichever comes first. Is there a way that I can make it read one line at a time, stopping at a newline? The lines are all of variable length.

I am allowed only these two header files:

#include <unistd.h>
#include <fcntl.h>

The point of the exercise is to read in a file line by line, and output each line as it's read in. Basically, to mimic the fgets() and fputs() functions.

You can read the buffer character by character and check for the end-of-line sequence (\r\n on Windows, \n on Unix systems).
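As a rough illustration of that scan, here is a minimal sketch. The helper names find_newline and line_length are hypothetical, and the code assumes the bytes have already been read into a buffer by read().

```c
#include <stddef.h>

/* Hypothetical helper: scan buf[0..len) for '\n'.
 * Returns the index of the first newline, or -1 if there is none. */
long find_newline(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            return (long)i;
    }
    return -1;
}

/* Hypothetical helper: given the index nl of a newline in buf, return the
 * length of the line text before it, dropping a trailing '\r' so that
 * Windows-style "\r\n" and Unix-style "\n" endings give the same result. */
size_t line_length(const char *buf, long nl)
{
    size_t n = (size_t)nl;
    if (n > 0 && buf[n - 1] == '\r')
        n--;
    return n;
}
```

If find_newline returns -1, the line continues past the data read so far, and more bytes have to be read before the line is complete.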

You'll want to create a buffer twice the length of the longest line you'll support, and you'll need to keep track of your buffer state.

Basically, each time you're called for a new line you'll scan from your current buffer position looking for an end-of-line marker. If you find one, good, that's your line. Update your buffer pointers and return.

If you hit your maxlength then you return a truncated line and change your state to discard. The next time you're called you need to discard everything up to the next end of line, and then return to your normal read state.

If you hit the end of what you've read in, then you need to read in another maxlength characters, wrapping to the start of the buffer if you hit the bottom (i.e., you may need to make two read calls), and then continue scanning.

All of the above assumes you can set a max line length. If you can't, then you have to work with dynamic memory and worry about what happens if a buffer malloc fails. Also, you'll need to always check the result of each read in case you've hit the end of the file while reading into your buffer.
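The scheme above could be sketched roughly as follows. This is a simplified variant, not a definitive implementation: each character is copied into the caller's buffer as it is scanned, so the internal buffer never has to wrap around and its size does not depend on the line limit. The names line_reader, line_reader_init and next_line are hypothetical, the values of MAX_LINE and BUF_SIZE are arbitrary assumptions, and a read() interrupted by a signal is treated like any other error.

```c
#include <unistd.h>

#define MAX_LINE 80              /* assumed longest supported line */
#define BUF_SIZE (2 * MAX_LINE)  /* assumed internal buffer size */

struct line_reader {
    int fd;
    char buf[BUF_SIZE];
    size_t pos;    /* index of the next unread byte in buf */
    size_t len;    /* number of valid bytes in buf */
    int discard;   /* nonzero: skipping the tail of an over-long line */
};

void line_reader_init(struct line_reader *r, int fd)
{
    r->fd = fd;
    r->pos = 0;
    r->len = 0;
    r->discard = 0;
}

/* Stores the next line (without its '\n') in out, which must hold
 * MAX_LINE + 1 bytes. Returns the line length, or -1 at end of file or on
 * a read error. Lines longer than MAX_LINE are truncated; the remainder
 * is discarded on the following call. */
ssize_t next_line(struct line_reader *r, char *out)
{
    size_t n = 0;
    for (;;) {
        if (r->pos == r->len) {          /* buffer exhausted: refill it */
            ssize_t got = read(r->fd, r->buf, BUF_SIZE);
            if (got < 0)
                return -1;
            if (got == 0) {              /* end of file */
                r->discard = 0;
                out[n] = '\0';
                return n > 0 ? (ssize_t)n : -1;  /* last line may lack '\n' */
            }
            r->pos = 0;
            r->len = (size_t)got;
        }
        char c = r->buf[r->pos++];
        if (r->discard) {                /* discard state: skip to newline */
            if (c == '\n')
                r->discard = 0;
            continue;
        }
        if (c == '\n') {                 /* found the end of the line */
            out[n] = '\0';
            return (ssize_t)n;
        }
        if (n == MAX_LINE) {             /* too long: truncate, then discard */
            r->discard = 1;
            out[n] = '\0';
            return (ssize_t)n;
        }
        out[n++] = c;
    }
}
```

A caller would initialise the reader once with line_reader_init(&r, fd) and then call next_line(&r, out) in a loop until it returns -1, writing each line out with write().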

Unfortunately the read function isn't really suitable for this sort of input. Assuming this is some sort of artificial requirement from interview/homework/exercise, you can attempt to simulate line-based input by reading the file in chunks and splitting it on the newline character yourself, maintaining state in some way between calls. You can get away with a static position indicator if you carefully document the function's use.

If you need to read exactly one line (and not overstep) using read(), the only generally applicable way to do that is by reading one byte at a time and looping until you get a newline byte. However, if your file descriptor refers to a terminal and it's in the default (canonical) mode, read will wait for a newline and return less than the requested size as soon as a line is available. It may, however, return more than one line if data arrives very quickly, or less than one line if your program's buffer or the internal terminal buffer is shorter than the line length.
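A minimal sketch of the byte-at-a-time approach, assuming a caller-supplied buffer of fixed capacity. The function name read_line_bytewise is hypothetical, and, like fgets(), this version keeps the newline in the returned line.

```c
#include <unistd.h>

/* Reads one line from fd into buf (capacity cap, at least 1), issuing one
 * read() call per byte so that nothing after the newline is consumed.
 * The newline is stored if it fits, and the result is NUL-terminated.
 * Returns the number of bytes stored, 0 at end of file, -1 on error. */
ssize_t read_line_bytewise(int fd, char *buf, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return -1;
    while (n + 1 < cap) {
        char c;
        ssize_t got = read(fd, &c, 1);
        if (got < 0)
            return -1;
        if (got == 0)        /* end of file */
            break;
        buf[n++] = c;
        if (c == '\n')       /* stop right after the newline */
            break;
    }
    buf[n] = '\0';
    return (ssize_t)n;
}
```

The cost is one system call per byte, which is why this is usually reserved for cases where the file offset must end up exactly after the line.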

Unless you really need to avoid overstepping (which is sometimes important, if you want another process/program to inherit the file descriptor and be able to pick up reading where you left off), I would suggest using stdio functions or your own buffering system. Using read for line-based or byte-by-byte IO is very painful and hard to get right.

This is a good question, but allowing only the read function doesn't help! :P

Loop over read calls that each fetch a fixed number of bytes and search for the '\n' character, then return the part of the string up to the '\n' and store the rest (excluding the '\n') to prepend to the next chunk read from the file.

Use dynamic memory.

The larger the buffer, the fewer read calls are needed (read is a system call, so it isn't cheap, although nowadays there are preemptive kernels).
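The chunked approach with dynamic memory might look roughly like the sketch below. Note the assumption: it uses malloc, realloc, memcpy and memmove, so it needs <stdlib.h> and <string.h> and therefore steps outside the two-header restriction in the question. The names fd_lines, take_line and fd_next_line are hypothetical, and CHUNK is an arbitrary size.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096   /* assumed number of bytes requested per read() */

struct fd_lines {
    int fd;
    char *data;   /* heap buffer of bytes read but not yet returned */
    size_t len;   /* number of valid bytes in data */
    size_t cap;   /* allocated size of data */
};

/* Copies the first n bytes into a new string, then drops them plus `skip`
 * extra bytes (the newline) and keeps the rest for the next call. */
static char *take_line(struct fd_lines *s, size_t n, size_t skip)
{
    char *line = malloc(n + 1);
    if (line == NULL)
        return NULL;
    memcpy(line, s->data, n);
    line[n] = '\0';
    memmove(s->data, s->data + n + skip, s->len - n - skip);
    s->len -= n + skip;
    return line;
}

/* Returns the next line without its '\n' as a malloc'd string that the
 * caller frees, or NULL at end of file, on a read error, or when an
 * allocation fails. */
char *fd_next_line(struct fd_lines *s)
{
    size_t scanned = 0;
    for (;;) {
        for (size_t i = scanned; i < s->len; i++) {
            if (s->data[i] == '\n')
                return take_line(s, i, 1);
        }
        scanned = s->len;                 /* do not rescan old bytes */
        if (s->cap - s->len < CHUNK) {    /* make room for another chunk */
            char *p = realloc(s->data, s->len + CHUNK);
            if (p == NULL)
                return NULL;
            s->data = p;
            s->cap = s->len + CHUNK;
        }
        ssize_t got = read(s->fd, s->data + s->len, CHUNK);
        if (got < 0)
            return NULL;
        if (got == 0)                     /* end of file: flush the tail */
            return s->len > 0 ? take_line(s, s->len, 0) : NULL;
        s->len += (size_t)got;
    }
}
```

The state would start as { fd, NULL, 0, 0 }, and the caller is assumed to free(s.data) once fd_next_line has returned NULL.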

...

Or simply fix a maximum line length, and use fgets, if you need to be quick...

If you open the file in text mode then Windows "\r\n" will be silently translated to "\n" as the file is read.

If you are on Unix you can use the non-standard [1] GNU (glibc) getline() function.


[1] The getline() function is standard in POSIX 2008.
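For comparison, a small sketch of line-by-line copying with getline(). It relies on <stdio.h> and <stdlib.h>, so it does not satisfy the two-header restriction of the exercise; the function name copy_lines is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX 2008 declarations */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Copies `in` to `out` one line at a time. getline() grows the buffer as
 * needed, so lines may have any length. Returns the number of lines
 * copied, or -1 if a write to `out` fails. */
long copy_lines(FILE *in, FILE *out)
{
    char *line = NULL;   /* allocated and resized by getline() */
    size_t cap = 0;
    ssize_t len;
    long count = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            free(line);
            return -1;
        }
        count++;
    }
    free(line);
    return count;
}
```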

Well, it will read line-by-line from a terminal.

Some choices you have are:

  • Write a function that uses read when it runs out of data but only returns one line at a time to the caller
  • Use the function in the library that does exactly that: fgets().
  • Read only one byte at a time, so you don't go too far.
