
How to read a file line-by-line into a list?

How do I read every line of a file in Python and store each line as an element in a list?

I want to read the file line by line and append each line to the end of the list.

This code will read the entire file into memory and remove all whitespace characters (newlines and spaces) from the end of each line:

with open(filename) as file:
    lines = file.readlines()
    lines = [line.rstrip() for line in lines]

If you're working with a large file, then you should instead read and process it line-by-line:

with open(filename) as file:
    for line in file:
        print(line.rstrip())

In Python 3.8 and up you can use a while loop with the walrus operator like so:

with open(filename) as file:
    # Test the raw line: a blank line is "\n" (truthy); only "" at end-of-file stops the loop
    while (line := file.readline()):
        print(line.rstrip())

Depending on what you plan to do with your file and how it was encoded, you may also want to manually set the access mode and character encoding:

with open(filename, 'r', encoding='UTF-8') as file:
    # Test the raw line so that blank lines do not end the loop early
    while (line := file.readline()):
        print(line.rstrip())
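
Calling .rstrip() inside the while condition has a pitfall. A blank line becomes an empty string, which is falsy, so the loop stops before the end of the file. The sketch below is illustrative and needs Python 3.8+. It writes a small sample file (the name sample.txt is hypothetical) and runs both forms of the loop on it:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(filename, "w", encoding="UTF-8") as out:
        out.write("first\n\nthird\n")  # a blank line in the middle

    # Stripping inside the condition: the blank line becomes "" (falsy) and ends the loop
    strip_in_condition = []
    with open(filename, "r", encoding="UTF-8") as file:
        while (line := file.readline().rstrip()):
            strip_in_condition.append(line)

    # Testing the raw line: "\n" is truthy; only "" at end-of-file stops the loop
    strip_in_body = []
    with open(filename, "r", encoding="UTF-8") as file:
        while (line := file.readline()):
            strip_in_body.append(line.rstrip())

print(strip_in_condition)  # ['first']
print(strip_in_body)       # ['first', '', 'third']
```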

See Input and Output:

with open('filename') as f:
    lines = f.readlines()

or with stripping the newline character:

with open('filename') as f:
    lines = [line.rstrip() for line in f]

This is more explicit than necessary, but does what you want.

with open("file.txt") as file_in:
    lines = []
    for line in file_in:
        lines.append(line)

This will yield an "array" of lines from the file.

lines = tuple(open(filename, 'r'))

open returns a file which can be iterated over. When you iterate over a file, you get the lines from that file. tuple can take an iterator and instantiate a tuple instance for you from the iterator that you give it. lines is a tuple created from the lines of the file.
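
This small sketch shows what "you get the lines" means in practice. It assumes a hypothetical file named sample.txt in a temporary directory. Each element keeps its newline, except the last one when the file does not end with a newline. Opening the file in a with block closes it after tuple() has consumed the iterator:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(filename, "w", encoding="utf-8") as out:
        out.write("alpha\nbeta\ngamma")  # note: no newline after the last line

    # Iterating the file object yields one string per line, newline included
    with open(filename, encoding="utf-8") as file:
        lines = tuple(file)

print(lines)  # ('alpha\n', 'beta\n', 'gamma')
```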

According to Python's Methods of File Objects, the simplest way to convert a text file into a list is:

with open('file.txt') as f:
    my_list = list(f)
    # my_list = [x.rstrip() for x in f] # remove line breaks

If you just need to iterate over the text file lines, you can use:

with open('file.txt') as f:
    for line in f:
       ...

Old answer:

Using with and readlines():

with open('file.txt') as f:
    lines = f.readlines()

If you don't care about closing the file, this one-liner will work:

lines = open('file.txt').readlines()

The traditional way:

f = open('file.txt') # Open file in read mode
lines = f.read().splitlines() # List with stripped line-breaks
f.close() # Close file

If you want the \n included:

with open(fname) as f:
    content = f.readlines()

If you do not want \n included:

with open(fname) as f:
    content = f.read().splitlines()
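
The two variants can be compared on the same input. This is a minimal sketch that uses a throwaway file with a hypothetical name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(fname, "w", encoding="utf-8") as out:
        out.write("one\ntwo\nthree\n")

    # readlines() keeps the line terminators
    with open(fname, encoding="utf-8") as f:
        with_newlines = f.readlines()

    # read().splitlines() drops them
    with open(fname, encoding="utf-8") as f:
        without_newlines = f.read().splitlines()

print(with_newlines)     # ['one\n', 'two\n', 'three\n']
print(without_newlines)  # ['one', 'two', 'three']
```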

You could simply do the following, as has been suggested:

with open('/your/path/file') as f:
    my_lines = f.readlines()

Note that this approach has 2 downsides:

1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it's not large, it is simply a waste of memory.

2) This does not allow processing of each line as you read it. So if you process your lines after this, it is not efficient (it requires two passes rather than one).

A better approach for the general case would be the following:

with open('/your/path/file') as f:
    for line in f:
        process(line)

Where you define your process function any way you want. For example:

def process(line):
    if 'save the world' in line.lower():
         superman.save_the_world()

(The implementation of the Superman class is left as an exercise for you.)

This will work nicely for any file size, and you go through your file in just 1 pass. This is typically how generic parsers work.
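
Here is a runnable sketch of the single-pass pattern. The process function, its counter argument, and the file log.txt are hypothetical stand-ins for your own handler and data. Only the current line is held in memory while the file is read:

```python
import os
import tempfile

def process(line, counter):
    """Hypothetical per-line handler: count lines mentioning a phrase."""
    if "save the world" in line.lower():
        counter["matches"] += 1

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "log.txt")  # hypothetical input file
    with open(path, "w", encoding="utf-8") as out:
        out.write("Time to SAVE THE WORLD\nlunch break\nsave the world again\n")

    counter = {"matches": 0}
    # One pass over the file; no list of all lines is ever built
    with open(path, encoding="utf-8") as f:
        for line in f:
            process(line, counter)

print(counter["matches"])  # 2
```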

Given a text file with this content:

line 1
line 2
line 3

We can use this Python script in the same directory as the text file above:

>>> with open("myfile.txt", encoding="utf-8") as file:
...     x = [l.rstrip("\n") for l in file]
>>> x
['line 1', 'line 2', 'line 3']

Using append:

x = []
with open("myfile.txt") as file:
    for l in file:
        x.append(l.strip())

Or:

>>> x = open("myfile.txt").read().splitlines()
>>> x
['line 1', 'line 2', 'line 3']

Or:

>>> x = open("myfile.txt").readlines()
>>> x
['line 1\n', 'line 2\n', 'line 3\n']

Or:

def print_output(lines_in_textfile):
    print("lines_in_textfile =", lines_in_textfile)

y = [x.rstrip() for x in open("001.txt")]
print_output(y)

with open('001.txt', 'r', encoding='utf-8') as file:
    file = file.read().splitlines()
    print_output(file)

with open('001.txt', 'r', encoding='utf-8') as file:
    file = [x.rstrip("\n") for x in file]
    print_output(file)

Output:

lines_in_textfile = ['line 1', 'line 2', 'line 3']
lines_in_textfile = ['line 1', 'line 2', 'line 3']
lines_in_textfile = ['line 1', 'line 2', 'line 3']

To read a file into a list you need to do three things:

  • Open the file
  • Read the file
  • Store the contents as list

Fortunately Python makes it very easy to do these things, so the shortest way to read a file into a list is:

lst = list(open(filename))

However, I'll add some more explanation.

Opening the file

I assume that you want to open a specific file and that you don't deal directly with a file-handle (or a file-like handle). The most commonly used function to open a file in Python is open. In Python 2.7 it takes one mandatory argument and two optional ones (Python 3 adds further optional arguments, such as encoding):

  • Filename
  • Mode
  • Buffering (I'll ignore this argument in this answer)

The filename should be a string that represents the path to the file. For example:

open('afile')   # opens the file named afile in the current working directory
open('adir/afile')            # relative path (relative to the current working directory)
open('C:/users/aname/afile')  # absolute path (windows)
open('/usr/local/afile')      # absolute path (linux)

Note that the file extension needs to be specified. This is especially important for Windows users, because Explorer hides file extensions like .txt or .doc by default.

The second argument is the mode. It's r by default, which means "read-only". That's exactly what you need in your case.

But if you actually want to create a file and/or write to a file, you'll need a different argument here. There is an excellent answer if you want an overview.

For reading a file you can omit the mode or pass it in explicitly:

open(filename)
open(filename, 'r')

Both will open the file in read-only mode. In case you want to read in a binary file on Windows you need to use the mode rb:

open(filename, 'rb')

In Python 2, the 'b' (binary mode) flag is simply ignored on other platforms. In Python 3 it matters on every platform: binary mode returns bytes instead of str.


Now that I've shown how to open the file, let's talk about the fact that you always need to close it again. Otherwise the process keeps an open file-handle to the file until it exits (or until Python garbage-collects the file object).

While you could use:

f = open(filename)
# ... do stuff with f
f.close()

That will fail to close the file when something between open and close throws an exception. You could avoid that by using try and finally:

f = open(filename)
# nothing in between!
try:
    lines = f.readlines()  # do stuff with f
finally:
    f.close()

However, Python provides context managers that have a prettier syntax (but for open it's almost identical to the try and finally above):

with open(filename) as f:
    lines = f.readlines()  # do stuff with f
# The file is always closed after the with-scope ends.

The last approach is the recommended approach to open a file in Python!
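
The guarantee can be checked directly. This sketch raises an arbitrary ValueError inside the with block and then inspects the file object's closed attribute. The file name is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(filename, "w", encoding="utf-8") as out:
        out.write("data\n")

    handle = None
    try:
        with open(filename, encoding="utf-8") as f:
            handle = f
            raise ValueError("something went wrong while processing")
    except ValueError:
        pass

    # The with statement closed the file even though an exception escaped the block
    closed_after_error = handle.closed

print(closed_after_error)  # True
```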

Reading the file

Okay, you've opened the file. Now how do you read it?

The open function returns a file object, which supports Python's iteration protocol. Each iteration will give you a line:

with open(filename) as f:
    for line in f:
        print(line)

This will print each line of the file. Note, however, that each line will contain a newline character \n at the end. In Python 3, text mode uses universal newlines by default, so \r\n and \r are translated to \n when reading. In Python 2 you might want to check whether your Python is built with universal newlines support; otherwise you could also get \r\n on Windows or \r on Mac as newlines. If you don't want the newline, you could simply remove the last character (or the last two characters on Windows):

with open(filename) as f:
    for line in f:
        print(line[:-1])

But the last line doesn't necessarily have a trailing newline, so one shouldn't use that. One could check if it ends with a trailing newline and, if so, remove it:

with open(filename) as f:
    for line in f:
        if line.endswith('\n'):
            line = line[:-1]
        print(line)

But you could simply remove all whitespace (including the \n character) from the end of the string. This will also remove all other trailing whitespace, so you have to be careful if that whitespace is important:

with open(filename) as f:
    for line in f:
        print(line.rstrip())

Also, if the lines end with \r\n (Windows "newlines"), .rstrip() will take care of the \r too!
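
The three stripping strategies above behave differently on a line with trailing spaces and on a last line that has no newline. This is a minimal comparison on a hypothetical sample file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(filename, "w", encoding="utf-8") as out:
        out.write("keep my spaces  \nlast line")  # last line has no trailing newline

    with open(filename, encoding="utf-8") as f:
        raw = list(f)

slice_off = [line[:-1] for line in raw]             # chops the final "e" of "last line"
only_newline = [line.rstrip("\n") for line in raw]  # removes just the newline
all_trailing = [line.rstrip() for line in raw]      # also removes trailing spaces

print(slice_off)     # ['keep my spaces  ', 'last lin']
print(only_newline)  # ['keep my spaces  ', 'last line']
print(all_trailing)  # ['keep my spaces', 'last line']
```

If trailing whitespace matters, rstrip("\n") is the safer choice of the three.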

Store the contents as list

Now that you know how to open the file and read it, it's time to store the contents in a list. The simplest option would be to use the list function:

with open(filename) as f:
    lst = list(f)

In case you want to strip the trailing newlines, you could use a list comprehension instead:

with open(filename) as f:
    lst = [line.rstrip() for line in f]

Or even simpler: the .readlines() method of the file object returns a list of the lines:

with open(filename) as f:
    lst = f.readlines()

This will also include the trailing newline characters. If you don't want them, I would recommend the [line.rstrip() for line in f] approach, because it avoids keeping two lists containing all the lines in memory.

There's an additional option to get the desired output, but it's rather "suboptimal": read the complete file into a string and then split on newlines:

with open(filename) as f:
    lst = f.read().split('\n')

or:

with open(filename) as f:
    lst = f.read().splitlines()

These remove the newlines because the separator isn't included in the results. Note, though, that split('\n') produces an extra empty string at the end when the file ends with a newline, while splitlines() does not. Neither is ideal, because you keep the file in memory both as a string and as a list of lines!
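
The difference between the two splitting methods is easy to reproduce. In this sketch a string literal stands in for f.read() on a file whose last line ends with a newline:

```python
text = "a\nb\nc\n"  # stands in for f.read() on a file ending with a newline

by_split = text.split("\n")
by_splitlines = text.splitlines()
windows_style = "a\r\nb\r\n".splitlines()  # splitlines also handles \r\n

print(by_split)       # ['a', 'b', 'c', '']
print(by_splitlines)  # ['a', 'b', 'c']
print(windows_style)  # ['a', 'b']
```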

Summary

  • Use with open(...) as f when opening files, because you don't need to take care of closing the file yourself, and it closes the file even if some exception happens.
  • File objects support the iteration protocol, so reading a file line-by-line is as simple as for line in the_file_object:.
  • Always browse the documentation for the available functions/classes. Most of the time there's a perfect match for the task, or at least one or two good ones. The obvious choice in this case would be readlines(), but if you want to process the lines before storing them in the list, I would recommend a simple list comprehension.

Clean and Pythonic Way of Reading the Lines of a File Into a List


First and foremost, you should focus on opening your file and reading its contents in an efficient and pythonic way. Here is an example of the way I personally DO NOT prefer:

infile = open('my_file.txt', 'r')  # Open the file for reading.

data = infile.read()  # Read the contents of the file.

infile.close()  # Close the file since we're done using it.

Instead, I prefer the method below for opening files for both reading and writing. It is very clean and does not require an extra step of closing the file once you are done using it. In the statement below, we're opening the file for reading and assigning it to the variable 'infile'. Once the code within this statement has finished running, the file will be automatically closed.

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

Now we need to focus on bringing this data into a Python list, because lists are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows:

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

The Final Product:

# Open the file for reading.
with open('my_file.txt', 'r') as infile:

    data = infile.read()  # Read the contents of the file into memory.

# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()

Testing Our Code:

  • Contents of the text file:
     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.
  • Print statements for testing purposes (Python 2 syntax):
    print my_list  # Print the list.

    # Print each line in the list.
    for line in my_list:
        print line

    # Print the fourth element in this list.
    print my_list[3]
  • Output (in Python 2, the list repr shows the non-ASCII characters as escaped UTF-8 bytes):
     ['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,',
     'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea
     frumoas\xc3\xa3 fat\xc3\xa3.']

     A fost odatã ca-n povesti,
     A fost ca niciodatã,
     Din rude mãri împãrãtesti,
     O prea frumoasã fatã.

     O prea frumoasã fatã.

Introduced in Python 3.4, pathlib has a really convenient method for reading text from files, as follows:

from pathlib import Path
p = Path('my_text_file')
lines = p.read_text().splitlines()

(The splitlines call is what turns it from a string containing the whole contents of the file into a list of lines in the file.)

pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don't have to worry about opening and closing the file. If all you need to do with the file is read it all in one go, it's a good choice.
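
This self-contained version of the pathlib approach writes its own input first. The file name my_text_file is kept from the example above, and the temporary directory is only there so the snippet runs anywhere:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    p = Path(tmpdir) / "my_text_file"  # hypothetical file in a temp directory
    p.write_text("first\nsecond\nthird\n", encoding="utf-8")

    # read_text() opens, reads and closes the file in one call
    lines = p.read_text(encoding="utf-8").splitlines()

print(lines)  # ['first', 'second', 'third']
```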

Here's one more option, using a list comprehension on the file:

lines = [line.rstrip() for line in open('file.txt')]

This should be an efficient way, as most of the work is done inside the Python interpreter (note that the file is not explicitly closed here).

f = open("your_file.txt",'r')
out = f.readlines()  # returns a list of all the lines

Now the variable out is a list (array) of what you want. You could either do:

for line in out:
    print (line)

Or:

for line in f:
    print (line)

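Either loop prints the same lines, but use one or the other. readlines() has already consumed f, so running the second loop after the first snippet prints nothing unless you first rewind the file with f.seek(0).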

Another option is numpy.genfromtxt, for example:

import numpy as np
data = np.genfromtxt("yourfile.dat",delimiter="\n")

This will make data a NumPy array with as many rows as there are lines in your file. Note that genfromtxt converts values to float by default, so this suits files of numbers.

Read and write text files with Python 2 and Python 3; it works with Unicode:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Define data
lines = ['     A first string  ',
         'A Unicode sample: €',
         'German: äöüß']

# Write text file
with open('file.txt', 'w') as fp:
    fp.write('\n'.join(lines))

# Read text file
with open('file.txt', 'r') as fp:
    read_lines = fp.readlines()
    read_lines = [line.rstrip('\n') for line in read_lines]

print(lines == read_lines)

Things to notice:

  • with is a so-called context manager. It makes sure that the opened file is closed again.
  • All solutions here that simply call .strip() or .rstrip() will fail to reproduce the lines, as they also strip the white space.

Common file endings

.txt

More advanced file writing/reading

For your application, the following might be important:

  • Support by other programming languages
  • Reading/writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.

If you'd like to read a file from the command line or from stdin, you can also use the fileinput module:

# reader.py
import fileinput

content = []
for line in fileinput.input():
    content.append(line.strip())

fileinput.close()

Pass files to it like so:

$ python reader.py textfile.txt 

Read more here: http://docs.python.org/2/library/fileinput.html
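
fileinput can also be given the file names explicitly instead of reading them from sys.argv, and it then chains the files into one stream of lines. This sketch creates two hypothetical files, a.txt and b.txt. It uses fileinput.input as a context manager (supported since Python 3.2) so the input is closed afterwards:

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    first = os.path.join(tmpdir, "a.txt")   # hypothetical input files
    second = os.path.join(tmpdir, "b.txt")
    with open(first, "w", encoding="utf-8") as out:
        out.write("a1\na2\n")
    with open(second, "w", encoding="utf-8") as out:
        out.write("b1\n")

    # fileinput chains the files; the with statement closes it afterwards
    with fileinput.input(files=(first, second)) as lines:
        content = [line.strip() for line in lines]

print(content)  # ['a1', 'a2', 'b1']
```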

The simplest way to do it

A simple way is to:

  1. Read the whole file as a string
  2. Split the string line by line

In one line, that would give:

lines = open('C:/path/file.txt').read().splitlines()

However, this is quite an inefficient way, as it stores 2 versions of the content in memory (probably not a big issue for small files, but still). [Thanks Mark Amery].

There are 2 easier ways:

  1. Using the file as an iterator
lines = list(open('C:/path/file.txt'))
# ... or if you want to have a list without EOL characters
lines = [l.rstrip() for l in open('C:/path/file.txt')]
  2. If you are using Python 3.4 or above, better use pathlib to create a path for your file that you could use for other operations in your program:
from pathlib import Path
file_path = Path("C:/path/file.txt")
lines = file_path.read_text().splitlines()
# ... or ... 
lines = [l.rstrip() for l in file_path.open()]

Just use the splitlines() function. Here is an example.

inp = "file.txt"
data = open(inp)
dat = data.read()
lst = dat.splitlines()
print(lst)  # works in both Python 2 and Python 3

In the output you will have the list of lines.

If you are faced with a very large file and want to read it faster (imagine you are in a Topcoder/HackerRank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at one time, rather than just iterating line by line at the file level.

buffersize = 2**16
with open(path) as f: 
    while True:
        lines_buffer = f.readlines(buffersize)
        if not lines_buffer:
            break
        for line in lines_buffer:
            process(line)
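
The buffered loop can be wrapped in a generator so callers still see one line at a time. The name iter_lines_in_batches is hypothetical. The size hint passed to readlines() is approximate: it stops reading once the lines read so far exceed that many characters. This sketch checks that a deliberately tiny buffer still yields exactly the same lines as plain iteration:

```python
import os
import tempfile

def iter_lines_in_batches(path, buffersize=2**16):
    """Hypothetical helper: yield lines, fetching roughly `buffersize` characters per readlines() call."""
    with open(path, encoding="utf-8") as f:
        while True:
            lines_buffer = f.readlines(buffersize)
            if not lines_buffer:
                break
            yield from lines_buffer

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "big.txt")  # hypothetical input file
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(f"line {i}\n" for i in range(1000))

    # A tiny buffer forces many batches; the result must not change
    batched = list(iter_lines_in_batches(path, buffersize=100))
    with open(path, encoding="utf-8") as f:
        plain = list(f)

print(batched == plain, len(batched))  # True 1000
```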

The easiest ways to do that, with some additional benefits, are:

lines = list(open('filename'))

or

lines = tuple(open('filename'))

or

lines = set(open('filename'))

In the case of set, remember that the line order is not preserved and duplicate lines are removed.
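
This sketch uses a hypothetical sample file with a repeated line to show the difference between list and set. Each read is wrapped in with, so the file is closed on any Python implementation:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(filename, "w", encoding="utf-8") as out:
        out.write("b\na\nb\n")

    # Use with so the file is closed on every Python implementation
    with open(filename, encoding="utf-8") as f:
        as_list = list(f)
    with open(filename, encoding="utf-8") as f:
        as_set = set(f)

print(as_list)         # ['b\n', 'a\n', 'b\n']
print(sorted(as_set))  # ['a\n', 'b\n']  (duplicates removed, original order lost)
```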

Below I added an important supplement from @MarkAmery:

Since you're not calling .close on the file object nor using a with statement, in some Python implementations the file may not get closed after reading and your process will leak an open file handle.

In CPython (the normal Python implementation that most people use), this isn't a problem, since the file object will get immediately garbage-collected and this will close the file. It's nonetheless generally considered best practice to do something like:

with open('filename') as f: lines = list(f) 

to ensure that the file gets closed regardless of what Python implementation you're using.

In case there are also empty lines in the document, I like to read in the content and pass it through filter to prevent empty string elements:

with open(myFile, "r") as f:
    excludeFileContent = list(filter(None, f.read().splitlines()))
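
filter(None, ...) drops only truly empty strings, so lines made of spaces survive. In this sketch a string literal stands in for f.read(). The comprehension with line.strip() is shown as one possible alternative when whitespace-only lines should be dropped too:

```python
text = "first\n\n   \nsecond\n"  # stands in for f.read(): an empty and a whitespace-only line

no_empty = list(filter(None, text.splitlines()))              # drops "" only
no_blank = [line for line in text.splitlines() if line.strip()]  # also drops "   "

print(no_empty)  # ['first', '   ', 'second']
print(no_blank)  # ['first', 'second']
```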

Use this:

import pandas as pd
data = pd.read_csv(filename) # You can also add parameters such as header, sep, etc.
array = data.values

data is a DataFrame, and its values attribute gives you an ndarray. You can also get a list by using array.tolist().

Outline and Summary

With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:

  • list(fileinput.input(filename))
  • using with path.open() as f, call f.readlines()
  • list(f)
  • path.read_text().splitlines()
  • path.read_text().splitlines(keepends=True)
  • iterate over fileinput.input or f and list.append each line one at a time
  • pass f to a bound list.extend method
  • use f in a list comprehension

I explain the use-case for each below.

In Python, how do I read a file line-by-line?

This is an excellent question. First, let's create some sample data:

from pathlib import Path
Path('filename').write_text('foo\nbar\nbaz')

File objects are lazy iterators, so just iterate over them.

filename = 'filename'
with open(filename) as f:
    for line in f:
        line # do something with the line

Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. With just one file:

import fileinput

for line in fileinput.input(filename): 
    line # process the line

or for multiple files, pass it a list of filenames:

for line in fileinput.input([filename]*2): 
    line # process the line

Again, f and fileinput.input above are (or return) lazy iterators. You can only use an iterator one time, so to provide functional code while avoiding verbosity I'll use the slightly more terse fileinput.input(filename) where apropos from here.

In Python, how do I read a file line-by-line into a list?

Ah, but you want it in a list for some reason? I'd avoid that if possible. But if you insist... just pass the result of fileinput.input(filename) to list:

list(fileinput.input(filename))

Another direct answer is to call f.readlines, which returns the lines of the file as a list. It accepts an optional hint and stops once roughly that many characters have been read, so you could break the file up into multiple lists that way.

You can get to this file object in two ways. One way is to pass the filename to the open builtin:

filename = 'filename'

with open(filename) as f:
    f.readlines()

or using the new Path object from the pathlib module (which I have become quite fond of, and will use from here on):

from pathlib import Path

path = Path(filename)

with path.open() as f:
    f.readlines()

list will also consume the file iterator and return a list, which is quite a direct method as well:

with path.open() as f:
    list(f)

If you don't mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines:

path.read_text().splitlines()

If you want to keep the newlines, pass keepends=True:

path.read_text().splitlines(keepends=True)
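
For a file that uses only \n line endings, splitlines(keepends=True) gives the same list as list(f). This sketch rebuilds the foo/bar/baz sample inside a temporary directory so it can run on its own:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "filename"
    path.write_text("foo\nbar\nbaz")

    with path.open() as f:
        from_iteration = list(f)
    from_keepends = path.read_text().splitlines(keepends=True)
    from_default = path.read_text().splitlines()

print(from_iteration)  # ['foo\n', 'bar\n', 'baz']
print(from_keepends)   # ['foo\n', 'bar\n', 'baz']
print(from_default)    # ['foo', 'bar', 'baz']
```

One caveat: str.splitlines also splits on other boundaries, such as \x0c (form feed) and \u2028, while iterating a text file splits only on newlines. The two can therefore differ for unusual input.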

I want to read the file line by line and append each line to the end of the list.

Now this is a bit silly to ask for, given that we've demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let's humor this request.

Using list.append would allow you to filter or operate on each line before you append it:

line_list = []
for line in fileinput.input(filename):
    line_list.append(line)

line_list

Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list:

line_list = []
line_list.extend(fileinput.input(filename))
line_list

Or, more idiomatically, we could use a list comprehension instead, and map and filter inside it if desirable:

[line for line in fileinput.input(filename)]
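
Here is one illustration of mapping and filtering inside that comprehension. It assumes a hypothetical config.txt in which blank lines and lines starting with # should be skipped:

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "config.txt")  # hypothetical input file
    with open(filename, "w", encoding="utf-8") as out:
        out.write("# comment\nfoo = 1\n\nbar = 2\n")

    # Map (strip) and filter (skip blanks and comments) while building the list
    with fileinput.input(filename) as lines:
        settings = [line.strip() for line in lines
                    if line.strip() and not line.startswith("#")]

print(settings)  # ['foo = 1', 'bar = 2']
```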

Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:

list(fileinput.input(filename))

Conclusion

You've seen many ways to get lines from a file into a list, but I'd recommend you avoid materializing large quantities of data into a list and instead use Python's lazy iteration to process the data if possible.

That is, prefer fileinput.input or with path.open() as f.

You could also use the loadtxt function in NumPy. It checks for fewer conditions than genfromtxt, so it may be faster.

import numpy
data = numpy.loadtxt(filename, delimiter="\n")

I like to use the following, which reads the lines immediately:

contents = []
for line in open(filepath, 'r').readlines():
    contents.append(line.strip())

Or using a list comprehension:

contents = [line.strip() for line in open(filepath, 'r').readlines()]

I would try one of the methods below. The example file that I use is named dummy.txt. You can find the file here. I presume that the file is in the same directory as the code (you can change fpath to include the proper file name and folder path).

In both of the examples below, the list that you want is given by lst.

1.> First method:

fpath = 'dummy.txt'
with open(fpath, "r") as f: lst = [line.rstrip('\n \t') for line in f]

print(lst)
>>>['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

2.> In the second method, one can use csv.reader from the csv module of the Python Standard Library:

import csv
fpath = 'dummy.txt'
with open(fpath) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='\t')  # the delimiter must be a single character
    lst = [row[0] for row in csv_reader] 

print(lst)
>>>['THIS IS LINE1.', 'THIS IS LINE2.', 'THIS IS LINE3.', 'THIS IS LINE4.']

You can use either of the two methods. The time taken to create lst is almost the same for both.

Here is a Python 3 helper class that I use to simplify file I/O:

import os

# handle files using a callback method, prevents repetition
def _FileIO__file_handler(file_path, mode, callback = lambda f: None):
  f = open(file_path, mode)
  try:
    return callback(f)
  except Exception as e:
    raise IOError("Failed to %s file" % ["write to", "read from"][mode.lower() in "r rb r+".split(" ")])
  finally:
    f.close()


class FileIO:
  # return the contents of a file
  def read(file_path, mode = "r"):
    return __file_handler(file_path, mode, lambda rf: rf.read())

  # get the lines of a file
  def lines(file_path, mode = "r", filter_fn = lambda line: len(line) > 0):
    return [line for line in FileIO.read(file_path, mode).strip().split("\n") if filter_fn(line)]

  # create or update a file (NOTE: can also be used to replace a file's original content)
  def write(file_path, new_content, mode = "w"):
    return __file_handler(file_path, mode, lambda wf: wf.write(new_content))

  # delete a file (if it exists)
  def delete(file_path):
    return os.remove(file_path) if os.path.isfile(file_path) else None

You would then use the FileIO.lines function, like this:

file_ext_lines = FileIO.lines("./path/to/file.ext")
for i, line in enumerate(file_ext_lines):
  print("Line {}: {}".format(i + 1, line))

Remember that the mode ("r" by default) and filter_fn (checks for empty lines by default) parameters are optional.

You could even remove the write and delete methods and keep only read and FileIO.lines (lines depends on read), or even turn them into a separate function called read_lines.

Command line version

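#!/usr/bin/env python3
import os
import sys

# Resolve the input file relative to this script's directory
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
filename = os.path.join(dname, sys.argv[1])  # join with a path separator
with open(filename) as f:
    arr = f.read().splitlines()  # no trailing '' element, unlike split("\n")
print(arr)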

Run with:

python3 somefile.py input_file_name.txt

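Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate this site's URL or the original address. For any questions, contact: yoyou2525@163.com.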

 