How to read a text file into a list or an array with Python

I am trying to read the lines of a text file into a list or array in Python. I just need to be able to individually access any item in the list or array after it is created.

The text file is formatted as follows:

0,0,200,0,53,1,0,255,...,0.

Where the ... appears above, the actual text file has hundreds or thousands more items.

I'm using the following code to try to read the file into a list:

text_file = open("filename.dat", "r")
lines = text_file.readlines()
print(lines)
print(len(lines))
text_file.close()

The output I get is:

['0,0,200,0,53,1,0,255,...,0.']
1

Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?

You will have to split your string into a list of values using split().

So:

lines = text_file.read().split(',')
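A minimal sketch of what that split produces, using an in-memory string as a stand-in for the file contents (the sample values are made up for illustration, with the question's trailing period kept). Every item comes back as a string, so convert explicitly if you need numbers:

```python
# Stand-in for text_file.read(); the values are illustrative, not the real file.
content = "0,0,200,0,53,1,0,255,0.\n"

# strip() drops the trailing newline before splitting on commas.
items = content.strip().split(',')

# float() accepts both "200" and the trailing "0." token.
numbers = [float(item) for item in items]

print(items[2])    # an element of the list of strings
print(numbers[2])  # the same element converted to a float
```

Using float rather than int here is a choice made because of the final "0." token; int would reject it.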

EDIT: I didn't realise this would get so much traction. Here's a more idiomatic approach.

import csv
with open('filename.csv', 'r', newline='') as fd:
    reader = csv.reader(fd)
    for row in reader:
        print(row)  # do something with each row (a list of strings)

Python's file.readlines() method returns a list of the lines in the file:

f = open('file_name.ext', 'r')
x = f.readlines()
f.close()

Now you should be able to iterate through the list of lines x.

If you want to use the file and not have to remember to close it afterward, do this:

with open('file_name.ext', 'r') as f:
    x = f.readlines()

You can also use numpy's loadtxt, like this:

from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)

So you want to create a list of lists... We need to start with an empty list:

list_of_lists = []

Next, we read the file content, line by line:

with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # alternatively, if you need the file content as numbers:
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)

A common use case is columnar data, but our units of storage are the rows of the file, which we read one by one, so you may want to transpose your list of lists. This can be done with the following idiom:

by_cols = zip(*list_of_lists)

Another common use is to give a name to each column:

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]

so that you can operate on homogeneous data items:

mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]

Most of what I've written can be done faster with the csv module from the standard library. A third-party alternative is pandas, which lets you automate most aspects of a typical data analysis (but has a number of dependencies).
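As a rough sketch of the csv route, the steps above (read rows, transpose, name the columns) might look like this. The in-memory sample and its numbers are invented for illustration, and converting with float is an assumption about the data:

```python
import csv
import io

# Illustrative stand-in for an open file; the numbers are invented.
sample = io.StringIO("3,5,6.0,12.5\n4,2,8.0,5.0\n")

# csv.reader yields each row as a list of strings; convert to float here.
list_of_lists = [[float(value) for value in row] for row in csv.reader(sample)]

# Transpose rows into columns; list() keeps the result indexable on Python 3.
by_cols = list(zip(*list_of_lists))

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = dict(zip(col_names, by_cols))

mean_apple_prices = [money / fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
print(mean_apple_prices)
```

With a real file, replace the StringIO object with a file opened via open(..., newline='').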


Update: In Python 2, zip(*list_of_lists) returns a new, transposed list (of tuples). In Python 3, it returns a zip object, which is not subscriptable.

If you need indexed access you can use

by_cols = list(zip(*list_of_lists))

which gives you a list of columns (as tuples) in both versions of Python.
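A small sketch of the difference, assuming Python 3 and a made-up pair of rows (the names lazy_cols and subscriptable are only for illustration):

```python
list_of_lists = [['1', '2', '3'], ['4', '5', '6']]  # made-up rows

lazy_cols = zip(*list_of_lists)
try:
    lazy_cols[0]
    subscriptable = True
except TypeError:
    # On Python 3 a zip object cannot be indexed.
    subscriptable = False

# Wrapping in list() materialises the columns so they can be indexed.
by_cols = list(zip(*list_of_lists))
print(subscriptable)
print(by_cols[0])
```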

On the other hand, if you don't need indexed access and all you want is to build a dictionary indexed by column names, a zip object is just fine...

# get_names is a helper you supply (not defined here) that parses the header line
with open('some_data.csv') as file:
    names = get_names(next(file))
    columns = zip(*((x.strip() for x in line.split(',')) for line in file))
d = {}
for name, column in zip(names, columns):
    d[name] = column

This question is asking how to read the comma-separated value contents from a file into an iterable list:

0,0,200,0,53,1,0,255,...,0.

The easiest way to do this is with the csv module as follows:

import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')

Now you can iterate over spamreader like this (the loop must sit inside the with block, because the reader needs the file to still be open):

    for row in spamreader:
        print(', '.join(row))

See the documentation for more examples.
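Since the question wants a single list with individually accessible items, here is a hedged sketch that flattens the reader's rows into one list. It uses an in-memory stand-in for the file, and the sample values are illustrative:

```python
import csv
import io

# In-memory stand-in for the opened file; the values are illustrative.
csvfile = io.StringIO("0,0,200,0,53,1,0,255\n")

values = []
for row in csv.reader(csvfile, delimiter=','):
    values.extend(row)  # each row is a list of strings

print(values[2])
print(len(values))
```

The items are still strings at this point; convert them with int() or float() if you need numbers.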

with open(r'D:\python\positive.txt', 'r') as myfile:
    data = myfile.read().replace('\n', '')

If your file contains numerical values then numpy's loadtxt method seems to be the best approach. You can read the array as follows:

import numpy as np

filename = '../data/NLPR_MCT/db3/cam1.dat'
x = np.loadtxt(filename, delimiter=',')
print(x)

You can then index the values in x as you would any array. file.readlines() is less convenient because it keeps the trailing '\n' on every line, which can lead to errors when you index or convert the values.

I'm a bit late, but you can also read the text file into a dataframe and then convert the corresponding column to a list.

import pandas as pd
lista=pd.read_csv('path_to_textfile.txt', sep=",", header=None)[0].tolist()

Example:

lista=pd.read_csv('data/holdout.txt',sep=',',header=None)[0].tolist()

Note: the column names of the resulting dataframe are integers, and I chose 0 because I was extracting only the first column.

Better this way:

def txt_to_lst(file_path):
    try:
        # the with block closes the file even if reading fails
        with open(file_path, "r") as stopword:
            lines = stopword.read().split('\n')
        print(lines)
        return lines
    except Exception as e:
        print(e)
