Making arrays of columns (or rows) of a (space-delimited) textfile in Python
I have seen similar questions, but the answers always give strings of rows. I want to make arrays of the columns of a text file. I have a text file like this one, but with 106 rows and 19 columns:
O2 CO2 NOx Ash Other
20.9 1.6 0.04 0.0002 0.0
22.0 2.3 0.31 0.0005 0.0
19.86 2.1 0.05 0.0002 0.0
17.06 3.01 0.28 0.006 0.001
I expect to get arrays of columns (either a 2D array of all columns or a 1D array for each column), plus a list for the first row, since it only holds the names. I would like to plot them later.
The desired result for one column would be, for example:
array([0.04, 0.31, 0.05, 0.28], dtype=float32)
and for the first row:
species = ['O2', 'CO2', 'NOx', 'Ash', 'Other']
I'd recommend not looping manually over the values of large data sets (in this case a kind of tab-separated relational table). Just use the methods of a safe and well-known library like NumPy:
import numpy as np
data = np.transpose(np.loadtxt("/path/to/file.txt", skiprows=1, delimiter="\t"))
The inner loadtxt call reads the file, and the skiprows=1 parameter skips the first row (the column names) to avoid incompatible data types and further conversions. If you need this row in the same structure, just insert it as a new row at index 0; keep in mind that a NumPy array has a single dtype, so mixing the names with the numbers would require a string or object array. You then need to transpose the matrix, for which NumPy has a safe method as well. I just used the output of loadtxt (a 2D array with one row per line of the file) as the input of transpose to give a one-liner. But it is better to call them separately, to avoid "train wrecks", to see what happens in between, and to correct any unwanted results.
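The advice to keep the two calls apart can be sketched as follows. This is only an illustration: the in-memory `sample` is a hypothetical stand-in for the real file, and it assumes the values are tab-separated, as the answer does.

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for the tab-separated file.
sample = io.StringIO(
    "O2\tCO2\tNOx\tAsh\tOther\n"
    "20.9\t1.6\t0.04\t0.0002\t0.0\n"
    "22.0\t2.3\t0.31\t0.0005\t0.0\n"
    "19.86\t2.1\t0.05\t0.0002\t0.0\n"
    "17.06\t3.01\t0.28\t0.006\t0.001\n"
)

# Step 1: read the numeric rows, skipping the header line.
rows = np.loadtxt(sample, skiprows=1, delimiter="\t")
print(rows.shape)  # (4, 5): four data rows, five columns

# Step 2: transpose so that each entry of `columns` is one column.
columns = np.transpose(rows)
print(columns.shape)  # (5, 4)
print(columns[2])     # the NOx column
```

Inspecting `rows.shape` before transposing is a cheap way to confirm that the delimiter and the header skip behaved as intended.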
PS: the delimiter parameter must be adjusted to match the one in the original file; I assumed it to be a TAB. If the file is space-delimited, as in the question, you can leave delimiter out, because loadtxt splits on any whitespace by default. Check the loadtxt documentation for more info.
@KostasCharitidis - thanks for your note
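For the space-delimited sample from the question, a minimal sketch could look like the one below. It assumes the text is held in a hypothetical in-memory string `text` instead of a real file, relies on the default whitespace splitting of loadtxt, and uses its unpack=True option in place of a separate transpose call. The `by_name` dictionary is an illustrative addition, not part of the answer above.

```python
import io
import numpy as np

# Hypothetical in-memory copy of the space-delimited sample from the question.
text = """O2 CO2 NOx Ash Other
20.9 1.6 0.04 0.0002 0.0
22.0 2.3 0.31 0.0005 0.0
19.86 2.1 0.05 0.0002 0.0
17.06 3.01 0.28 0.006 0.001
"""

# The first line holds the column names; split() with no argument
# splits on any run of whitespace.
species = text.splitlines()[0].split()

# With delimiter left at its default, loadtxt splits on whitespace.
# unpack=True returns the transposed array, one entry per column.
columns = np.loadtxt(io.StringIO(text), skiprows=1, dtype=np.float32, unpack=True)

# Pair each name with its column for convenient lookup.
by_name = dict(zip(species, columns))
print(species)
print(by_name["NOx"])
```

With a real file, the path can be passed to loadtxt directly and the header can be read from the first line of the file in the same way.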
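# Read the whole file; the with block closes it afterwards.
with open('file.txt', 'r') as f:
    st = f.read()
# Skip blank lines, e.g. the one produced by a trailing newline.
lines = [line for line in st.split('\n') if line.strip()]
dct = []
species = []
# The first line holds the column names.
for name in lines[0].split(' '):
    species.append(name)
# Every following line is one row of numbers.
for no, row in enumerate(lines[1:]):
    dct.append([])
    for elem in row.split(' '):
        # Each value is wrapped in its own list, as shown in the RESULT.
        dct[no].append([float(elem)])
print(species)
print(dct)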
RESULT
['O2', 'CO2', 'NOx', 'Ash', 'Other']
[[[20.9], [1.6], [0.04], [0.0002], [0.0]], [[22.0], [2.3], [0.31], [0.0005], [0.0]], [[19.86], [2.1], [0.05], [0.0002], [0.0]], [[17.06], [3.01], [0.28], [0.006], [0.001]]]
file.txt
O2 CO2 NOx Ash Other
20.9 1.6 0.04 0.0002 0.0
22.0 2.3 0.31 0.0005 0.0
19.86 2.1 0.05 0.0002 0.0
17.06 3.01 0.28 0.006 0.001
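The result above is organised by row, while the question asks for columns. One possible way to get columns without NumPy is to transpose the parsed rows with zip, sketched below. The in-memory `text` is a hypothetical copy of file.txt, and the names `rows` and `columns` are illustrative.

```python
# Hypothetical in-memory copy of file.txt; with a real file, read its text instead.
text = """O2 CO2 NOx Ash Other
20.9 1.6 0.04 0.0002 0.0
22.0 2.3 0.31 0.0005 0.0
19.86 2.1 0.05 0.0002 0.0
17.06 3.01 0.28 0.006 0.001
"""

# Drop blank lines so a trailing newline does not produce an empty row.
lines = [line for line in text.splitlines() if line.strip()]

# The first line holds the column names.
species = lines[0].split()

# Parse every remaining line into a list of floats (one list per row).
rows = [[float(value) for value in line.split()] for line in lines[1:]]

# zip(*rows) groups the i-th value of every row, which transposes
# the row lists into column lists.
columns = [list(column) for column in zip(*rows)]

print(species)     # ['O2', 'CO2', 'NOx', 'Ash', 'Other']
print(columns[2])  # [0.04, 0.31, 0.05, 0.28]
```

Each entry of `columns` lines up with the name at the same index in `species`, so `columns[species.index('NOx')]` picks out a column by name.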