Read file with mixed types (floats and strings)
I have a data file composed of N columns and M lines, which I need to read, storing each column in an array/list. The file is usually filled with numbers (floats), and in those cases I can just do:
import numpy as np
f_data = np.loadtxt('file.dat', unpack=True)
and the result is the columns stored in f_data as sub-arrays whose elements are floats, as expected.
Other times the file can have random strings scattered around (see an example of such a file here). In those cases I need to read it in the same way (i.e. unpacked, with each column stored in a list/array and all of its elements stored as type float), with all the strings converted to a default float (for example 99.999).
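A minimal sketch of the problem, using an in-memory file in place of file.dat (the sample data are invented for illustration): np.loadtxt unpacks an all-numeric file without trouble, but a single non-numeric field makes it raise ValueError for the whole file.

```python
import io

import numpy as np

# Invented sample data standing in for file.dat: 3 columns, 2 lines.
clean = "1.0 2.0 3.0\n4.0 5.0 6.0\n"
mixed = "1.0 abc 3.0\n4.0 5.0 6.0\n"

# unpack=True transposes the result, so f_data[i] is column i.
f_data = np.loadtxt(io.StringIO(clean), unpack=True)
print(f_data[1])  # column 1: [2. 5.]

# One stray string is enough for loadtxt to give up.
try:
    np.loadtxt(io.StringIO(mixed), unpack=True)
except ValueError as err:
    print("loadtxt failed:", err)
```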
In the example data file above, column 5 would look like this after reading it:
f_data[5]
[2.049, 0.946, 0.942, 0.889, 99.999, 0.879, 0.989, 1.142, 1.062, 0.551, 1.233, 0.503]
Notice that all elements are of type float, and the string that was found was converted to 99.999 and also stored as a float.
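For comparison, the behaviour described above can also be sketched by hand, without any NumPy reading options: call float() on every field and substitute the default when it fails. This is an alternative approach, not the genfromtxt solution; the helper name read_columns is hypothetical, and it assumes whitespace-separated fields, no comment lines, and the same number of fields on every line.

```python
import io
from typing import List, TextIO

import numpy as np


def read_columns(fh: TextIO, default: float = 99.999) -> np.ndarray:
    """Hypothetical helper: read whitespace-separated columns as floats.

    Any field that float() cannot parse is replaced by `default`.
    Returns a 2-D array where row i holds column i (like unpack=True).
    Assumes every non-blank line has the same number of fields.
    """
    rows: List[List[float]] = []
    for line in fh:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        row = []
        for field in fields:
            try:
                row.append(float(field))
            except ValueError:
                row.append(default)  # non-numeric field -> default value
        rows.append(row)
    # Transpose so that each sub-array is one column of the file.
    return np.array(rows, dtype=float).T


sample = "1.0 2.049\n2.0 abc\n3.0 0.946\n"
f_data = read_columns(io.StringIO(sample))
print(f_data[1])  # "abc" has become 99.999
```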
np.genfromtxt is able to read a file with mixed types, but the result is that all the floats are stored as strings, which is not what I need.
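A sketch of the behaviour described, on invented sample data: with dtype=None, genfromtxt guesses one type per column, so a column that mixes numbers and text is promoted to a string type. encoding=None is passed here (an assumption for the example) so that strings come back as str rather than bytes.

```python
import io

import numpy as np

sample = "1.0 2.049\n2.0 abc\n3.0 0.946\n"

# dtype=None: genfromtxt infers a type for each column separately.
data = np.genfromtxt(io.StringIO(sample), dtype=None, encoding=None)

# The columns end up with different types, so the result is a
# structured array with default field names f0, f1, ...
print(data.dtype)
print(data["f0"])  # all numeric, stays float
print(data["f1"])  # contains "abc", so every entry is stored as a string
```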
How can I do this?
np.genfromtxt is the answer, but it's a little tricky to get it working just right.
Try:
np.genfromtxt("file.txt", dtype=float, filling_values=99.999)
This forces the type to float in every case. When numpy finds a value that can't be converted to a float, it treats that value as invalid, and thus missing (with the default loose=True, no error is raised). filling_values gives the default to use when data are missing; in your case, 99.999.
To store the data column-wise, as requested, add unpack=True, which makes the complete answer:
f_data = np.genfromtxt("file.txt", dtype=float, filling_values=99.999, unpack=True)
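A runnable sketch of this call on an invented in-memory sample (standing in for file.txt), using the question's default of 99.999. It checks that every value comes back as a float, that the stray string is replaced by the filling value, and that without filling_values the same field becomes nan instead.

```python
import io

import numpy as np

# Invented sample standing in for file.txt: 3 columns, one stray string.
sample = """\
1.0 2.049 10
2.0 abc   20
3.0 0.946 30
"""

# dtype=float forces every field to float; with the default loose=True,
# fields that fail to convert get filling_values instead of raising.
f_data = np.genfromtxt(io.StringIO(sample), dtype=float,
                       filling_values=99.999, unpack=True)

print(f_data.dtype)  # float64
print(f_data[1])     # column 1, with "abc" replaced by 99.999

# Without filling_values, invalid float fields default to nan.
no_fill = np.genfromtxt(io.StringIO(sample), dtype=float, unpack=True)
print(np.isnan(no_fill[1][1]))
```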