Python genfromtxt multiple datatypes
I would like to read in a csv file using genfromtxt. I have six columns that are float, and one column that is a string.
How do I set the datatype so that the float columns will be read in as floats and the string column will be read in as strings? I tried dtype='void' but that is not working.
Suggestions?
Thanks
.csv file
999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
import sys
import numpy as np
x = sys.argv[1]
f = open(x, 'r')
y = np.genfromtxt(f, delimiter=',', dtype=[('f0', '<f8'), ('f1', 'S4'),
    ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8')])
ionenergy = y[:,0]
units = y[:,1]
Error:
ionenergy = y[:,0]
IndexError: invalid index
I don't get this error when I specify a single data type.
dtype=None tells genfromtxt to guess the appropriate dtype. From the docs:
dtype : dtype, optional
Data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.
(my emphasis.)
Since your data is comma-separated, be sure to include delimiter=',' or else np.genfromtxt will interpret each column (except the last) as including a string character (the comma) and therefore mistakenly assign a string dtype to each of those columns.
For example:
import numpy as np
arr = np.genfromtxt('data', dtype=None, delimiter=',')
print(arr.dtype)
# [('f0', '<f8'), ('f1', 'S4'), ('f2', '<i4'), ('f3', '<f8'), ('f4', '<f8')]
This shows the names and dtypes of each column. For example, ('f2', '<i4') means the third column has name 'f2' and is of dtype '<i4'. The i means it is an integer dtype. If you need the third column to be a float dtype then there are a few options.
You could supply the dtype explicitly in the call to genfromtxt:
arr = np.genfromtxt(
    'data', delimiter=',',
    dtype=[('f0', '<f8'), ('f1', 'S4'), ('f2', '<f4'), ('f3', '<f8'), ('f4', '<f8')])
print(arr)
# each record prints as a tuple; the third field ('f2') now holds floats
print(arr['f2'])
# [34. 12. 89.]
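The point about delimiter=',' made earlier can be checked without a file on disk. The following is a minimal sketch, assuming the three sample rows from the question are held in an in-memory buffer (the DATA string and the variable names are illustrative, not part of the original answer). It passes encoding='utf-8' so that string columns come back as ordinary str values on recent NumPy versions, and it also shows one more way to get the third column as floats after loading, by converting that field with astype.

```python
import io
import numpy as np

# The sample rows from the question, kept in memory instead of a file.
DATA = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# With the delimiter, each column gets its own guessed dtype.
with_delim = np.genfromtxt(
    io.StringIO(DATA), dtype=None, delimiter=',', encoding='utf-8')

# Without it, rows are split on whitespace, so the commas stay attached
# to every field except the last and those columns are read as strings.
no_delim = np.genfromtxt(io.StringIO(DATA), dtype=None, encoding='utf-8')

print(with_delim.dtype)
print(no_delim.dtype)

# Alternative to an explicit dtype: convert the integer field afterwards.
third_as_float = with_delim['f2'].astype(float)
print(third_as_float)
```

Comparing the two printed dtypes makes the effect of the missing delimiter visible at a glance.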
The error message IndexError: invalid index is being generated by the line
ionenergy = y[:,0]
When you have mixed dtypes, np.genfromtxt returns a structured array. You need to read up on structured arrays because the syntax for accessing columns differs from the syntax used for plain arrays of homogeneous dtype.
Instead of y[:, 0], to access the first column of the structured array y, use
y['f0']
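To make the difference concrete, here is a small sketch, again assuming the question's sample rows in an in-memory buffer (DATA and the variable names are illustrative). A structured array built from mixed columns is one-dimensional, with one record per row, which is why a two-dimensional index such as y[:, 0] fails while a field name works.

```python
import io
import numpy as np

# The sample rows from the question, kept in memory instead of a file.
DATA = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

y = np.genfromtxt(
    io.StringIO(DATA), delimiter=',', dtype=None, encoding='utf-8')

# One record per row, so the array has a single dimension.
print(y.ndim, y.shape)

ionenergy = y['f0']   # a whole column, selected by field name
first_row = y[0]      # a single record, selected by position

# Two-dimensional indexing is not valid on a 1-D structured array.
raised = False
try:
    y[:, 0]
except IndexError as exc:
    raised = True
    print('IndexError:', exc)
```

The exact wording of the IndexError message depends on the NumPy version.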
Or, better yet, supply the names parameter in np.genfromtxt, so you can use a more relevant column name, like y['ionenergy']:
import numpy as np
arr = np.genfromtxt(
'data', delimiter=',', dtype=None,
names=['ionenergy', 'foo', 'bar', 'baz', 'quux', 'corge'])
print(arr['ionenergy'])
# [ 999.9 1.3 101.7]
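If plain y[:, 0]-style indexing is preferred, a different route, not part of the answer above, is to read the numeric columns and the string column in two separate calls using the usecols parameter of np.genfromtxt. This is a sketch under the assumption that the file looks like the question's sample (five columns, strings in the second one); the DATA buffer and the variable names are illustrative.

```python
import io
import numpy as np

# The sample rows from the question, kept in memory instead of a file.
DATA = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# Numeric columns only: the result is an ordinary 2-D float array.
floats = np.genfromtxt(
    io.StringIO(DATA), delimiter=',', usecols=(0, 2, 3, 4))

# String column only; autostrip removes the space after each comma.
labels = np.genfromtxt(
    io.StringIO(DATA), delimiter=',', usecols=(1,),
    dtype=str, autostrip=True, encoding='utf-8')

ionenergy = floats[:, 0]   # 2-D indexing works on the plain float array
print(floats.shape)
print(ionenergy)
print(labels)
```

The trade-off is that the file is parsed twice and the two arrays have to be kept aligned by row index, whereas the structured array keeps each row's values together.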
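Please try this. Note that .iloc is a pandas indexer, so y has to be a pandas DataFrame (for example one read with pandas.read_csv), not the array returned by np.genfromtxt:
import pandas as pd
y = pd.read_csv(x, header=None, skipinitialspace=True)
ionenergy = y.iloc[:,0]
units = y.iloc[:,1]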