Python genfromtxt with multiple data types

Question

I would like to read in a csv file using genfromtxt. I have six columns that are float, and one column that is a string.

How do I set the datatype so that the float columns will be read in as floats and the string column will be read in as strings? I tried dtype='void' but that is not working.

Suggestions?

Thanks

.csv file

999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3



import sys
import numpy as np

x = sys.argv[1]
f = open(x, 'r')
y = np.genfromtxt(f, delimiter=',', dtype=[('f0', '<f8'), ('f1', 'S4'),
    ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8')])

ionenergy = y[:,0]
units = y[:,1]

Error:

ionenergy = y[:,0]
IndexError: invalid index

I don't get this error when I specify a single data type.

Answer 1

dtype=None tells genfromtxt to guess the appropriate dtype.

From the docs:

dtype: dtype, optional

Data type of the resulting array. If None, the dtypes will be determined by the contents of each column, individually.

(my emphasis.)

Since your data is comma-separated, be sure to include delimiter=',' or else np.genfromtxt will interpret each column (except the last) as including a string character (the comma) and therefore mistakenly assign a string dtype to each of those columns.

For example:

import numpy as np

arr = np.genfromtxt('data', dtype=None, delimiter=',')

print(arr.dtype)
# [('f0', '<f8'), ('f1', 'S4'), ('f2', '<i4'), ('f3', '<f8'), ('f4', '<f8')]

This shows the names and dtypes of each column. For example, ('f2', '<i4') means the third column has name 'f2' and is of dtype '<i4'. The i means it is an integer dtype. If you need the third column to be a float dtype then there are a few options.

You could manually edit the data by adding a decimal point in the third column to force genfromtxt to interpret values in that column to be of a float dtype.

You could supply the dtype explicitly in the call to genfromtxt:

arr = np.genfromtxt(
    'data', delimiter=',',
    dtype=[('f0', '<f8'), ('f1', 'S4'), ('f2', '<f4'), ('f3', '<f8'), ('f4', '<f8')])

print(arr)
# [(999.9, ' abc', 34.0, 78.0, 12.3) (1.3, ' ghf', 12.0, 8.4, 23.7)
#  (101.7, ' evf', 89.0, 2.4, 11.3)]

print(arr['f2'])
# [ 34.  12.  89.]
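A further possibility is to let genfromtxt guess the dtypes and convert the integer column afterwards. The following is only a minimal sketch: it assumes the sample rows from the question are supplied through io.StringIO instead of a file, and the names CSV_TEXT, float_dtype and converted are made up for illustration. The encoding and autostrip arguments are optional genfromtxt parameters, used here so that the string column comes back as stripped str values.

```python
import io

import numpy as np

# Sample rows copied from the question (supplied in memory for illustration).
CSV_TEXT = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# Let genfromtxt guess one dtype per column.
arr = np.genfromtxt(
    io.StringIO(CSV_TEXT), delimiter=',', dtype=None,
    encoding='utf-8', autostrip=True)

# Build a new structured dtype in which every integer field becomes float64
# and every other field keeps the dtype that was guessed.
float_dtype = np.dtype([
    (name, 'f8' if arr.dtype[name].kind in 'iu' else arr.dtype[name])
    for name in arr.dtype.names
])

# Cast the whole structured array to the new dtype, field by field.
converted = arr.astype(float_dtype)

print(converted.dtype)
print(converted['f2'])
```

This keeps the guessed dtypes for the other columns, so only the fields that genfromtxt read as integers are changed.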

The error message IndexError: invalid index is being generated by the line

ionenergy = y[:,0]

When you have mixed dtypes, np.genfromtxt returns a structured array. You need to read up on structured arrays because the syntax for accessing columns differs from the syntax used for plain arrays of homogeneous dtype. A structured array built from this file is one-dimensional (one record per row), which is why the two-dimensional index y[:, 0] fails.

Instead of y[:, 0], to access the first column of the structured array y, use

y['f0']

Or, better yet, supply the names parameter in np.genfromtxt, so you can use a more relevant column name, like y['ionenergy']:

import numpy as np
arr = np.genfromtxt(
    'data', delimiter=',', dtype=None,
    names=['ionenergy', 'foo', 'bar', 'baz', 'quux'])

print(arr['ionenergy'])
# [ 999.9    1.3  101.7]
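To make the difference from plain 2-D indexing concrete, here is a small self-contained sketch. It assumes the sample rows from the question are read from an in-memory io.StringIO rather than a file. The field names ionenergy and units come from the question's variable names, while col2, col3 and col4 are placeholder names invented for this example.

```python
import io

import numpy as np

# Sample rows copied from the question (supplied in memory for illustration).
CSV_TEXT = """999.9, abc, 34, 78, 12.3
1.3, ghf, 12, 8.4, 23.7
101.7, evf, 89, 2.4, 11.3
"""

# 'col2', 'col3' and 'col4' are hypothetical names chosen for this sketch.
names = ['ionenergy', 'units', 'col2', 'col3', 'col4']

y = np.genfromtxt(
    io.StringIO(CSV_TEXT), delimiter=',', dtype=None, names=names,
    encoding='utf-8', autostrip=True)

# A structured array has one record per row, so it is 1-D:
# y[:, 0] raises IndexError because there is no second axis.
print(y.shape)

# Columns are selected by field name, rows by integer index.
ionenergy = y['ionenergy']
units = y['units']
first_row = y[0]

# If an ordinary 2-D float array is needed, stack the numeric fields.
numeric = np.column_stack(
    [y[name] for name in ('ionenergy', 'col2', 'col3', 'col4')])
print(numeric.shape)
```

Stacking only the numeric fields leaves the string column out, since a plain array cannot mix floats and strings without falling back to a string or object dtype.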

Answer 2

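Please try this. Note that .iloc is a pandas indexer, so y has to be a pandas DataFrame rather than the array returned by genfromtxt:

import sys
import pandas as pd

# Read the file into a DataFrame; the file has no header row
y = pd.read_csv(sys.argv[1], header=None, skipinitialspace=True)

ionenergy = y.iloc[:,0]
units = y.iloc[:,1]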

2 answers

Answer 1: 4, 2013-10-27 20:15:40
Answer 2: 0, 2017-03-16 03:01:52
