Make single column of numpy array another datatype
Given a numpy array my_arr filled with strings, how do I set the datatype of one of the columns to be float? I need it as a numpy array in order to use it with my existing code afterwards. See the example of a failed attempt below:
import numpy as np
dat = [['User1', 'Male', '2.2'], ['User2', 'Female', '3.777'], ['User3', 'Unknown', '0.0']]
my_arr = np.array(dat)
print(my_arr)
# [['User1' 'Male' '2.2'], ['User2' 'Female' '3.777'], ['User3' 'Unknown' '0.0']]
my_arr[:,2] = my_arr[:,2].astype(float)
print(my_arr)
# [['User1' 'Male' '2.2'], ['User2' 'Female' '3.777'], ['User3' 'Unknown' '0.0']]
There might be smarter ways of doing this, but I think the following gives you the correct output; you can use structured arrays:
import numpy as np
dat = [['User1', 'Male', '2.2'], ['User2', 'Female', '3.777'], ['User3', 'Unknown', '0.0']]
# create data types: two byte strings of length 10 and a float
dt = np.dtype('S10, S10, float')
# convert the inner lists to tuples so that a structured array can be used
for ind, l in enumerate(dat):
    dat[ind] = tuple(l)
# convert dat to an array
my_arr = np.array(dat, dt)
Output:
array([('User1', 'Male', 2.2), ('User2', 'Female', 3.777),
('User3', 'Unknown', 0.0)],
dtype=[('f0', 'S10'), ('f1', 'S10'), ('f2', '<f8')])
You can also give names to the columns by doing:
dt = {'names': ['user', 'gender', 'number'], 'formats': ['S10', 'S10', 'float']}
my_arr = np.array(dat, dt) # dat is the list with tuples, see above
The output now is:
array([('User1', 'Male', 2.2), ('User2', 'Female', 3.777),
('User3', 'Unknown', 0.0)],
dtype=[('user', 'S10'), ('gender', 'S10'), ('number', '<f8')])
You can then access a single column by doing, e.g.:
my_arr['number']
array([ 2.2 , 3.777, 0. ])
my_arr['user']
array(['User1', 'User2', 'User3'], dtype='|S10')
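Once the field really is a float, it behaves like any numeric array. The following is only a minimal sketch of that idea, not part of the original answer: it uses unicode fields ('U10') instead of byte strings and converts the numbers with float() explicitly, both of which are my own assumptions to keep the example unambiguous on Python 3.

```python
import numpy as np

dat = [['User1', 'Male', '2.2'], ['User2', 'Female', '3.777'], ['User3', 'Unknown', '0.0']]

# Assumed dtype for this sketch: unicode text fields and one float64 field.
dt = np.dtype([('user', 'U10'), ('gender', 'U10'), ('number', 'f8')])

# Build one tuple per row and convert the numeric string explicitly.
my_arr = np.array([(u, g, float(n)) for u, g, n in dat], dtype=dt)

numbers = my_arr['number']              # float64 view of the 'number' field
print(numbers.dtype)
print(numbers.sum())

# Fields can also drive boolean masks over whole records.
big = my_arr[my_arr['number'] > 1.0]
print(big['user'])
```

Because my_arr['number'] is a view, arithmetic and masking work on it without copying the text columns.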
I would recommend using a DataFrame from pandas, where you can easily deal with different data types and complex data structures.
For your example:
import pandas as pd
pd.DataFrame(dat, columns=['user', 'gender', 'some number'])
would then simply give you:
    user   gender some number
0  User1     Male         2.2
1  User2   Female       3.777
2  User3  Unknown         0.0
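Note that the DataFrame built this way still holds the numbers as strings. A possible way to finish the job, sketched here as an assumption about what the asker needs rather than as part of the original answer, is to convert that one column and then go back to NumPy:

```python
import numpy as np
import pandas as pd

dat = [['User1', 'Male', '2.2'], ['User2', 'Female', '3.777'], ['User3', 'Unknown', '0.0']]
df = pd.DataFrame(dat, columns=['user', 'gender', 'some number'])

# The values are still strings here; convert only the numeric column.
df['some number'] = df['some number'].astype(float)

# Back to NumPy: a plain float array for the single column ...
numbers = df['some number'].to_numpy()

# ... or a record array that keeps every column with its own dtype.
rec = df.to_records(index=False)
print(numbers.dtype)
print(rec.dtype)
```

The record array returned by to_records can be indexed by field name, e.g. rec['some number'], much like the structured arrays in the other answers.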
You could convert your 2d array into a structured array with a mixed dtype:
In [137]: my_arr
Out[137]:
array([['User1', 'Male', '2.2'],
['User2', 'Female', '3.777'],
['User3', 'Unknown', '0.0']],
dtype='<U7')
In [138]: dt=np.dtype('U7,U7,f') # compound dtype; 'f' is float32
In [139]: np.array([tuple(row) for row in my_arr], dtype=dt)
Out[139]:
array([('User1', 'Male', 2.200000047683716),
('User2', 'Female', 3.7769999504089355), ('User3', 'Unknown', 0.0)],
dtype=[('f0', '<U7'), ('f1', '<U7'), ('f2', '<f4')])
In [140]: _.shape
Out[140]: (3,)
Now it is a 1d array with 3 fields. Instead of accessing columns by number you access fields by name, arr['f0'] etc.
I used [tuple(row) for row in my_arr] because the input to structured arrays has to be a list of tuples. I could have used your dat list, [tuple(row) for row in dat].
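To tie these points together, here is a rough sketch (the field names 'user', 'gender' and 'number' and the 'f8' float width are my own choices, not taken from the session above). It first repeats the failed assignment from the question, which leaves the dtype unchanged because a plain array has a single dtype for all elements, and then builds the structured array from tuples:

```python
import numpy as np

dat = [['User1', 'Male', '2.2'], ['User2', 'Female', '3.777'], ['User3', 'Unknown', '0.0']]
my_arr = np.array(dat)                  # homogeneous string array, shape (3, 3)

# Assigning floats into a string array casts them straight back to strings.
my_arr[:, 2] = my_arr[:, 2].astype(float)
print(my_arr.dtype.kind)                # still a unicode string dtype

# Hypothetical field names; each field gets its own type.
dt = np.dtype([('user', 'U7'), ('gender', 'U7'), ('number', 'f8')])

# Rows must be tuples so that each one is read as a single record.
rec = np.array([tuple(row) for row in my_arr], dtype=dt)

print(rec.shape)                        # one axis: a 1d array of records
print(rec.dtype.names)
print(rec['number'])                    # the float field
print(rec[0])                           # a single record
```

If the rest of the code only needs the numbers, my_arr[:, 2].astype(float) on its own already returns a separate float array; the structured array is useful when the text and numeric columns have to stay together.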