Split a numpy column into two columns and keep them in the original array
I have a numpy array with 3 columns. There are 100,000 rows, but here are the first two:
burger flipper part time 12-5.00
spam flipper full time 98-10.00
The problem is that the job codes (12 and 98) have somehow gotten combined with the hourly wage (5.00 and 10.00).
Is there a simple way in numpy to split this column into two and get rid of that unnecessary '-' character, as in:
burger flipper part time 12 5.00
spam flipper full time 98 10.00
Thanks in advance.
One way of doing it, using hstack:
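import numpy as np
a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])
# Split the third column on '-'; list() is needed on Python 3, where map returns an iterator
a = np.hstack((a[:, :2], list(map(lambda x: x.split('-'), a[:, 2]))))
print(a)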
Output:
[['burger flipper' 'part time' '12' '5.00']
['spam flipper' 'full time' '98' '10.00']]
A bit of explanation:
The function numpy.hstack allows you to horizontally stack multiple numpy arrays. For example,
np.hstack((a[:,[0,1]], a[:,[2]]))
produces the original array a with three columns. Note the use of brackets in a[:,[2]]: a[:,2] will not work, as it produces a one-dimensional array (len(a[:,2].shape) equals 1).
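The difference between the two indexing forms can be checked directly. This is a minimal sketch using the two sample rows from the question; the names col_1d, col_2d and rebuilt are only illustrative.

```python
import numpy as np

a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])

col_1d = a[:, 2]    # integer index: the column axis is dropped
col_2d = a[:, [2]]  # list index: the column axis is kept

print(col_1d.shape)  # (2,)
print(col_2d.shape)  # (2, 1)

# Stacking the 2-D pieces side by side gives back the original array
rebuilt = np.hstack((a[:, [0, 1]], col_2d))
print(np.array_equal(rebuilt, a))  # True
```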
The map statement applies the function lambda x: x.split('-') to the problematic column (i.e. the 3rd column) of the array. Each call to the lambda function returns a list containing the separated job code and wage, such as ['12', '5.00']. Thus, the map statement produces (on Python 3, once passed through list()) a list of lists which looks like [['12', '5.00'], ['98', '10.00']]. This is converted to a numpy array with 2 columns when it is fed to hstack.
The code hstacks the first two columns of the original array with the list of lists obtained via map, resulting in an array similar to what you want in the end.
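To make the intermediate values visible, the one-liner can be unrolled into separate steps. This is only a sketch on the two sample rows; the names pairs, new_cols and result are made up for illustration, and the list() call assumes Python 3.

```python
import numpy as np

a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])

# Step 1: split every entry of the third column on '-'
pairs = list(map(lambda x: x.split('-'), a[:, 2]))
print(pairs)  # [['12', '5.00'], ['98', '10.00']]

# Step 2: turn the list of lists into a 2-column array
new_cols = np.array(pairs)
print(new_cols.shape)  # (2, 2)

# Step 3: stack the untouched columns next to the two new ones
result = np.hstack((a[:, :2], new_cols))
print(result.shape)  # (2, 4)
```

All values stay strings here; the job codes and wages would still need an explicit conversion if numbers are wanted.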
map(lambda x: x.split('-'), a[:,2]) no longer gives two columns (on Python 3, map returns an iterator rather than a list), leading to the following error:
ValueError: all the input arrays must have same number of dimensions
I needed to change the previous code to:
import numpy as np
a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])
# list() materializes the map iterator so hstack receives a real sequence
a_newcolumns = np.hstack(list(map(lambda x: x.split('-'), a[:, 2]))).reshape(a.shape[0], 2)
# need to reshape the list into a two column numpy array
a = np.hstack((a[:, :2], a_newcolumns))
print(a)
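The reshape is needed because hstack joins the split pairs end to end into one flat array. A small sketch of that intermediate step, again on the two sample rows (pairs, flat, new_cols and direct are illustrative names, not part of the answer above):

```python
import numpy as np

a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])

pairs = [x.split('-') for x in a[:, 2]]

# hstack on a list of 1-D sequences concatenates them into one flat array
flat = np.hstack(pairs)
print(flat.shape)  # (4,)

# reshape restores one row per record, two columns per row
new_cols = flat.reshape(a.shape[0], 2)
print(new_cols.shape)  # (2, 2)

# np.array on the list of lists gives the same 2-column array directly
direct = np.array(pairs)
print(np.array_equal(new_cols, direct))  # True
```

If every entry splits into exactly two parts, as assumed here, np.array(pairs) may be the simpler route, since it skips the flatten-then-reshape round trip.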