Split one numpy column into two columns and keep them in the original array

Question

I have a numpy array with 3 columns. There are 100,000 rows, but here are the first two:

 burger flipper  part time  12-5.00
 spam flipper    full time  98-10.00

The problem is that the job codes (12 and 98) have somehow gotten combined with the hourly wage (5.00 and 10.00).

Is there a simple way in numpy to split this column into two and get rid of that unnecessary '-' character, as in:

 burger flipper  part time  12  5.00
 spam flipper    full time  98  10.00

Thanks in advance.

Answer 1

One way of doing it, using hstack:

import numpy as np
a = np.array([['burger flipper',  'part time',  '12-5.00'],
              ['spam flipper',    'full time',  '98-10.00']])
# list(...) is needed on Python 3, where map returns an iterator rather than a list
a = np.hstack((a[:, :2], list(map(lambda x: x.split('-'), a[:, 2]))))
print(a)

Output:

[['burger flipper' 'part time' '12' '5.00']
 ['spam flipper' 'full time' '98' '10.00']]

A bit of explanation:

The function numpy.hstack allows you to horizontally stack multiple numpy arrays. For example,
```
 np.hstack((a[:,[0,1]], a[:,[2]]))
```
produces the original array a with three columns. Note the use of brackets in a[:,[2]]: a[:,2] will not work, as it produces a one-dimensional array (len(a[:,2].shape) equals 1).
The map statement applies the function lambda x: x.split('-') to the problematic column (i.e. the 3rd column) of the array. Each call to the lambda function returns a list containing the separated job code and wage, such as ['12', '5.00']. Thus, collecting the results of map gives a list of lists that looks like [['12', '5.00'], ['98', '10.00']]. This is converted to a numpy array with 2 columns when it is fed to hstack.

The code then uses hstack to join the first two columns of the original array with the list of lists obtained via map, resulting in the array you asked for.
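
The shapes involved at each step can be checked directly. The following is a minimal sketch, assuming Python 3 and the same two-row sample array; the names `col_1d`, `col_2d`, `pieces`, and `result` are introduced here only for illustration, and a list comprehension stands in for the map call.

```python
import numpy as np

a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])

col_1d = a[:, 2]      # plain integer index: one-dimensional, shape (2,)
col_2d = a[:, [2]]    # list index: stays two-dimensional, shape (2, 1)

# Split every entry of the combined column; each entry yields two strings.
pieces = np.array([s.split('-') for s in col_1d])   # shape (2, 2)

# Both inputs to hstack are two-dimensional with the same number of rows.
result = np.hstack((a[:, :2], pieces))              # shape (2, 4)
print(col_1d.shape, col_2d.shape, pieces.shape, result.shape)
```

Because `pieces` already has two dimensions, hstack can append it to the first two columns without any further reshaping.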

Answer 2

Passing map(lambda x: x.split('-'), a[:,2]) directly to hstack no longer gives two columns (on Python 3, map returns an iterator rather than a list), leading to the following error:

ValueError: all the input arrays must have same number of dimensions

I needed to change the previous code to:

import numpy as np
a = np.array([['burger flipper',  'part time',  '12-5.00'],
              ['spam flipper',    'full time',  '98-10.00']])
# flatten the split pieces, then reshape them into a two-column numpy array
a_newcolumns = np.hstack(list(map(lambda x: x.split('-'), a[:, 2]))).reshape(a.shape[0], 2)
a = np.hstack((a[:, :2], a_newcolumns))
print(a)
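
If the reshape step feels indirect, the split pieces can also be turned into a two-column array in one go. The sketch below is an alternative, not the method from either answer. It assumes Python 3 and that every entry contains the separator; the helper name `split_column` and its parameters are hypothetical. Passing a maxsplit of 1 to `str.split` is an added precaution so that an entry containing a second '-' still produces exactly two pieces.

```python
import numpy as np

def split_column(arr, col, sep='-'):
    """Return a new array with column `col` replaced by its two sep-separated parts.

    Hypothetical helper for illustration; assumes every entry contains `sep`.
    """
    # Split on the first separator only, so each row yields exactly two pieces.
    pieces = np.array([s.split(sep, 1) for s in arr[:, col]])
    # Keep the columns before and after `col`, with the two new columns in between.
    return np.hstack((arr[:, :col], pieces, arr[:, col + 1:]))

a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])
a = split_column(a, 2)
print(a)

# The result is still an array of strings; numeric wages need a separate array.
wages = a[:, 3].astype(float)
```

Since a numpy array has a single dtype, the job codes and wages stay as strings inside `a`; converting a column with `astype` produces a separate numeric array instead of changing `a` in place.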
