Split a numpy column into two columns and keep them in the original array

I have a numpy array, which has 3 columns. There are 100,000 rows, but here are the first two:

 burger flipper  part time  12-5.00
 spam flipper    full time  98-10.00

The problem is, the job codes (12 and 98) have somehow gotten combined with the hourly wage (5.00 and 10.00).

Is there a simple way in numpy to split this column into two, and get rid of that unnecessary '-' character, as in:

 burger flipper  part time  12  5.00
 spam flipper    full time  98  10.00

Thanks in advance.

One way of doing it is with hstack:

import numpy as np
a = np.array([['burger flipper',  'part time',  '12-5.00'],
             ['spam flipper',    'full time',  '98-10.00']])
# list(...) is needed on Python 3, where map returns an iterator rather than a list
a = np.hstack((a[:,:2], list(map(lambda x: x.split('-'), a[:,2]))))
print(a)

Output:

[['burger flipper' 'part time' '12' '5.00']
 ['spam flipper' 'full time' '98' '10.00']]

A bit of explanation:

  1. The function numpy.hstack allows you to horizontally stack multiple numpy arrays. For example,

     np.hstack((a[:,[0,1]], a[:,[2]]))

    produces the original array a with three columns. Note the brackets in a[:,[2]]: a[:,2] will not work, because it produces a one-dimensional array (len(a[:,2].shape) equals 1).

  2. The map call applies the function lambda x: x.split('-') to the problematic column (i.e. the 3rd column) of the array. Each call to the lambda returns a list containing the separated job code and wage, such as ['12', '5.00']. Collected into a list, the results form a list of lists that looks like [['12', '5.00'], ['98', '10.00']], which hstack converts to a numpy array with 2 columns.

The code uses hstack to join the first two columns of the original array with the list of lists obtained via map, giving the four-column array you want.
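
The two points above can be checked with a small sketch. It reuses the sample array from the answer; the names col_2d, col_1d, restacked and split_rows are introduced here only for illustration, and a list comprehension stands in for map so that the result is a list on both Python 2 and Python 3.

```python
import numpy as np

a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])

# Indexing with a list keeps the column two-dimensional;
# indexing with a plain integer drops to one dimension.
col_2d = a[:, [2]]   # shape (2, 1)
col_1d = a[:, 2]     # shape (2,)

# Restacking the first two columns with the 2-D third column
# reproduces the original three-column array.
restacked = np.hstack((a[:, [0, 1]], col_2d))

# Splitting each entry on '-' yields a list of two-element lists.
split_rows = [entry.split('-') for entry in col_1d]

print(col_2d.shape, col_1d.shape)
print(split_rows)
```
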

When run under Python 3, map(lambda x: x.split('-'), a[:,2]) passed directly to hstack no longer produces a two-column array (map returns an iterator there rather than a list), leading to the following error:

ValueError: all the input arrays must have same number of dimensions

I needed to change the previous code to:

import numpy as np
a = np.array([['burger flipper',  'part time',  '12-5.00'],
             ['spam flipper',    'full time',  '98-10.00']])
# flatten the split pieces into one 1-D array, then reshape it into a two-column numpy array
a_newcolumns = np.hstack(list(map(lambda x: x.split('-'), a[:, 2]))).reshape(a.shape[0], 2)
a = np.hstack((a[:, :2], a_newcolumns))
print(a)
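
As a possible alternative that avoids map altogether, here is a minimal sketch using np.char.partition. This is not the method from either answer above. It assumes every entry in the third column contains a '-' and that the split should happen at the first one; the names parts and result are illustrative.

```python
import numpy as np

a = np.array([['burger flipper', 'part time', '12-5.00'],
              ['spam flipper', 'full time', '98-10.00']])

# np.char.partition splits each string at the first '-' and returns an
# array of shape (n_rows, 3): [text before, separator, text after].
parts = np.char.partition(a[:, 2], '-')

# Keep the pieces before and after the separator, dropping the '-' column.
result = np.hstack((a[:, :2], parts[:, [0, 2]]))
print(result)
```

The result is still an array of strings; converting the wage column to floats would be a separate step.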
