简体   繁体   English

用另一个 numpy 数组的值替换一个 numpy 数组的值

[英]Replace values of a numpy array by values from another numpy array

i have a 1000 * 1000 numpy array with 1 million values which was created as follows :我有一个包含 100 万个值的 1000 * 1000 numpy 数组,其创建方式如下:

>>import numpy as np
>>data = np.loadtxt('space_data.txt')
>> print (data)
>>[[ 13.  15.  15. ...,  15.  15.  16.]
   [ 14.  13.  14. ...,  13.  15.  16.]
   [ 16.  13.  13. ...,  13.  15.  17.]
   ..., 
   [ 14.   15.  14. ...,  14.  14.  13.]
   [ 15.   15.  16. ...,  16.  15.  14.]
   [ 14.   13.  16. ...,  16.  16.  16.]]

I have another numpy array which which has 2 columns as follows:我有另一个 numpy 数组,它有 2 列,如下所示:

>> print(key)
>>[[ 10.,   S],
   [ 11.,   S],
   [ 12.,   S],
   [ 13.,   M],
   [ 14.,   L],
   [ 15.,   S],
   [ 16.,   S],
   ...,
   [ 92.,   XL],
   [ 93.,   M],
   [ 94.,   XL],
   [ 95.,   S]]

What i would basically want is to replace each element of of the data array with corresponding element in the second column of the key array like this..我基本上想要的是用键数组的第二列中的相应元素替换数据数组的每个元素,如下所示..

>> print(data)
>>[[ M  S  S ...,  S  S  S]
   [ L   M  L ...,  M  S  S]
   [ S   M  M ...,  M  S  XL]
   ..., 
   [ L   S  L ...,  L  L  M]
   [ S   S  S ...,  S  S  L]
   [ L   M  S ...,  S  S  S]]

In Python dicts are a natural choice for mapping from keys to values.在 Python 中,字典是从键到值的映射的自然选择。 NumPy has no direct equivalent of a dict. NumPy 没有直接等效的字典。 But it does have arrays which can do fast integer indexing.但它确实有可以进行快速整数索引的数组。 For example,例如,

In [153]: keyarray = np.array(['S','M','L','XL'])

In [158]: data = np.array([[0,2,1], [1,3,2]])

In [159]: keyarray[data]
Out[159]: 
array([['S', 'L', 'M'],
       ['M', 'XL', 'L']], 
      dtype='|S2')

So if we could massage your key array into one that looked like this:因此,如果我们可以将您的key数组按摩成如下所示的数组:

In [161]: keyarray
Out[161]: 
array(['', '', '', '', '', '', '', '', '', '', 'S', 'S', 'S', 'M', 'L',
       'S', 'S', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', 'XL', 'M', 'XL', 'S'], 
      dtype='|S32')

So that 10 maps to 'S' in the sense that keyarray[10] equals S , and so forth:因此,从keyarray[10]等于S的意义上来说, 10 映射到 'S' ,依此类推:

In [162]: keyarray[10]
Out[162]: 'S'

then we could produce the desired result with keyarray[data] .然后我们可以使用keyarray[data]产生所需的结果。


import numpy as np

data = np.array( [[ 13.,   15.,  15.,  15.,  15.,  16.],
                  [ 14.,   13.,  14.,  13.,  15.,  16.],
                  [ 16.,   13.,  13.,  13.,  15.,  17.],
                  [ 14.,   15.,  14.,  14.,  14.,  13.],
                  [ 15.,   15 ,  16.,  16.,  15.,  14.],
                  [ 14.,   13.,  16.,  16.,  16.,  16.]])

key = np.array([[ 10., 'S'],
                [ 11., 'S'],
                [ 12., 'S'],
                [ 13., 'M'],
                [ 14., 'L'],
                [ 15., 'S'],
                [ 16., 'S'],
                [ 17., 'XL'],
                [ 92., 'XL'],
                [ 93., 'M'],
                [ 94., 'XL'],
                [ 95., 'S']])

idx = np.array(key[:,0], dtype=float).astype(int)
n = idx.max()+1
keyarray = np.empty(n, dtype=key[:,1].dtype)
keyarray[:] = ''
keyarray[idx] = key[:,1]

data = data.astype('int')
print(keyarray[data])

yields产量

[['M' 'S' 'S' 'S' 'S' 'S']
 ['L' 'M' 'L' 'M' 'S' 'S']
 ['S' 'M' 'M' 'M' 'S' 'XL']
 ['L' 'S' 'L' 'L' 'L' 'M']
 ['S' 'S' 'S' 'S' 'S' 'L']
 ['L' 'M' 'S' 'S' 'S' 'S']]

Note that data = data.astype('int') is assuming that the floats in data can be uniquely mapped to int s.请注意, data = data.astype('int')假设data中的浮点数可以唯一映射到int s。 That appears to be the case with your data, but it is not true for arbitrary floats.您的数据似乎就是这种情况,但对于任意浮点数而言则不然。 For example, astype('int') maps both 1.0 and 1.5 map to 1.例如, astype('int')将 1.0 和 1.5 映射到 1。

In [167]: np.array([1.0, 1.5]).astype('int')
Out[167]: array([1, 1])

An un-vectorized linear approach will be to use a dictionary here:一种非矢量化的线性方法是在这里使用字典:

dct = dict(keys)
# new array is required if dtype is different or it it cannot be casted
new_array = np.empty(data.shape, dtype=str)
for index in np.arange(data.size):
    index = np.unravel_index(index, data.shape)
    new_array[index] = dct[data[index]] 
import numpy as np

data = np.array([[ 13.,  15.,  15.],
   [ 14.,  13.,  14. ],
   [ 16.,  13.,  13. ]])

key = [[ 10.,   'S'],
   [ 11.,   'S'],
   [ 12.,   'S'],
   [ 13.,   'M'],
   [ 14.,   'L'],
   [ 15.,   'S'],
   [ 16.,   'S']]

data2 = np.zeros(data.shape, dtype=str)

for k in key:
    data2[data == k[0]] = k[1]
# Create a dataframe out of your 'data' array and make a dictionary out of your 'key' array. 
import numpy as np
import pandas as pd

data = np.array([[ 13.,  15.,  15.],
               [ 14.,  13.,  14. ],
               [ 16.,  13.,  13. ]])
data_df = pd.DataFrame(data)
key  = dict({10 : 'S',11 : 'S', 12 : 'S', 13 : 'M',14:'L',15:'S',16:'S'})
# Replace the values in newly created dataframe and convert that into array.
data_df.replace(key,inplace = True)

data = np.array(data_df)
print(data)

This will be the output:这将是输出:

[['M' 'S' 'S']
['L' 'M' 'L']
['S' 'M' 'M']]

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM