Replacing values in 2d numpy array based on 1d numpy array or list

Consider the following 2d NumPy array:

import numpy as np

daily = np.array([
['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'group'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'group'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'group'],  
['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'group'],
], dtype = object)

daily

And here is the other 1d NumPy array (it could be a list if needed):

campaigns = np.array([111, 333], dtype = object)
campaigns

What is the fastest way to replace the values in the last column ('group') with 'new' or 'old', depending on whether the campaign ID in column 4 exists in campaigns? My Python for loop with if statements works, but it is far too slow for the final goal: checking several billion combinations of new/old, so I need something very fast.

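%%time
# Python-level loop: one membership test per row, which is what makes this slow
for x in daily:
    # x is a view of one row, so assigning x[7] updates daily in place
    if x[4] in campaigns:
        x[7] = 'new'
    else:
        x[7] = 'old'
daily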
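If a Python loop has to stay, one cheap improvement is to test membership against a set instead of a NumPy array: "x in ndarray" compares x against every element, while a set lookup takes constant time on average. A minimal sketch on the first three rows, assuming the campaign IDs are hashable (they are plain ints here); campaign_set is just an illustrative name:

```python
import numpy as np

# Same layout as the question: column 4 holds the campaign ID, column 7 the label.
daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

# A set gives average O(1) membership tests instead of scanning campaigns per row.
campaign_set = set(campaigns.tolist())

for row in daily:
    # Each row of a 2d array is a view, so this assignment updates daily itself.
    row[7] = 'new' if row[4] in campaign_set else 'old'

print(daily[:, 7])
```

This is still a Python-level loop over every row, so it will not match the vectorized approach in the answer below on large arrays.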

And here is the expected result:

result = np.array([
['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'new'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'old'],
['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'new'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'new'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'old'],
['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'new'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'new'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'old'],
['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'new']
], dtype=object)

result

Here is all of column 4 (the campaign IDs):

In [58]: daily[:,4]
Out[58]: array([111, 222, 333, 111, 222, 333, 111, 222, 333], dtype=object)

We can test each of those values for membership in campaigns with np.in1d:

In [60]: np.in1d(daily[:,4],campaigns)
Out[60]: array([ True, False,  True,  True, False,  True,  True, False,  True])
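np.in1d is deprecated as of NumPy 2.0; np.isin is the documented replacement and returns the same boolean mask for a 1d input. A minimal check on the column 4 values shown above:

```python
import numpy as np

# Column 4 of daily, as printed in Out[58].
ids = np.array([111, 222, 333, 111, 222, 333, 111, 222, 333], dtype=object)
campaigns = np.array([111, 333], dtype=object)

# Element-wise membership test: True where the ID appears in campaigns.
mask = np.isin(ids, campaigns)
print(mask)
```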

In [62]: mask = np.in1d(daily[:,4],campaigns)

In [63]: daily[mask,7]
Out[63]: array(['group', 'group', 'group', 'group', 'group', 'group'], dtype=object)
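The same mask that selects rows here can also sit on the left-hand side of an assignment, with ~mask picking out the complementary rows. A minimal sketch on the first three rows of daily, using np.isin in place of np.in1d:

```python
import numpy as np

daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

mask = np.isin(daily[:, 4], campaigns)

# Boolean row mask combined with a column index, used for assignment.
daily[mask, 7] = 'new'
daily[~mask, 7] = 'old'
print(daily[:, 7])
```

This writes the labels in two passes; the np.where route shown next builds the whole column in one step instead.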

np.where then turns the boolean mask into an array of strings:

In [67]: np.where(mask, 'new','old')
Out[67]: 
array(['new', 'old', 'new', 'new', 'old', 'new', 'new', 'old', 'new'],
      dtype='<U3')

which we can assign to column 7 (in IPython, _ holds the previous output, here Out[67]):

In [68]: daily[:,7] = _
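Put together outside an interactive session, the steps above fit in a small function. This is a minimal sketch: label_campaigns and its col_id/col_label parameters are hypothetical names, and it returns a labelled copy rather than modifying daily in place:

```python
import numpy as np

def label_campaigns(daily, campaigns, col_id=4, col_label=7):
    """Hypothetical helper: copy of daily with col_label set to 'new'/'old'
    depending on whether col_id is in campaigns."""
    out = daily.copy()                                 # leave the caller's array untouched
    mask = np.isin(out[:, col_id], campaigns)          # one vectorized membership test
    out[:, col_label] = np.where(mask, 'new', 'old')   # whole column in one assignment
    return out

daily = np.array([
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname1', 111, 100, 1.1, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname2', 222, 200, 2.2, 'group'],
    ['2022-01-01', 'AccountName1', 123456789, 'campaignname3', 333, 300, 3.3, 'group'],
    ['2022-01-02', 'AccountName1', 123456789, 'campaignname1', 111, 400, 4.4, 'group'],
    ['2022-01-02', 'AccountName1', 123456789, 'campaignname2', 222, 500, 5.5, 'group'],
    ['2022-01-02', 'AccountName1', 123456789, 'campaignname3', 333, 600, 6.6, 'group'],
    ['2022-01-03', 'AccountName1', 123456789, 'campaignname1', 111, 700, 7.7, 'group'],
    ['2022-01-03', 'AccountName1', 123456789, 'campaignname2', 222, 800, 8.8, 'group'],
    ['2022-01-03', 'AccountName1', 123456789, 'campaignname3', 333, 900, 9.9, 'group'],
], dtype=object)
campaigns = np.array([111, 333], dtype=object)

labeled = label_campaigns(daily, campaigns)
print(labeled[:, 7])
```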

I see lots of pandas questions about using np.where in the same sort of way.
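For the question's final goal of evaluating a very large number of new/old combinations, one option is to factorize column 4 once with np.unique(..., return_inverse=True). Each combination then needs only a membership test on the few distinct IDs plus an integer-index gather over the rows. This is a sketch under two assumptions: daily stays fixed while only the campaign set changes, and every ID is an integer (so the column can be cast to int64, avoiding slow object comparisons). labels_for is a hypothetical helper name:

```python
import numpy as np

# Column 4 of daily, cast to a native integer dtype (assumes all IDs are ints).
ids = np.array([111, 222, 333, 111, 222, 333, 111, 222, 333], dtype=object).astype(np.int64)

# Done once: the distinct IDs, and for each row the position of its ID in uniq
# (so uniq[inverse] reconstructs ids).
uniq, inverse = np.unique(ids, return_inverse=True)

def labels_for(campaigns):
    """Hypothetical helper: 'new'/'old' label per row for one candidate campaign set."""
    is_new = np.isin(uniq, campaigns)                # test only the distinct IDs
    return np.where(is_new[inverse], 'new', 'old')   # broadcast back to every row

for combo in ([111, 333], [222], [111, 222, 333]):
    print(combo, labels_for(combo))
```

The gather over all rows still runs once per combination, so whether this is fast enough at the scale of billions of combinations is something to measure rather than assume.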
