
Pandas - issue with numpy.where

I have the following line of code:

# slice off the last 4 chars in name wherever its code contains the substring '-CUT'
df['name'] = np.where(df['code'].str.contains('-CUT'),
                      df['name'].str[:-4], df['name'])

However, this doesn't seem to work correctly. It slices off the last 4 characters for the correct rows, but it also does so for rows where the code is None/empty (which is almost all of them).

Is there anything obviously wrong with how I'm using np.where?

np.where

You can specify regex=False and na=False as parameters to pd.Series.str.contains so that only rows where your condition is met are updated:

df['name'] = np.where(df['code'].str.contains('-CUT', regex=False, na=False),
                      df['name'].str[:-4], df['name'])

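regex=False isn't strictly necessary for this criterion, but it should improve performance because the pattern is matched as a literal substring. na=False ensures that any value which cannot be processed via str methods (such as None or NaN) yields False. Without it, str.contains can return a missing value (NaN) instead of False for such rows, and NaN counts as true when np.where evaluates the condition, which is why those rows were sliced as well.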
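As a rough illustration of the fix, here is a minimal sketch on a small made-up frame. The sample values are hypothetical and only stand in for the asker's data; the column names code and name come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the second row has a missing code.
df = pd.DataFrame({
    "code": ["A1-CUT", None, "B2"],
    "name": ["alpha-CUT", "beta", "gamma"],
})

# NaN is truthy in Python, so a missing value in the condition
# can end up being treated as a match.
print(bool(np.nan))

# With na=False, rows with a missing code evaluate to False.
mask = df["code"].str.contains("-CUT", regex=False, na=False)

# Slice the last 4 characters only where the mask is True.
df["name"] = np.where(mask, df["name"].str[:-4], df["name"])
print(df["name"].tolist())
```

Only the first row should lose its '-CUT' suffix; the row with the missing code keeps its name unchanged.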

pd.DataFrame.loc

Alternatively, you can use pd.DataFrame.loc. This seems more natural than passing an "unchanged" series as the final argument to np.where:

mask = df['code'].str.contains('-CUT', regex=False, na=False)
df.loc[mask, 'name'] = df['name'].str[:-4]
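The same approach can be tried end to end on a small made-up frame. This is only a sketch: the sample values are hypothetical, and it relies on pandas aligning the right-hand side on the index, so only the rows selected by the mask are overwritten.

```python
import pandas as pd

# Hypothetical sample data: the second row has a missing code.
df = pd.DataFrame({
    "code": ["A1-CUT", None, "B2"],
    "name": ["alpha-CUT", "beta", "gamma"],
})

# Boolean mask; na=False maps missing codes to False.
mask = df["code"].str.contains("-CUT", regex=False, na=False)

# Assign only to the masked rows; unmasked rows are left untouched.
df.loc[mask, "name"] = df["name"].str[:-4]
print(df["name"].tolist())
```

Compared with np.where, this updates the column in place and never has to restate the values that stay the same.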


 