Pandas: numpy.where problem

Question

I have the following line of code:

# slice off the last 4 chars in name wherever its code contains the substring '-CUT'
df['name'] = np.where(df['code'].str.contains('-CUT'),
                      df['name'].str[:-4], df['name'])

However, this doesn't seem to be working correctly. It slices off the last 4 characters for the correct rows, but it also does so for rows where the code is None/empty (almost all instances).

Is there anything obviously wrong with how I'm using np.where?

Answer 1

np.where

You can pass regex=False and na=False to pd.Series.str.contains so that only rows where your condition is met are updated:

df['name'] = np.where(df['code'].str.contains('-CUT', regex=False, na=False),
                      df['name'].str[:-4], df['name'])

regex=False isn't strictly necessary for this criterion, but it should improve performance. na=False ensures that any value which cannot be processed via str methods, such as None or NaN, yields False in the mask.
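To make the role of na=False concrete, here is a minimal sketch on made-up data. The sample values are assumptions for illustration only; the column names follow the question. The likely cause of the original symptom is that a missing code can produce NaN in the mask, and NaN counts as truthy when np.where evaluates its condition.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "code": ["A-CUT", None, "B", np.nan],
    "name": ["alpha-cut", "beta", "gamma", "delta"],
})

# NaN is truthy, so a NaN entry in a condition selects the "sliced" branch.
nan_is_truthy = bool(np.nan)

# na=False maps missing codes to False, giving a clean boolean mask.
clean_mask = df["code"].str.contains("-CUT", regex=False, na=False)

# Only rows whose code really contains '-CUT' lose their last 4 characters.
df["name"] = np.where(clean_mask, df["name"].str[:-4], df["name"])
print(df)
```

Whether the mask contains NaN when na is omitted may depend on the pandas version and the column dtype, so passing na=False explicitly is the safer choice.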

pd.DataFrame.loc

Alternatively, you can use pd.DataFrame.loc. This seems more natural than specifying an "unchanged" series as the final argument to np.where:

mask = df['code'].str.contains('-CUT', regex=False, na=False)
df.loc[mask, 'name'] = df['name'].str[:-4]
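The same approach as a self-contained sketch, again on invented sample rows (the data is an assumption; the column names come from the question). Because pandas aligns the right-hand side on the index, only the rows selected by the mask receive the sliced values.

```python
import pandas as pd

# Hypothetical sample data, including None and empty-string codes.
df = pd.DataFrame({
    "code": ["X-CUT", None, "", "Y"],
    "name": ["widget-cut", "gadget", "gizmo", "doohickey"],
})

# Boolean mask: True only where the code contains the literal '-CUT'.
mask = df["code"].str.contains("-CUT", regex=False, na=False)

# Assign sliced names to the masked rows; all other rows are left untouched.
df.loc[mask, "name"] = df["name"].str[:-4]
print(df)
```

If the frame is large and few rows match, slicing only the selected rows with df.loc[mask, 'name'].str[:-4] on the right-hand side is an equivalent variant that avoids computing the slice for every row.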

Answer 1
5, accepted, 2018-08-08 13:48:33
