Pandas - issue with numpy.where
I have the following line of code:
# slice off the last 4 chars in name wherever its code contains the substring '-CUT'
df['name'] = np.where(df['code'].str.contains('-CUT'),
                      df['name'].str[:-4], df['name'])
However, this doesn't seem to be working correctly. It slices off the last 4 characters for the correct rows, but it also does so for rows where the code is None/empty (almost all instances).
Is there anything obviously wrong with how I'm using np.where?
You can specify regex=False and na=False as parameters to pd.Series.str.contains so that only rows where your condition is met are updated:
df['name'] = np.where(df['code'].str.contains('-CUT', regex=False, na=False),
                      df['name'].str[:-4], df['name'])
regex=False isn't strictly necessary for this criterion, but it should improve performance because the pattern is matched as a literal substring. na=False ensures that any value which cannot be processed via str methods returns False rather than a missing value such as NaN, which np.where treats as true.
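To make the effect of na=False concrete, here is a minimal sketch on a made-up three-row frame. The sample data and the variable names raw_mask, safe_mask, unsafe and safe are illustrative assumptions, not part of the original question, and the columns are forced to object dtype so the missing code is stored as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the first row's code contains '-CUT',
# and the second row's code is missing (NaN).
df = pd.DataFrame({
    'code': pd.Series(['A-CUT', np.nan, 'B'], dtype=object),
    'name': pd.Series(['widget-CUT', 'gadget', 'gizmo'], dtype=object),
})

# Without na=False the missing code yields NaN in the mask.
raw_mask = df['code'].str.contains('-CUT', regex=False)
# With na=False the missing code yields False.
safe_mask = df['code'].str.contains('-CUT', regex=False, na=False)

# np.where casts the condition to bool, and NaN is truthy,
# so the row with the missing code is sliced as well.
unsafe = np.where(raw_mask, df['name'].str[:-4], df['name'])
safe = np.where(safe_mask, df['name'].str[:-4], df['name'])

print(list(unsafe))  # ['widget', 'ga', 'gizmo']
print(list(safe))    # ['widget', 'gadget', 'gizmo']
```

In this sketch the second row is truncated to 'ga' only when the mask still contains NaN, which matches the behaviour described in the question.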
Alternatively, you can use pd.DataFrame.loc. This seems more natural than specifying an "unchanged" series as a final argument to np.where:
mask = df['code'].str.contains('-CUT', regex=False, na=False)
df.loc[mask, 'name'] = df['name'].str[:-4]
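As a runnable sketch of the .loc approach, the snippet below uses the same kind of made-up sample frame (the data is a hypothetical illustration). It also restricts the right-hand side to the masked rows, which is an optional variation on the code above: the assignment aligns on the index either way, so rows outside the mask are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the first row's code contains '-CUT'.
df = pd.DataFrame({
    'code': pd.Series(['A-CUT', np.nan, 'B'], dtype=object),
    'name': pd.Series(['widget-CUT', 'gadget', 'gizmo'], dtype=object),
})

# Boolean mask; missing codes count as "no match".
mask = df['code'].str.contains('-CUT', regex=False, na=False)

# Slice only the selected rows and write them back in place.
df.loc[mask, 'name'] = df.loc[mask, 'name'].str[:-4]

result = df['name'].tolist()
print(result)  # ['widget', 'gadget', 'gizmo']
```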