Pandas - Replace substring in column if not numeric

Question

I have a list of suffixes I want to remove, say suffixes = ['inc','co','ltd']. I want to remove these from a column in a Pandas dataframe, and I have been doing this: df['name'] = df['name'].str.replace('|'.join(suffixes), '').

This works, but I do NOT want to remove the suffix if what remains is numeric. For example, if the name is 123 inc, I don't want to strip the 'inc'. Is there a way to add this condition in the code?

Answer 1

Use a regex with a negative lookbehind.

Example:

import pandas as pd

suffixes = ['inc','co','ltd']

df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc co", "123 co"]})
# (?<!\d) rejects a match when the character before the space is a digit.
df['Col_2'] = df['Col'].str.replace(r"(?<!\d) \b(" + '|'.join(suffixes) + r")\b", '', regex=True)
print(df)

Output:

       Col    Col_2
0  Abc inc      Abc
1  123 inc  123 inc
2   Abc co      Abc
3   123 co   123 co
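
Note that the lookbehind inspects only the single character immediately before the space, not the whole remainder. The sketch below illustrates that on a few extra names; the sample names and the use of re.escape (to guard against suffixes containing regex metacharacters) are assumptions added for illustration, not part of the answer above.

```python
import re
import pandas as pd

suffixes = ["inc", "co", "ltd"]

# re.escape guards against suffixes that contain regex metacharacters.
alternation = "|".join(re.escape(s) for s in suffixes)
pattern = r"(?<!\d) \b(?:" + alternation + r")\b"

# Hypothetical sample names chosen to probe the edge cases.
names = pd.Series(["Abc inc", "123 inc", "A1 inc", "1a ltd", "Acme company"])
stripped = names.str.replace(pattern, "", regex=True)
print(stripped.tolist())
```

Here "A1 inc" keeps its suffix because the character before the space is a digit, while "1a ltd" loses it because that character is a letter. "Acme company" is untouched since the trailing \b requires the suffix to be a whole word.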

Answer 2

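Try putting ^[^0-9]+ in front of the suffixes. It is a regex that means "one or more non-digit characters, starting at the beginning of the string", so a name that starts with a digit never matches. The leading part has to be captured and written back in the replacement, otherwise the name itself would be deleted along with the suffix. In recent pandas versions, regex=True must also be passed explicitly. The code would look like this:

suffixes = ['inc','co','ltd']
# Group 1 captures the leading non-digit text so that \1 can restore it.
pattern = r"^([^0-9]+?)\s*\b(?:" + '|'.join(suffixes) + r")\b"
df['name'] = df['name'].str.replace(pattern, r"\1", regex=True)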
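
If the condition should follow the question literally ("keep the suffix when what remains is numeric"), one alternative is to strip every suffix first and then restore the original value wherever the remainder consists only of digits. This is a minimal sketch under that reading, not either answerer's method; the sample data and the name_clean column are hypothetical.

```python
import re
import pandas as pd

suffixes = ["inc", "co", "ltd"]
pattern = r"\s*\b(?:" + "|".join(re.escape(s) for s in suffixes) + r")\b"

# Hypothetical sample data.
df = pd.DataFrame({"name": ["Abc inc", "123 inc", "A1 co", "42 ltd"]})

# Strip unconditionally first, then decide row by row whether to keep the result.
stripped = df["name"].str.replace(pattern, "", regex=True).str.strip()
remainder_is_numeric = stripped.str.fullmatch(r"\d+")

# Series.where keeps `stripped` where the condition is True, else the original name.
df["name_clean"] = stripped.where(~remainder_is_numeric, df["name"])
print(df)
```

Unlike a lookbehind, this checks the whole remainder, so "A1 co" becomes "A1" while "123 inc" and "42 ltd" are left as they were.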

Answer 1: score 2, accepted, 2020-07-02 13:47:21
Answer 2: score 1, 2020-07-02 13:34:13
