Pandas - Replace substrings from a column if not numeric
I have a list of suffixes I want to remove, say
suffixes = ['inc','co','ltd']
I want to remove these from a column in a Pandas dataframe, and I have been doing this:
df['name'] = df['name'].str.replace('|'.join(suffixes), '')
This works, but I do NOT want to remove the suffix if what remains is numeric. For example, if the name is
123 inc
I don't want to strip the 'inc'. Is there a way to add this condition in the code?
Use a regex with a negative lookbehind.
Example:
import pandas as pd

suffixes = ['inc','co','ltd']
df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc co", "123 co"]})
df['Col_2'] = df['Col'].str.replace(r"(?<!\d) \b(" + '|'.join(suffixes) + r")\b", '', regex=True)
print(df)
Output:
       Col    Col_2
0  Abc inc      Abc
1  123 inc  123 inc
2   Abc co      Abc
3   123 co   123 co
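One thing worth knowing about the lookbehind: `(?<!\d)` only inspects the single character right before the space, so a name that merely ends in a digit keeps its suffix too. Below is a minimal sketch of that behaviour using plain `re`. The helper name `strip_suffix` and the sample names are made up for illustration, and `re.escape` is an added precaution that the answer above does not use.

```python
import re

suffixes = ["inc", "co", "ltd"]  # same list as in the question

# Same pattern as the answer above; re.escape guards against suffixes
# that contain regex metacharacters (an extra precaution, not in the answer).
pattern = re.compile(
    r"(?<!\d) \b(?:" + "|".join(map(re.escape, suffixes)) + r")\b"
)

def strip_suffix(name: str) -> str:
    """Hypothetical helper: drop ' <suffix>' unless a digit precedes the space."""
    return pattern.sub("", name)

for name in ["Abc inc", "123 inc", "Abc1 inc", "Abc company"]:
    print(repr(name), "->", repr(strip_suffix(name)))
```

Here "Abc1 inc" is left untouched even though "Abc1" is not numeric, and "Abc company" is left alone because `\b` requires the suffix to be a whole word. If the first case matters for your data, the lookbehind alone may not be enough.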
Try putting ^([^0-9]+?) in front of the suffixes. It is a regex that means "one or more non-numeric characters from the start of the string". Capturing that part and writing it back with \1 keeps the name and removes only the suffix. The code would look like this:
non_numeric_regex = r"^([^0-9]+?)"
suffixes = ['inc','co','ltd']
# Non-numeric name, optional whitespace, then one whole-word suffix
regex_w_suffixes = non_numeric_regex + r"\s*\b(?:" + '|'.join(suffixes) + r")\b"
df['name'] = df['name'].str.replace(regex_w_suffixes, r"\1", regex=True)
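If you would rather state the condition from the question directly ("strip the suffix, unless what remains is numeric"), one option is to strip first and then check the remainder. This is only a sketch under a few assumptions: the suffix sits at the end of the name, "numeric" means digits only (as judged by `str.isdigit`), and the helper name `strip_unless_numeric` is invented for this example.

```python
import re

suffixes = ["inc", "co", "ltd"]  # same list as in the question

# Assumption: the suffix is the last word of the name (anchored with $).
suffix_pattern = re.compile(
    r"\s*\b(?:" + "|".join(map(re.escape, suffixes)) + r")\b\s*$"
)

def strip_unless_numeric(name: str) -> str:
    """Hypothetical helper: drop a trailing suffix unless only digits would remain."""
    stripped = suffix_pattern.sub("", name)
    # Keep the original when nothing or only digits would be left.
    if stripped == "" or stripped.isdigit():
        return name
    return stripped

for name in ["Abc inc", "123 inc", "Abc 123 co", "inc"]:
    print(repr(name), "->", repr(strip_unless_numeric(name)))
```

On a dataframe this could be applied with something like df['name'] = df['name'].map(strip_unless_numeric), assuming the column holds only strings. It is slower than a single vectorised str.replace, but the rule is easier to read and to adjust.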