
Pandas - Replace substrings from a column if not numeric

I have a list of suffixes that I want to remove, say suffixes = ['inc','co','ltd']. I want to remove these from a column in a Pandas dataframe, and I have been doing this: df['name'] = df['name'].str.replace('|'.join(suffixes), '').

This works, but I do NOT want to remove the suffix if what remains is numeric. For example, if the name is 123 inc, I don't want to strip the 'inc'. Is there a way to add this condition in the code?

Use a regex with a negative lookbehind: (?<!\d) lets the pattern match only when the character just before the space is not a digit.

Ex:

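import pandas as pd

suffixes = ['inc','co','ltd']

df = pd.DataFrame({"Col": ["Abc inc", "123 inc", "Abc co", "123 co"]})
# (?<!\d) rejects a match when the space follows a digit; \b keeps suffixes whole words
df['Col_2'] = df['Col'].str.replace(r"(?<!\d) \b(" + '|'.join(suffixes) + r")\b", '', regex=True)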
print(df)

Output:

       Col    Col_2
0  Abc inc      Abc
1  123 inc  123 inc
2   Abc co      Abc
3   123 co   123 co
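To see what the lookbehind actually tests, here is a small sketch using the standard re module on a few extra names. The sample strings are made up for illustration and are not from the question. Note that (?<!\d) only inspects the single character right before the space, so a name such as Abc123 inc keeps its suffix as well, even though the remainder is not purely numeric.

```python
import re

suffixes = ['inc', 'co', 'ltd']
# Same pattern as in the answer: a space not preceded by a digit,
# followed by one of the suffixes as a whole word.
pattern = re.compile(r"(?<!\d) \b(" + '|'.join(suffixes) + r")\b")

# Hypothetical sample names, chosen to probe the edge cases.
samples = ["Abc inc", "123 inc", "Abc123 inc", "Acme company"]
results = {name: pattern.sub('', name) for name in samples}

for name, cleaned in results.items():
    print(f"{name!r} -> {cleaned!r}")
```

If names ending in a digit but containing letters should still lose their suffix, the lookbehind alone is probably not enough and the remainder has to be checked as a whole.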

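Try adding ^([^0-9]+?) in front of the suffixes. It is a regex that matches one or more non-digit characters from the start of the string, captured in a group so the replacement can put them back; without the group, the name would be deleted along with the suffix. The code would look like this:

non_numeric_regex = r"^([^0-9]+?)"
suffixes = ['inc','co','ltd']
# \s* absorbs the space before the suffix; \b keeps suffixes whole words
pattern = non_numeric_regex + r"\s*\b(?:" + '|'.join(suffixes) + r")\b"
# \1 restores the captured non-numeric prefix, so only the suffix is dropped
df['name'] = df['name'].str.replace(pattern, r"\1", regex=True)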
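As an alternative sketch that follows the question's wording more literally (keep the suffix only when what remains is numeric), one could strip the suffix everywhere first and then restore the original value wherever the remainder consists only of digits. The sample data, the name_clean column, and the assumption that suffixes appear at the end of the name are all illustrative choices, not part of the question.

```python
import pandas as pd

suffixes = ['inc', 'co', 'ltd']
# Hypothetical sample data for illustration.
df = pd.DataFrame({"name": ["Abc inc", "123 inc", "Abc co", "123 co", "Abc123 ltd"]})

# Strip a trailing whole-word suffix (and the whitespace before it) from every name.
pattern = r"\s*\b(?:" + '|'.join(suffixes) + r")\b$"
stripped = df['name'].str.replace(pattern, '', regex=True)

# Where the stripped remainder is digits only, fall back to the original name.
remainder_is_numeric = stripped.str.fullmatch(r"\d+")
df['name_clean'] = stripped.mask(remainder_is_numeric, df['name'])
print(df)
```

This separates the two decisions (what to strip, and when to keep the original), which may be easier to adjust than a single regex if the definition of "numeric" changes.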

