Python DataFrame: Remove/Replace part of a string for all values in a column

In a DataFrame "df" I have a column called "Company". It contains company names that end with "- CP". The problem is that the spaces are not always in the same place, and in some entries the dash "-" is missing. I want to remove the "- CP" suffix, in all of its variations, from every entry.

Input

Company
Intest Apple - CP
Intest Apple -CP
Intest Apple-CP
Intest Apple - CP
Intest Apple CP
Howard P Delta - CP

Output

Company
Intest Apple
Intest Apple
Intest Apple
Intest Apple
Intest Apple
Howard P Delta

This is the code that I have, but when I run it nothing changes:

df['Company'] = df['Company'].str.replace("-CP'","")
df['Company'] = df['Company'].str.replace("- CP'","")
df['Company'] = df['Company'].str.replace(" - CP'","")
df['Company'] = df['Company'].str.replace("-CP","")
df['Company'] = df['Company'].str.replace("- CP","")
df['Company'] = df['Company'].str.replace(" - CP","")

You could use str.replace with a regular expression that makes the dash optional ( -? ), allows any amount of whitespace around it ( \s* ), and anchors the match to the end of the string ( $ ), so every spacing variation of the CP suffix is covered.

company = df.Company.str.replace(r'\s*-?\s*CP\s*$', '', regex=True)

Output from company

Out[5]:
0      Intest Apple
1      Intest Apple
2      Intest Apple
3      Intest Apple
4      Intest Apple
5    Howard P Delta
Name: Company, dtype: object
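
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The sample rows are re-typed from the question, and the exact spacing in the fourth row is an assumption, since the original whitespace did not survive on the page. The names SUFFIX_PATTERN and STRICT_PATTERN are illustrative and not part of the original answer. The stricter variant is an optional suggestion: it requires a dash or at least one space before CP, so a name that merely ends in the letters "CP" is left alone.

```python
import pandas as pd

# Sample data mirroring the question's input (spacing variants are assumed).
df = pd.DataFrame({
    "Company": [
        "Intest Apple - CP",
        "Intest Apple -CP",
        "Intest Apple-CP",
        "Intest Apple  -  CP",
        "Intest Apple CP",
        "Howard P Delta - CP",
    ]
})

# \s*   optional whitespace before the dash
# -?    optional dash
# \s*   optional whitespace after the dash
# CP    the literal suffix
# \s*$  optional trailing whitespace, anchored to the end of the string
SUFFIX_PATTERN = r"\s*-?\s*CP\s*$"

df["Company"] = df["Company"].str.replace(SUFFIX_PATTERN, "", regex=True)
cleaned = df["Company"].tolist()
print(cleaned)

# Hypothetical stricter variant: require a separator (a dash, or at least one
# space) before "CP", so names that simply end in "...CP" are not truncated.
STRICT_PATTERN = r"(?:\s*-\s*|\s+)CP\s*$"

edge_cases = pd.Series(["Acme MCP", "Acme MCP - CP", "Intest Apple CP"])
loose = edge_cases.str.replace(SUFFIX_PATTERN, "", regex=True).tolist()
strict = edge_cases.str.replace(STRICT_PATTERN, "", regex=True).tolist()
print(loose)
print(strict)
```

Passing regex=True explicitly matters here: in recent pandas versions str.replace treats the pattern as a literal string by default, so without it the regular expression would not match anything.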
