
How to remove special characters from rows in a pandas DataFrame

I have a column in a pandas DataFrame like the one shown below:

LGA

Alpine (S)
Ararat (RC)
Ballarat (C)
Banyule (C)
Bass Coast (S)
Baw Baw (S)
Bayside (C)
Benalla (RC)
Boroondara (C)

What I want to do is remove the special characters at the end of each row, i.e. the bracketed suffixes such as (S) and (RC).

The desired output is:

LGA

Alpine
Ararat
Ballarat
Banyule
Bass Coast
Baw Baw
Bayside
Benalla
Boroondara

I am not sure how to get the desired output shown above.

Any help would be appreciated.

Thanks

I have a different approach using regex. It deletes anything between round or square brackets:

import re
import pandas as pd

df = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']})
# Raw string avoids invalid-escape warnings; the pattern removes "(...)" or "[...]"
df['LGA'] = [re.sub(r"[\(\[].*?[\)\]]", "", x).strip() for x in df['LGA']]
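The same regex can also be applied with pandas' vectorized string methods instead of a Python list comprehension. This is a sketch using the question's sample rows; only the calling style differs from the snippet above.

```python
import pandas as pd

# A few rows from the question's LGA column.
df = pd.DataFrame({"LGA": ["Alpine (S)", "Ararat (RC)", "Bass Coast (S)"]})

# Same pattern as the list comprehension: "(" or "[", the shortest run of
# characters, then ")" or "]". Series.str.replace applies it to every row.
df["LGA"] = df["LGA"].str.replace(r"[\(\[].*?[\)\]]", "", regex=True).str.strip()

print(df["LGA"].tolist())  # ['Alpine', 'Ararat', 'Bass Coast']
```

One practical difference: the .str accessor passes missing values (NaN) through unchanged, whereas re.sub inside the list comprehension raises a TypeError on them.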

Here you go. Split on " (" rather than on whitespace; x.split()[0] would keep only the first word and turn "Bass Coast" into "Bass".

df['LGA'] = df['LGA'].apply(lambda x: x.split(' (')[0])
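The same split can be done without apply, as a sketch using the vectorized Series.str.rsplit. With n=1 it only cuts at the last " (", so each name is kept whole up to its final bracketed suffix.

```python
import pandas as pd

df = pd.DataFrame({"LGA": ["Alpine (S)", "Bass Coast (S)", "Baw Baw (S)"]})

# rsplit with n=1 cuts only at the last " (", giving [name, "S)"];
# .str[0] then keeps the name part. Values without " (" are returned unchanged.
df["LGA"] = df["LGA"].str.rsplit(" (", n=1).str[0]

print(df["LGA"].tolist())  # ['Alpine', 'Bass Coast', 'Baw Baw']
```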

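Alternatively, split on the opening bracket and keep only the first part:

import pandas as pd
df = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']})
# n=1 guarantees at most two columns; strip() removes the space left before "("
df[['LGA', 'throw away']] = df['LGA'].str.split('(', n=1, expand=True)
df['LGA'] = df['LGA'].str.strip()
df = df.drop(columns='throw away')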
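If the abbreviation is worth keeping rather than throwing away, Series.str.extract with named groups can split each value into a name and a code in one pass. This is a sketch; the column name "type" is hypothetical, and reading the abbreviation as a council-type code is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"LGA": ["Alpine (S)", "Ararat (RC)", "Bass Coast (S)"]})

# Named groups become column names of the result:
#   name: everything before the final bracketed group (lazy, so the space is excluded)
#   type: the text inside the brackets
# Rows that do not match the pattern come back as NaN in both columns.
parts = df["LGA"].str.extract(r"^(?P<name>.*?)\s*\((?P<type>[^()]*)\)$")

df["LGA"] = parts["name"]
df["type"] = parts["type"]  # hypothetical column name for the abbreviation

print(df)
```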

You can use the pandas Series.str.replace method:


…
dataf['LGA'] = dataf['LGA'].str.replace(r"\([^()]*\)", "", regex=True).str.strip()

Demo


import pandas as pd

dataf = pd.DataFrame({
    "LGA": """Alpine (S)
Ararat (RC)
Ballarat (C)
Banyule (C)
Bass Coast (S)
Baw Baw (S)
Bayside (C)
Benalla (RC)
Boroondara (C)""".split("\n")
})

# .str.strip() removes the space that was in front of each "("
output = dataf['LGA'].str.replace(r"\([^()]*\)", "", regex=True).str.strip()

print(output)
0        Alpine
1        Ararat
2      Ballarat
3       Banyule
4    Bass Coast
5       Baw Baw
6       Bayside
7       Benalla
8    Boroondara
Name: LGA, dtype: object
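A variant of the same pattern, offered as a sketch: anchoring it with $ and absorbing the preceding whitespace with \s* removes only a bracketed group at the very end of each value, so no separate strip() is needed and any brackets earlier in a name are left alone.

```python
import pandas as pd

dataf = pd.DataFrame({"LGA": ["Alpine (S)", "Ararat (RC)", "Bass Coast (S)", "Baw Baw (S)"]})

# \s*         whitespace in front of the bracket, so no .str.strip() is needed
# \([^()]*\)  one bracketed group with no nested brackets inside
# $           only when that group is at the very end of the value
dataf["LGA"] = dataf["LGA"].str.replace(r"\s*\([^()]*\)$", "", regex=True)

print(dataf["LGA"].tolist())  # ['Alpine', 'Ararat', 'Bass Coast', 'Baw Baw']
```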
