How to remove special characters from rows in pandas dataframe
I have a column in a pandas DataFrame like the one shown below:
LGA
Alpine (S)
Ararat (RC)
Ballarat (C)
Banyule (C)
Bass Coast (S)
Baw Baw (S)
Bayside (C)
Benalla (RC)
Boroondara (C)
What I want to do is remove the special characters from the end of each row, i.e. (S), (RC).
The desired output should be:
LGA
Alpine
Ararat
Ballarat
Banyule
Bass Coast
Baw Baw
Bayside
Benalla
Boroondara
I am not quite sure how to get the desired output mentioned above.
Any help would be appreciated.
Thanks
I have a different approach using regex. It will delete anything between brackets:
import re
import pandas as pd
df = {'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)'] }
df = pd.DataFrame(df)
df['LGA'] = [re.sub(r"[\(\[].*?[\)\]]", "", x).strip() for x in df['LGA']]  # delete anything between brackets, then trim whitespace
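To make it easier to see what that pattern does, here is a minimal standalone sketch without pandas. The helper name `strip_bracketed` and the sample strings (including the square-bracket one) are made up for illustration. The pattern is the one from the snippet above, written as a raw string so the backslashes reach the regex engine unchanged.

```python
import re

# An opening "(" or "[", the shortest possible run of characters,
# then a closing ")" or "]".
BRACKETED = re.compile(r"[\(\[].*?[\)\]]")

def strip_bracketed(text):
    """Remove every bracketed chunk, then trim leftover whitespace."""
    return BRACKETED.sub("", text).strip()

# Hypothetical sample values, not taken from a real dataset
samples = ["Alpine (S)", "Bass Coast (S)", "Benalla [RC]", "Boroondara"]
cleaned = [strip_bracketed(s) for s in samples]
print(cleaned)  # ['Alpine', 'Bass Coast', 'Benalla', 'Boroondara']
```

Note that this removes bracketed text anywhere in the string, not only at the end, so it suits data where brackets appear only as a suffix.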
Here you go. I have named the column (LGA) as Name in this case. You may use your current column name.
df.Name = df.Name.apply(lambda x: x.rsplit(" ", 1)[0])  # drop the last space-separated token, e.g. "(S)"; x.split()[0] would cut "Bass Coast (S)" down to "Bass"
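import pandas as pd
df = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']})
# Split each value at "(": the name goes back into LGA, the rest into a throwaway column
df[['LGA', 'throw away']] = df['LGA'].str.split('(', expand=True)
# Drop the throwaway column and trim the trailing space left before "("
df = df.drop(columns='throw away')
df['LGA'] = df['LGA'].str.strip()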
You can use Pandas str.replace
…
dataf['LGA'] = dataf['LGA'].str.replace(r"\([^()]*\)", "", regex=True)
import pandas as pd
dataf = pd.DataFrame({
"LGA":\
"""Alpine (S)
Ararat (RC)
Ballarat (C)
Banyule (C)
Bass Coast (S)
Baw Baw (S)
Bayside (C)
Benalla (RC)
Boroondara (C)""".split("\n")
})
output = dataf['LGA'].str.replace(r"\([^()]*\)", "", regex=True)
print(output)
0 Alpine
1 Ararat
2 Ballarat
3 Banyule
4 Bass Coast
5 Baw Baw
6 Bayside
7 Benalla
8 Boroondara
Name: LGA, dtype: object
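One thing to watch for: the pattern above removes the parenthesised group but not the space in front of it, so the values may still end with a trailing space. A possible variant, sketched here under the assumption that only a suffix at the very end of the value should go, also consumes the surrounding whitespace and anchors the match to the end of the string. The variable name `trailing_suffix` and the row "Yarra (North) (C)" are made up to show the anchoring.

```python
import pandas as pd

# Hypothetical sample data; the last row is invented to show the anchoring
dataf = pd.DataFrame({"LGA": ["Alpine (S)", "Bass Coast (S)", "Yarra (North) (C)"]})

# \s*         optional whitespace before the suffix
# \([^()]*\)  one parenthesised group with no nested parentheses
# \s*$        optional whitespace, then end of string
trailing_suffix = r"\s*\([^()]*\)\s*$"
cleaned = dataf["LGA"].str.replace(trailing_suffix, "", regex=True)
print(cleaned.tolist())  # ['Alpine', 'Bass Coast', 'Yarra (North)']
```

Only the final group is removed, so parentheses that are part of the name itself are kept.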