Remove regex, square brackets, single and double quotes in Pandas DataFrame

I am trying to remove regex, square brackets, single quotes and double quotes and replace them with empty strings. I am not doing it right. Input is as below:

Accident_type                      Injury_classification        
                          
['Strike fixed/station obj']     ["Assault in PI Cases", 'Other Injuries']
['Slip, trip, fall']             ["Work Related Injury", 'Other Injuries']
etc

I tried df['Injury_classification'].str.replace(r" \\(.*\\)","") and it doesn't remove anything. The code runs, but the result is the same, with nothing removed.

I then tried:

df['Injury_classification'] = pd.DataFrame([str(line).strip('[').strip(']').strip('\'').strip('\'').strip('"') for line in df['Injury_classification']])

Current output:

Accident_type                      Injury_classification      
                                 
empty                       Assault in PI Cases", 'Other Injuries
empty                       Work Related Injury", 'Other Injuries
etc

As you can see, there are still some single quotes, and sometimes double quotes as well. How should I deal with this? I have about 20-30 columns with a similar structure. Right now I run the same command once per column, which is not efficient for that many columns. How can I write a loop to remove regex, single quotes and double quotes for all columns?

Expected output:

Accident_type                      Injury_classification      
                                 
Strike fixed/station obj    Assault in PI Cases, Other Injuries
Slip, trip, fall            Work Related Injury, Other Injuries
etc

Thanks

I would just use str.replace here with a character class:

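# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['Injury_classification'] = df['Injury_classification'].str.replace(r"[\[\]\"']", "", regex=True)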

This would turn the input ['Slip', 'trip', "fall"] into Slip, trip, fall.
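
For the 20-30 similar columns mentioned in the question, the same replacement can be applied in a loop. The following is only a minimal sketch: it assumes the cells are strings that merely look like lists (not actual Python lists), and the sample frame and the cols list are made up for illustration, so substitute your own column names.

```python
import pandas as pd

# Illustrative data modelled on the question; the real frame will differ.
df = pd.DataFrame({
    "Accident_type": ["['Strike fixed/station obj']", "['Slip, trip, fall']"],
    "Injury_classification": [
        "[\"Assault in PI Cases\", 'Other Injuries']",
        "[\"Work Related Injury\", 'Other Injuries']",
    ],
})

# Hypothetical list of columns to clean; list all affected columns here.
cols = ["Accident_type", "Injury_classification"]

# Character class matching '[', ']', '"' and "'".
pattern = r"[\[\]\"']"

for col in cols:
    # Assign the result back; str.replace returns a new Series
    # and does not modify the column in place.
    df[col] = df[col].str.replace(pattern, "", regex=True)

print(df)
```

Note that the result has to be assigned back to the column, otherwise the DataFrame stays unchanged. If you prefer to avoid the explicit loop, df[cols] = df[cols].replace(pattern, "", regex=True) should give the same result under the same assumptions.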
