Remove regex, square brackets, single and double quotes from a Pandas DataFrame

Question

I am trying to remove regex, square brackets, and single and double quotes, replacing them with empty strings. I am not doing it right. Input is as below:

Accident_type                      Injury_classification        
                          
['Strike fixed/station obj']     ["Assault in PI Cases", 'Other Injuries']
['Slip, trip, fall']             ["Work Related Injury", 'Other Injuries']
etc

I tried df['Injury_classification'].str.replace(r" \\(.*\\)","") and it doesn't remove anything. The code ran, but the results are the same, with nothing removed.

I then tried

df['Injury_classification'] = pd.DataFrame([str(line).strip('[').strip(']').strip('\'').strip('\'').strip('"') for line in df['Injury_classification']])

Current output:

Accident_type                      Injury_classification      
                                 
empty                       Assault in PI Cases", 'Other Injuries
empty                       Work Related Injury", 'Other Injuries
etc

As you can see, there are still some single quotes and sometimes double quotes as well. How should I deal with this? I have about 20-30 columns with a similar structure. Right now I am running the same command line by line, one column at a time, which is not efficient for that many columns. How can I write a loop to remove regex, single quotes, and double quotes for all columns?

Expected output:

Accident_type                      Injury_classification      
                                 
Strike fixed/station obj    Assault in PI Cases, Other Injuries
Slip, trip, fall            Work Related Injury, Other Injuries
etc

Thanks

Answer 1

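I would just use str.replace here with a character class. Pass regex=True explicitly, because since pandas 2.0 str.replace treats the pattern as a literal string by default:

df['Injury_classification'] = df['Injury_classification'].str.replace(r"[\[\]\"']", "", regex=True)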

This would turn the input ['Slip', 'trip', "fall"] into Slip, trip, fall.
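To cover the "20-30 columns" part of the question, the same replacement can be applied in a loop. The following is only a minimal sketch: the sample DataFrame is rebuilt from the rows shown in the question, the names cols and pattern are illustrative, and it assumes every listed column holds strings in the bracketed format.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be stored as strings).
df = pd.DataFrame({
    "Accident_type": [
        "['Strike fixed/station obj']",
        "['Slip, trip, fall']",
    ],
    "Injury_classification": [
        "[\"Assault in PI Cases\", 'Other Injuries']",
        "[\"Work Related Injury\", 'Other Injuries']",
    ],
})

# Character class matching any one of: [  ]  "  '
pattern = r"[\[\]\"']"

# Hypothetical list of columns to clean; adjust it to the real column names.
cols = ["Accident_type", "Injury_classification"]

for col in cols:
    # regex=True makes pandas interpret the pattern as a regular expression.
    df[col] = df[col].str.replace(pattern, "", regex=True)

print(df)
```

If the cells are actual Python lists rather than strings, .str.replace will not work on them directly; converting with df[col].astype(str) first is one option, though it also turns missing values into the string 'nan'.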
