How to replace substrings within strings in a pandas DataFrame

Question

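I have a DataFrame and a list of strings that I want to remove from one of its columns. But when I use the replace function, those characters are still there. Can someone explain why this happens?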

bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')', 
             '[', ']', '{', '}', ':', '&', '\n']

And the replacement:

df2['page'] = df2['page'].replace(bad_chars, '')

When I print df2:

for index, row in df2.iterrows():
    print( row['project'] + '\t' + '(' + row['page'] + ',' + str(row['viewCount']) + ')' + '\n'  )

en	(The_Voice_(US_season_14),613)

Answer 1

One approach is to escape the characters with re.escape and then use pd.Series.str.replace.

import pandas as pd
import re

bad_chars = ['?', '!', ',', ';', "'", '|', '-', '--', '(', ')', 
             '[', ']', '{', '}', ':', '&', '\n']

df = pd.DataFrame({'page': ['hello?', 'problems|here', 'nothingwronghere', 'nobrackets[]']})

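# Escape each string, join them into one alternation pattern, and replace matches.
# regex=True is needed on pandas 2.0+, where str.replace defaults to literal matching.
df['page'] = df['page'].str.replace('|'.join([re.escape(s) for s in bad_chars]), '', regex=True)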

print(df)

#                page
# 0             hello
# 1      problemshere
# 2  nothingwronghere
# 3        nobrackets
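
For anyone wondering why the original call left the characters in place: Series.replace with a list compares each entry against the whole cell value, whereas Series.str.replace works inside each string. A minimal sketch of the difference, using a made-up three-row Series (the values are illustrative and not taken from the question):

```python
import pandas as pd

# Hypothetical sample data, chosen so one cell is exactly equal to a "bad" string.
s = pd.Series(["hello?", "?", "a|b"])

# Series.replace matches whole values: only the cell that is exactly "?" changes.
whole_value = s.replace(["?", "|"], "")

# Series.str.replace works on substrings; regex=False treats "?" literally.
substring = s.str.replace("?", "", regex=False)

print(whole_value.tolist())  # ['hello?', '', 'a|b']
print(substring.tolist())    # ['hello', '', 'a|b']
```

In the question, no cell of df2['page'] is exactly equal to one of the entries in bad_chars, so nothing is replaced.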

Answer 2

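Use .str.replace and pass the strings as a single pipe-delimited pattern. Escape each string with re.escape() before joining, as @jpp shows; escaping the already joined string would also escape the pipes, and the pattern would then match only that one literal sequence. A more compact spelling of his suggestion uses map instead of a list comprehension:

import re
df2['page'] = df2['page'].str.replace('|'.join(map(re.escape, bad_chars)), '', regex=True)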
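
Where re.escape is applied relative to the join changes what the pattern means. The sketch below uses only the standard re module and a shortened, hypothetical version of bad_chars to show the two cases side by side:

```python
import re

# Shortened list, for illustration only (not the full list from the question).
bad_chars = ['?', '(', ')']

# Escaping each item first keeps "|" as the alternation operator between items.
per_item = '|'.join(re.escape(s) for s in bad_chars)

# Escaping after joining turns every "|" into a literal character, so the
# pattern only matches the exact text "?|(|)".
whole = re.escape('|'.join(bad_chars))

text = 'The_Voice_(US_season_14)?'
print(re.sub(per_item, '', text))  # The_Voice_US_season_14
print(re.sub(whole, '', text))     # The_Voice_(US_season_14)?
```

The second pattern is still valid, which is why no error is raised; it simply never matches ordinary page titles.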

Solution 1
2, accepted, 2018-04-14 17:30:59

Solution 2
1, 2018-04-14 17:21:28
