str.translate() method raises an error on a Pandas Series

Question

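I have a DataFrame with 3 columns. The 2 columns I want to work with are Dog_Summary and Dog_Description. These columns contain strings, and I want to remove any punctuation they may have.

I tried the following: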

df[['Dog_Summary', 'Dog_Description']] = df[['Dog_Summary', 'Dog_Description']].apply(lambda x: x.str.translate(None, string.punctuation))

For the above I get an error saying:

ValueError: ('deletechars is not a valid argument for str.translate in python 3. You should simply specify character deletions in the table argument', 'occurred at index Summary')

The second approach I tried was:

df[['Dog_Summary', 'Dog_Description']] = df[['Dog_Summary', 'Dog_Description']].apply(lambda x: x.replace(string.punctuation, ' '))

However, it still doesn't work!

Can anyone give me suggestions or advice?

Thanks! :)

Answer 1

I want to remove any punctuation they may have.

You can use a regular expression and string.punctuation for this:

>>> import pandas as pd
>>> from string import punctuation
>>> s = pd.Series(['abcd$*%&efg', '  xyz@)$(@rst'])
>>> s.str.replace(rf'[{punctuation}]', '', regex=True)
0     abcdefg
1      xyzrst
dtype: object

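The first argument of .str.replace() can be a regular expression. Pass regex=True to request this explicitly, because pandas 2.0 and later treat the pattern as a literal string by default. In this case, you can use an f-string and a character class to match any punctuation character: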

>>> rf'[{punctuation}]'
'[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~]'  # ' and \ are escaped
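The character class above works because of the particular order of the characters in string.punctuation. As a more defensive variant, here is a minimal sketch that escapes the characters with re.escape before building the class. The variable names pattern and cleaned are only illustrative and are not part of the original answer.

```python
import re
from string import punctuation

import pandas as pd

# re.escape backslash-escapes the regex metacharacters in the string,
# so the character class no longer depends on the order of the characters.
pattern = f"[{re.escape(punctuation)}]"

s = pd.Series(["abcd$*%&efg", "  xyz@)$(@rst"])

# regex=True makes pandas treat `pattern` as a regular expression.
cleaned = s.str.replace(pattern, "", regex=True)
print(cleaned.tolist())
```

Whitespace is not part of string.punctuation, so the leading spaces in the second value are kept.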

If you want to apply this to a DataFrame, do it the same way you are doing now:

cols = ['Dog_Summary', 'Dog_Description']
df.loc[:, cols] = df[cols].apply(lambda s: s.str.replace(rf'[{punctuation}]', '', regex=True))
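For a self-contained run of the DataFrame case, a small sketch follows. Only the column names Dog_Summary and Dog_Description come from the question; the third column (Dog_Name) and all the sample values are made up for illustration.

```python
from string import punctuation

import pandas as pd

# Hypothetical sample data: the question does not show its DataFrame.
df = pd.DataFrame({
    "Dog_Name": ["Rex", "Bella"],  # assumed third column, left untouched
    "Dog_Summary": ["Friendly, loyal!", "Shy (at first)..."],
    "Dog_Description": ["Loves fetch; hates baths.", "Quiet & calm?"],
})

cols = ["Dog_Summary", "Dog_Description"]

# apply() passes each selected column as a Series, so the .str accessor is available.
df[cols] = df[cols].apply(
    lambda s: s.str.replace(rf"[{punctuation}]", "", regex=True)
)
print(df)
```

Punctuation is deleted rather than replaced by a space, so "Quiet & calm?" ends up with two consecutive spaces. If that matters, replacing with ' ' and collapsing whitespace afterwards is one option.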

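Alternatively, you can use s.replace(rf'[{punctuation}]', '', regex=True) (without the .str accessor). Without regex=True, Series.replace only replaces cells whose entire value equals the given string, which is why the second attempt in the question changes nothing.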
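The error in the question comes from calling translate with the Python 2 signature (None, deletechars). In Python 3 the deletions go into the translation table itself, as the error message suggests. A minimal sketch of that route, assuming you want to stay with translate instead of a regular expression (the name table is illustrative):

```python
from string import punctuation

import pandas as pd

# The third argument of str.maketrans lists characters to delete:
# each one is mapped to None in the resulting table.
table = str.maketrans("", "", punctuation)

s = pd.Series(["abcd$*%&efg", "  xyz@)$(@rst"])

# Series.str.translate applies the table to every string in the Series.
translated = s.str.translate(table)
print(translated.tolist())
```

The same call can be used per column inside apply, for example df[cols].apply(lambda s: s.str.translate(table)).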
