Pandas does not replace characters in a large DataFrame

Question

This code works fine when I create a small test DataFrame, but when I try to use it after importing a large Excel file, it does not replace the character.

import pandas as pd
df = pd.DataFrame({'A':[1,2,3],
                    'B':[4,5,6],
                    'C':['`f;','d:','sda`sd'],
                    'D':['s','d;','d;p`'],
                    'E':[5,3,6],
                    'F':[7,4,3]})
df.replace({'`':''}, regex=True)

The result is as expected:

   A  B      C    D  E  F
0  1  4     f;    s  5  7
1  2  5     d:   d;  3  4
2  3  6  sdasd  d;p  6  3

However, when I load a large Excel file:

import pandas as pd
excel_file = r'C:\testfile.xlsx'  # raw string so that \t is not read as a tab
df = pd.read_excel(excel_file,sheet_name='Details', dtype=str)
df.iloc[20831].loc['Group Number']

Result:

'008513L-0005 `'

Then I run the replacement:

df.replace({'`':''}, regex=True)
df.iloc[20831].loc['Group Number']

Result:

'008513L-0005 `'

Answer 1

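You can remove the characters with string.punctuation, a common text-cleaning step in natural language processing. Note that this strips every ASCII punctuation character (including the hyphen in a value such as '008513L-0005'), not only the backtick.

import string  # provides string.punctuation

def remove_punctuation(text):
    # Leave non-string cells (e.g. NaN from empty Excel cells) untouched.
    if not isinstance(text, str):
        return text
    # Keep only the characters that are not ASCII punctuation.
    return "".join(ch for ch in text if ch not in string.punctuation)

# Apply the function to the relevant columns and assign the result back.
df['C'] = df['C'].apply(remove_punctuation)
df['D'] = df['D'].apply(remove_punctuation)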
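
One likely reason the replace call in the question appears to do nothing is that DataFrame.replace returns a new DataFrame and leaves the original untouched unless the result is assigned back (or inplace=True is passed). In the small test, the returned DataFrame was simply displayed, which made it look as if df had changed. The following is a minimal sketch under that assumption; the one-column DataFrame is a hypothetical stand-in for the Excel data, using the column name and value shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; only the column name and
# the first value are taken from the question.
df = pd.DataFrame({'Group Number': ['008513L-0005 `', '008514L-0001']})

# replace() returns a new DataFrame; here the result is discarded,
# so df itself still contains the backtick.
df.replace({'`': ''}, regex=True)
unchanged = df.loc[0, 'Group Number']

# Assigning the result back keeps the cleaned values.
df = df.replace({'`': ''}, regex=True)
cleaned = df.loc[0, 'Group Number']

print(repr(unchanged))
print(repr(cleaned))
```

Unlike the punctuation-stripping approach, this removes only the backtick, so hyphens and other characters in the values are preserved.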

Solution 1
1 2020-11-12 19:40:35
