How to remove strange encoding from a pandas df

Question

I have the following df:

import pandas as pd

df = pd.DataFrame({"name" : ["a", "b", "c"], "value" : ['1\xa0412', 4, 2]})

I want to replace '1\xa0412' with 1. I tried this:

df['value'] = df['value'].str.replace(r'\\.*', '', regex=True)

But it does not work. How can I fix this?

Answer 1

Try processing the data with the unidecode library first, and then do the replacement. It worked for me on a similar problem.
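A minimal sketch of what that could look like, assuming the third-party Unidecode package is installed (pip install Unidecode). The name `cleaned` is only illustrative. Unidecode transliterates text to ASCII, so the non-breaking space (U+00A0) should come out as an ordinary space, which a simple pattern can then match:

```python
import pandas as pd
from unidecode import unidecode  # third-party package, assumed to be installed

df = pd.DataFrame({"name": ["a", "b", "c"], "value": ["1\xa0412", 4, 2]})

# The column mixes strings and integers, so cast everything to str first,
# then transliterate each value to plain ASCII.
cleaned = df["value"].astype(str).map(unidecode)

# After transliteration the separator is a regular space:
# drop it and everything that follows.
df["value"] = cleaned.str.replace(r" .*", "", regex=True)
```

The values are still strings after this step, so the column keeps the object dtype.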

Answer 2

Try:

df.value = df.value.apply(repr).str.replace(r"(\\.*)|\'", r"", regex=True)

Result:

    name    value
0   a       1
1   b       4
2   c       2

But be careful, because the dtype of the value column is object. If you want another dtype, you have to convert the column.
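For what it is worth, the attempt in the question probably fails for two reasons. The string contains the single character U+00A0 (a non-breaking space), not a literal backslash, so r'\\.*' has nothing to match. In addition, the .str accessor returns NaN for the non-string entries 4 and 2. Below is a rough sketch of an alternative that matches the character directly and then converts the dtype. The variable name `as_text` is only illustrative, and the sketch assumes every cleaned value is a valid integer:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "value": ["1\xa0412", 4, 2]})

# In a normal (non-raw) string literal, "\xa0" is the single character U+00A0,
# so this pattern matches the non-breaking space and everything after it.
# astype(str) keeps the integer entries from turning into NaN under .str.
as_text = df["value"].astype(str).str.replace("\xa0.*", "", regex=True)

# Convert the cleaned strings from object dtype to integers.
df["value"] = as_text.astype(int)
```

If some values might not parse as integers, pd.to_numeric(as_text, errors="coerce") is a more forgiving option, since it turns unparseable entries into NaN instead of raising.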