How to replace values in multiple categorical columns of a pandas DataFrame

Question

I want to replace certain values in a DataFrame that contains several categorical columns.

import pandas as pd

df = pd.DataFrame({'s1': ['a', 'b', 'c'], 's2': ['a', 'c', 'd']}, dtype='category')

If I apply .replace to a single column, the result is as expected:

>>> df.s1.replace('a', 1)
0    1
1    b
2    c
Name: s1, dtype: object

If I apply the same operation to the whole DataFrame, an error is raised (abbreviated traceback):

>>> df.replace('a', 1)
ValueError: Cannot setitem on a Categorical with a new category, set the categories first

During handling of the above exception, another exception occurred:
ValueError: Wrong number of dimensions

If the DataFrame contains integers as categories, the following happens:

df = pd.DataFrame({'s1': [1, 2, 3], 's2': [1, 3, 4]}, dtype='category')

>>> df.replace(1, 3)
    s1  s2
0   3   3
1   2   3
2   3   4

But:

>>> df.replace(1, 2)
ValueError: Wrong number of dimensions

What am I missing?

Answer 1

Without digging into it, this looks like a bug to me.

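My workaround
Use pd.DataFrame.apply with pd.Series.replace, so the replacement runs column by column.
This has the advantage that you don't need to change any dtypes yourself.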

df = pd.DataFrame({'s1': [1, 2, 3], 's2': [1, 3, 4]}, dtype='category')
df.apply(pd.Series.replace, to_replace=1, value=2)

  s1  s2
0  2   2
1  2   3
2  3   4

Or

df = pd.DataFrame({'s1': ['a', 'b', 'c'], 's2': ['a', 'c', 'd']}, dtype='category')
df.apply(pd.Series.replace, to_replace='a', value=1)

  s1 s2
0  1  1
1  b  c
2  c  d

@cᴏʟᴅsᴘᴇᴇᴅ's workaround

df = pd.DataFrame({'s1': ['a', 'b', 'c'], 's2': ['a', 'c', 'd']}, dtype='category')
df.applymap(str).replace('a', 1)

  s1 s2
0  1  1
1  b  c
2  c  d
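A variation on the same idea, as a minimal sketch: drop the categorical dtype with astype(object) instead of applymap(str) (applymap is deprecated in recent pandas releases), replace, and then convert back only if a categorical result is wanted. The names as_object, replaced and recategorized are illustrative and not from the answers above, and the sketch assumes that losing the original category sets is acceptable.

```python
import pandas as pd

df = pd.DataFrame({'s1': ['a', 'b', 'c'], 's2': ['a', 'c', 'd']}, dtype='category')

# Drop the categorical dtype first, so replace() is free to introduce new values.
as_object = df.astype(object)
replaced = as_object.replace('a', 1)

# Optionally convert back; each column gets categories inferred from its new values.
recategorized = replaced.astype('category')

print(replaced)
print(recategorized.dtypes)
```

Unlike applymap(str), this keeps non-string values such as integers unchanged instead of turning them into strings.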

Answer 2

The reason for this behaviour is that each column has a different set of categories:

In [224]: df.s1.cat.categories
Out[224]: Index(['a', 'b', 'c'], dtype='object')

In [225]: df.s2.cat.categories
Out[225]: Index(['a', 'c', 'd'], dtype='object')

So if you replace with a value that is a category in both columns, it works:

In [226]: df.replace('d','a')
Out[226]:
  s1 s2
0  a  a
1  b  c
2  c  a

As a solution, you might want to make the columns categorical manually, using:

pd.Categorical(..., categories=[...])

where categories would contain all possible values across all columns...
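A minimal sketch of that suggestion, assuming the set of categories can be collected from the data itself. It uses CategoricalDtype from pandas.api.types rather than calling pd.Categorical on each column; the names raw, all_values and shared are illustrative. Recent pandas releases have changed how replace treats the categories themselves, so the sketch only relies on the resulting values.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

raw = pd.DataFrame({'s1': ['a', 'b', 'c'], 's2': ['a', 'c', 'd']})

# Collect every value that occurs in any column (sorted for a stable order).
all_values = sorted(set(raw['s1']) | set(raw['s2']))
shared = CategoricalDtype(categories=all_values)

# Cast every column to the same dtype, so all columns share one category set.
df = raw.astype(shared)

# 'a' is a valid category in every column, so the replacement is well defined.
result = df.replace('d', 'a')

print(df['s1'].cat.categories)
print(result)
```

If the replacement value does not occur anywhere in the data (for example replacing 'a' with 1), it would have to be added to all_values before building the shared dtype.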
