Assign values to a Pandas MultiIndex DataFrame by index level

Question

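I have a Pandas MultiIndex DataFrame, and I need to assign values to one of its columns from a Series. The Series shares its index with the first level of the DataFrame's index.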

import pandas as pd
import numpy as np
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index = [idx0, idx1], columns = ['A', 'B'])
s = pd.Series([True, False, True],index = np.unique(idx0))
print(df)
print(s)

Out:

             A    B
bar one    NaN  NaN
    two    NaN  NaN
    three  NaN  NaN
baz one    NaN  NaN
foo one    NaN  NaN
    two    NaN  NaN

bar     True
baz    False
foo     True
dtype: bool

These do not work:

df.A = s # does not raise an error, but does nothing
df.loc[s.index,'A'] = s # raises an error

Expected output:

             A     B
bar one    True   NaN
    two    True   NaN
    three  True   NaN
baz one    False  NaN
foo one    True   NaN
    two    True   NaN

Answer 1

Series (and dictionaries) can be used just like functions with map and apply (thanks to @normanius for improving the syntax):

df['A'] = pd.Series(df.index.get_level_values(0)).map(s).values

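Or similarly (this relies on the index levels being unnamed, so that reset_index names the new column 'level_0'):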

df['A'] = df.reset_index(level=0)['level_0'].map(s).values

Result:

               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN
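
For reference, the first variant can be run end to end. The sketch below only rebuilds df and s from the question and applies the same map call; the intermediate names level0 and mapped are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Look up each row's first-level label in s.
level0 = pd.Series(df.index.get_level_values(0))
mapped = level0.map(s)

# .values drops the positional index of `mapped`, so the assignment is
# done by position rather than by (mismatching) index labels.
df['A'] = mapped.values
print(df)
```

The .values step matters: without it, pandas would try to align the integer index of the mapped Series with the MultiIndex of df.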

Answer 2

"df.A = s does not raise an error, but does nothing"

Indeed, this should have worked. (Your point is actually related to mine.)

Workaround

>>> s.index = pd.Index((c,) for c in s.index)
>>> df.A = s
>>> df
               A    B
bar one     True  NaN
    two     True  NaN
    three   True  NaN
baz one    False  NaN
foo one     True  NaN
    two     True  NaN

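Why does the above work?

Because when you do df.A = s directly, without the workaround, you are actually trying to assign values indexed by a plain pandas.Index into an object indexed by an instance of its subclass, pandas.MultiIndex, which in a way looks like a "counter-opposition" to the LS (Liskov substitution) principle. See for yourself: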

>>> type(s.index).__name__
'Index'

whereas

>>> type(df.index).__name__
'MultiIndex'
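
The two checks above can also be run as a short script. This sketch additionally confirms the class relationship the explanation relies on, namely that MultiIndex is a subclass of Index; the names flat and nested are only illustrative stand-ins for s.index and df.index.

```python
import pandas as pd

# A flat index like s.index and a two-level index like df.index.
flat = pd.Index(['bar', 'baz', 'foo'])
nested = pd.MultiIndex.from_arrays(
    [['bar', 'bar', 'baz'], ['one', 'two', 'one']]
)

flat_type = type(flat).__name__
nested_type = type(nested).__name__

# MultiIndex derives from Index, not the other way around.
mi_is_index_subclass = issubclass(pd.MultiIndex, pd.Index)
index_is_mi_subclass = issubclass(pd.Index, pd.MultiIndex)

print(flat_type, nested_type, mi_is_index_subclass, index_is_mi_subclass)
```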

So this workaround consists of converting the index of s into a one-level pandas.MultiIndex instance.

>>> s.index = pd.Index((c,) for c in s.index)
>>> type(s.index).__name__
'MultiIndex'

and nothing has visibly changed:

>>> s
bar     True
baz    False
foo     True
dtype: bool

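A thought: from many points of view (mathematical, ontological), all of this somehow suggests that pandas.Index should have been designed as a subclass of pandas.MultiIndex, rather than the other way around as it is now.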
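
As a side note that is not part of the answer above: if rewriting s.index feels indirect, the broadcast along the first level can also be requested explicitly. This is a minimal sketch of a different technique, assuming the level argument of Series.reindex and the example data from the question; the name broadcast is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Reindex s onto df's MultiIndex, matching the labels of s against
# level 0 only, so each value is repeated for every row of its group.
broadcast = s.reindex(df.index, level=0)

# `broadcast` now carries df's own index, so label alignment succeeds.
df['A'] = broadcast
print(df)
```

Here the flat index of s is left untouched; only the temporary result carries the MultiIndex.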

Answer 3

You can use the join method on the df DataFrame, but you need to name the index levels and the Series accordingly:

>>> df.index.names = ('lvl0', 'lvl1')
>>> s.index.name = 'lvl0'
>>> s.name = 'new_col'

The join method then creates a new column in the DataFrame:

>>> df.join(s)
              A    B  new_col
lvl0 lvl1
bar  one    NaN  NaN     True
     two    NaN  NaN     True
     three  NaN  NaN     True
baz  one    NaN  NaN    False
foo  one    NaN  NaN     True
     two    NaN  NaN     True

To assign it to an existing column:

>>> df['A'] = df.join(s)['new_col']
>>> df
                A    B
lvl0 lvl1
bar  one     True  NaN
     two     True  NaN
     three   True  NaN
baz  one    False  NaN
foo  one     True  NaN
     two     True  NaN
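
Put together as one script, the steps of this answer might look like the sketch below. It reuses the example data from the question and the names lvl0, lvl1 and new_col chosen above; the variable joined is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
idx0 = np.array(['bar', 'bar', 'bar', 'baz', 'foo', 'foo'])
idx1 = np.array(['one', 'two', 'three', 'one', 'one', 'two'])
df = pd.DataFrame(index=[idx0, idx1], columns=['A', 'B'])
s = pd.Series([True, False, True], index=np.unique(idx0))

# Name the index levels and the Series so that join can match the
# index of s against the 'lvl0' level of df.
df.index.names = ('lvl0', 'lvl1')
s.index.name = 'lvl0'
s.name = 'new_col'

# join returns a new DataFrame; df itself is not modified by it.
joined = df.join(s)

# Copy the joined column into the existing column A (aligned by index).
df['A'] = joined['new_col']
print(df)
```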

3 answers

Answer 1
6 2015-05-08 12:51:49

Answer 2
2 accepted 2021-07-31 15:15:19

Answer 3
0 2023-01-04 09:43:27
