Setting values in a subselection of a pandas MultiIndex DataFrame

Question

Consider the following DataFrame:

import numpy as np
import pandas as pd

arrays1 = [
    [
        "A",
        "A",
        "A",
        "B",
        "B",
        "B",
        "C",
        "C",
        "C",
        "D",
        "D",
        "D",
    ],
    [
        "qux",
        "quux",
        "corge",
        "qux",
        "quux",
        "corge",
        "qux",
        "quux",
        "corge",
        "qux",
        "quux",
        "corge",
    ],
    [
        "one",
        "two",
        "three",
        "one",
        "two",
        "three",
        "one",
        "two",
        "three",
        "one",
        "two",
        "three",
    ],
]
tuples1 = list(zip(*arrays1))
index_values1 = pd.MultiIndex.from_tuples(tuples1)
df1 = pd.DataFrame(
    np.ones((12, 12)), index=index_values1, columns=index_values1
)

This yields:

                 A               B               C               D           
               qux quux corge  qux quux corge  qux quux corge  qux quux corge
               one  two three  one  two three  one  two three  one  two three
A qux   one    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
B qux   one    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
C qux   one    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
D qux   one    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0

Say I want to set everything to zero except the following rows and columns:

                 A               B               C               D           
               qux quux corge  qux quux corge  qux quux corge  qux quux corge
               one  two three  one  two three  one  two three  one  two three
A qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
B qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
C qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
D qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0

That is, set everything to zero except columns A and B in the rows (quux, two) and (corge, three) of index C and D.

To select columns A and B and the rows (quux, two), (corge, three) in index C and D, I expected to be able to do:

l_col_lvl0 = ['A', 'B']
l_idx_lvl0 = ['C', 'D']
l_idx_lvl1 = [("quux", "two"), ("corge", "three")]
df1_a_bc_qc = df1.loc[(l_idx_lvl0, l_idx_lvl1), l_col_lvl0]

However, this returns an empty DataFrame and the following warning:

FutureWarning: The behavior of indexing on a MultiIndex with a nested sequence of 
labels is deprecated and will change in a future version. `series.loc[label, sequence]` 
will raise if any members of 'sequence' or not present in the index's second level. 
To retain the old behavior, use `series.index.isin(sequence, level=1)`
  df1_a_bc_qc = df1.loc[(l_idx_lvl0, l_idx_lvl1), l_col_lvl0]
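One way to read the warning: inside the tuple passed to `.loc`, each element is matched against a single index level, so the pairs in `l_idx_lvl1` are looked up as level-1 labels and match nothing. A minimal sketch of the `isin`-based route the warning hints at, assuming the same frame as above (rebuilt here more compactly; the names `pairs`, `in_lvl0`, `in_lvl12` and `row_mask` are illustrative only):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question in a compact form.
pairs = [("qux", "one"), ("quux", "two"), ("corge", "three")]
tuples = [(grp, a, b) for grp in "ABCD" for (a, b) in pairs]
index = pd.MultiIndex.from_tuples(tuples)
df1 = pd.DataFrame(np.ones((12, 12)), index=index, columns=index)

l_col_lvl0 = ["A", "B"]
l_idx_lvl0 = ["C", "D"]
l_idx_lvl12 = [("quux", "two"), ("corge", "three")]

# Rows whose level-0 label is C or D.
in_lvl0 = df1.index.get_level_values(0).isin(l_idx_lvl0)
# Rows whose (level 1, level 2) pair is one of the requested pairs.
in_lvl12 = df1.index.droplevel(0).isin(l_idx_lvl12)
row_mask = in_lvl0 & in_lvl12

# A single .loc call: boolean mask for rows, level-0 labels for columns.
selected = df1.loc[row_mask, l_col_lvl0]
print(selected)
```

This keeps the (level 1, level 2) labels together as pairs and does not chain two `.loc` calls.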

On the other hand, I can select columns A and B together with index C and D:

df1_ab_cd = df1.loc[l_idx_lvl0, l_col_lvl0]

                 A               B           
               qux quux corge  qux quux corge
               one  two three  one  two three
C qux   one    1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0
D qux   one    1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0

Additionally, I can select (quux, two), (corge, three) within index C or within index D:

df1_ab_c_qc = df1.loc['C',l_col_lvl0].loc[l_idx_lvl1]

                   A               B           
             qux quux corge  qux quux corge
             one  two three  one  two three
quux  two    1.0  1.0   1.0  1.0  1.0   1.0
corge three  1.0  1.0   1.0  1.0  1.0   1.0

df1_ab_d_qc = df1.loc['D',l_col_lvl0].loc[l_idx_lvl1]

                   A               B           
             qux quux corge  qux quux corge
             one  two three  one  two three
quux  two    1.0  1.0   1.0  1.0  1.0   1.0
corge three  1.0  1.0   1.0  1.0  1.0   1.0

However, if I understand correctly, chained assignment is discouraged.

Moreover, if I try to pass l_idx_lvl0 instead, I get the following error message:

df1_ab_cd_qc = df1.loc[l_idx_lvl0,l_col_lvl0].loc[l_idx_lvl1]

ValueError: operands could not be broadcast together with shapes (2,2) (3,) (2,2)

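In summary, how can I set everything to 0 except columns A and B in the rows (quux, two), (corge, three) of index C and D?

I believe the solution to Question 6 in "Select rows in pandas MultiIndex DataFrame" is very close to what I am looking for, although I have not managed to make it work for this case.

I would like to keep the assignment flexible by passing lists rather than single labels. In addition, the labels in level 1 and level 2 should preferably not be separated. That is, (quux, two) and (corge, three) should be passed together rather than level by level. I mention this because in Q6 the labels are passed per level (i.e. df.loc[(('a', 'b'), ('u', 'v')), :]).

Any help is much appreciated.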

Answer 1

You cannot select by exclusion directly. What you can do is build a 2D mask and use it for indexing.

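# Start from a mask that marks every cell for zeroing.
mask = pd.DataFrame(True, index=df1.index, columns=df1.columns)

# Full three-level row labels of the cells to keep.
idx = pd.MultiIndex.from_tuples([
    ('C', 'quux', 'two'),
    ('C', 'corge', 'three'),
    ('D', 'quux', 'two'),
    ('D', 'corge', 'three'),
])

# Unmark the cells to keep, then zero everything still marked.
mask.loc[idx, ['A', 'B']] = False
df1[mask] = 0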

print(df1)

Output:

                 A               B               C               D           
               qux quux corge  qux quux corge  qux quux corge  qux quux corge
               one  two three  one  two three  one  two three  one  two three
A qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
B qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
C qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
D qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
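If the original frame should stay untouched, the same masking idea can be sketched with `DataFrame.where`, which returns a new frame instead of assigning in place. This is only an illustrative variant, not part of the answer above; the frame is rebuilt compactly and the names `keep_rows`, `keep_cols`, `keep` and `result` are assumptions made for the example:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question in a compact form.
pairs = [("qux", "one"), ("quux", "two"), ("corge", "three")]
tuples = [(grp, a, b) for grp in "ABCD" for (a, b) in pairs]
index = pd.MultiIndex.from_tuples(tuples)
df1 = pd.DataFrame(np.ones((12, 12)), index=index, columns=index)

rows_to_keep = [
    ("C", "quux", "two"),
    ("C", "corge", "three"),
    ("D", "quux", "two"),
    ("D", "corge", "three"),
]

# 1D boolean arrays: which rows and which columns survive.
keep_rows = df1.index.isin(rows_to_keep)
keep_cols = df1.columns.get_level_values(0).isin(["A", "B"])

# Combine them into a 2D "keep" mask with the same shape as df1.
keep = keep_rows[:, None] & keep_cols[None, :]

# where() keeps values where the mask is True and writes 0 elsewhere.
result = df1.where(keep, 0)
print(result)
```

Here `keep` has the opposite meaning of `mask` in the answer: True marks cells that are preserved.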

Answer 2

Thanks to @mozway's answer, I solved the problem as follows:

l_col_lvl0 = ["A", "B"]
l_idx_lvl0 = ["C", "D"]
l_idx_lvl12 = [("quux", "two"), ("corge", "three")]

l_idx = []
for idx_lvl0 in l_idx_lvl0:
    for t_idx_lvl1 in l_idx_lvl12:
        idx_lvl1, idx_lvl2 = t_idx_lvl1
        t_idx_lvl012 = (idx_lvl0, idx_lvl1, idx_lvl2)
        l_idx.append(t_idx_lvl012)

idx = pd.MultiIndex.from_tuples(l_idx)

# Save the values to keep, zero a copy of the frame, then write them back.
df1_sel = df1.loc[idx, l_col_lvl0]
df1_0 = df1.copy() * 0
df1_0.loc[idx, l_col_lvl0] = df1_sel

print(df1_0)

Output:

                 A               B               C               D           
               qux quux corge  qux quux corge  qux quux corge  qux quux corge
               one  two three  one  two three  one  two three  one  two three
A qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
B qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
C qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
D qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0

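In particular, the answer helped me realize that I should first build the MultiIndex explicitly with all of its levels; after that, I can select as expected.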
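The nested loops that build the full index can also be sketched with `itertools.product`. The helper name `expand_index` is hypothetical and only serves the illustration; the frame is rebuilt compactly so the snippet runs on its own:

```python
from itertools import product

import numpy as np
import pandas as pd


def expand_index(level0_labels, sub_tuples):
    """Hypothetical helper: prefix each (level 1, level 2) pair with each level-0 label."""
    return pd.MultiIndex.from_tuples(
        [(lvl0, *sub) for lvl0, sub in product(level0_labels, sub_tuples)]
    )


# Rebuild the example frame from the question in a compact form.
pairs = [("qux", "one"), ("quux", "two"), ("corge", "three")]
tuples = [(grp, a, b) for grp in "ABCD" for (a, b) in pairs]
index = pd.MultiIndex.from_tuples(tuples)
df1 = pd.DataFrame(np.ones((12, 12)), index=index, columns=index)

# Lists of labels go in; pairs for levels 1 and 2 stay together.
idx = expand_index(["C", "D"], [("quux", "two"), ("corge", "three")])
df1_sel = df1.loc[idx, ["A", "B"]]
print(df1_sel)
```

The resulting `idx` can be used in place of the loop-built index in the code above.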

2 answers

Answer 1
1 Accepted 2022-12-20 19:25:41

Answer 2
0 2022-12-21 10:06:25