
Setting values in a subselection of a pandas MultiIndex

Consider the following DataFrame:

import numpy as np
import pandas as pd

arrays1 = [
    [
        "A",
        "A",
        "A",
        "B",
        "B",
        "B",
        "C",
        "C",
        "C",
        "D",
        "D",
        "D",
    ],
    [
        "qux",
        "quux",
        "corge",
        "qux",
        "quux",
        "corge",
        "qux",
        "quux",
        "corge",
        "qux",
        "quux",
        "corge",
    ],
    [
        "one",
        "two",
        "three",
        "one",
        "two",
        "three",
        "one",
        "two",
        "three",
        "one",
        "two",
        "three",
    ],
]
tuples1 = list(zip(*arrays1))
index_values1 = pd.MultiIndex.from_tuples(tuples1)
df1 = pd.DataFrame(
    np.ones((12, 12)), index=index_values1, columns=index_values1
)

Yielding:

                 A               B               C               D           
               qux quux corge  qux quux corge  qux quux corge  qux quux corge
               one  two three  one  two three  one  two three  one  two three
A qux   one    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
B qux   one    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
C qux   one    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
D qux   one    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0  1.0  1.0   1.0
  

Say I want to set everything to zero except for one block of rows and columns, so that the result looks like this:

                 A               B               C               D           
               qux quux corge  qux quux corge  qux quux corge  qux quux corge
               one  two three  one  two three  one  two three  one  two three
A qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
B qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
C qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
D qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0  
  

That is, set everything to zero except the cells whose column is under A or B and whose row is (quux, two) or (corge, three) under index C or D.

To select columns A and B and rows (quux, two), (corge, three) in both index C and D, I expected to be able to do:

l_col_lvl0 = ['A', 'B']
l_idx_lvl0 = ['C', 'D']
l_idx_lvl1 = [("quux", "two"), ("corge", "three")]
df1_a_bc_qc = df1.loc[(l_idx_lvl0, l_idx_lvl1), l_col_lvl0]

However, this returns an empty DataFrame and emits the following warning:

FutureWarning: The behavior of indexing on a MultiIndex with a nested sequence of 
labels is deprecated and will change in a future version. `series.loc[label, sequence]` 
will raise if any members of 'sequence' or not present in the index's second level. 
To retain the old behavior, use `series.index.isin(sequence, level=1)`
  df1_a_bc_qc = df1.loc[(l_idx_lvl0, l_idx_lvl1), l_col_lvl0]    
  

In turn, I can select columns A and B together with indices C and D:

df1_ab_cd = df1.loc[l_idx_lvl0, l_col_lvl0]

                 A               B           
               qux quux corge  qux quux corge
               one  two three  one  two three
C qux   one    1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0
D qux   one    1.0  1.0   1.0  1.0  1.0   1.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0
  

Moreover, I can select (quux, two), (corge, three) in either index C or D:

df1_ab_c_qc = df1.loc['C',l_col_lvl0].loc[l_idx_lvl1]

                   A               B           
             qux quux corge  qux quux corge
             one  two three  one  two three
quux  two    1.0  1.0   1.0  1.0  1.0   1.0
corge three  1.0  1.0   1.0  1.0  1.0   1.0

df1_ab_d_qc = df1.loc['D',l_col_lvl0].loc[l_idx_lvl1]

                   A               B           
             qux quux corge  qux quux corge
             one  two three  one  two three
quux  two    1.0  1.0   1.0  1.0  1.0   1.0
corge three  1.0  1.0   1.0  1.0  1.0   1.0

However, if I understand correctly, chained assignments are discouraged.

Moreover, if I try to pass l_idx_lvl0 instead of a single label, I get the following error message:

df1_ab_cd_qc = df1.loc[l_idx_lvl0,l_col_lvl0].loc[l_idx_lvl1]

ValueError: operands could not be broadcast together with shapes (2,2) (3,) (2,2) 

In conclusion, how can I set everything to 0, except for columns A and B in rows (quux, two), (corge, three) of index C and D?

I believe that the solution to Question 6 (Select rows in pandas MultiIndex DataFrame) is very close to what I'm looking for, though I haven't gotten it to work for this case.

I would like to be flexible in the assignment by passing lists instead of individual labels. Moreover, the labels in levels 1 and 2 should preferably not be separated. That is, (quux, two) and (corge, three) should be passed together, instead of per level. The reason I mention this is that in Q6 the labels are passed per level (i.e. df.loc[(('a', 'b'), ('u', 'v')), :]).

Any help is much appreciated.

You can't select directly by exclusion. What you can do is build a 2D boolean mask for indexing.

mask = pd.DataFrame(True, index=df1.index, columns=df1.columns)
idx = pd.MultiIndex.from_tuples([
            ('C',  'quux',   'two'),
            ('C', 'corge', 'three'),
            ('D',  'quux',   'two'),
            ('D', 'corge', 'three')])


mask.loc[idx, ['A', 'B']] = False
df1[mask] = 0

print(df1)

Output:


                 A               B               C               D           
               qux quux corge  qux quux corge  qux quux corge  qux quux corge
               one  two three  one  two three  one  two three  one  two three
A qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
B qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
C qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
D qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
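
As a variation on the mask idea above, the same 2D mask can be built without writing out every full index tuple, by combining per-axis boolean masks from `isin`. This is only a sketch: the names `row_keep`, `col_keep`, `keep` and `result` are made up for illustration, the frame is rebuilt in compact form so the snippet runs on its own, and `DataFrame.where` is used here in place of the in-place assignment so that `df1` is left untouched.

```python
import numpy as np
import pandas as pd

# Rebuild the 12x12 frame of ones from the question, in compact form.
lvl0 = ["A", "B", "C", "D"]
lvl12 = [("qux", "one"), ("quux", "two"), ("corge", "three")]
index = pd.MultiIndex.from_tuples([(a, b, c) for a in lvl0 for (b, c) in lvl12])
df1 = pd.DataFrame(np.ones((12, 12)), index=index, columns=index)

l_col_lvl0 = ["A", "B"]
l_idx_lvl0 = ["C", "D"]
l_idx_lvl12 = [("quux", "two"), ("corge", "three")]

# Rows to keep: level 0 is in l_idx_lvl0 AND the (level 1, level 2)
# pair is in l_idx_lvl12, so the pairs are never split per level.
row_keep = (
    df1.index.get_level_values(0).isin(l_idx_lvl0)
    & df1.index.droplevel(0).isin(l_idx_lvl12)
)
# Columns to keep: level 0 is in l_col_lvl0.
col_keep = df1.columns.get_level_values(0).isin(l_col_lvl0)

# Broadcast the two 1D masks into a 12x12 mask that is True only
# inside the block that should survive.
keep = row_keep[:, None] & col_keep[None, :]

# where() keeps values where the condition is True and writes 0
# elsewhere; it returns a new frame and does not modify df1.
result = df1.where(keep, 0)
print(result)
```

Because the row mask is assembled from the lists directly, this form takes `l_idx_lvl0` and `l_idx_lvl12` as given in the question.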

Thanks to @mozway's answer, I resolved this issue as follows:

l_col_lvl0 = ["A", "B"]
l_idx_lvl0 = ["C", "D"]
l_idx_lvl12 = [("quux", "two"), ("corge", "three")]

l_idx = []
for idx_lvl0 in l_idx_lvl0:
    for t_idx_lvl1 in l_idx_lvl12:
        idx_lvl1, idx_lvl2 = t_idx_lvl1
        t_idx_lvl012 = (idx_lvl0, idx_lvl1, idx_lvl2)
        l_idx.append(t_idx_lvl012)

idx = pd.MultiIndex.from_tuples(l_idx)

df1_sel = df1.loc[idx, l_col_lvl0]
df1_0 = df1.copy()*0
df1_0.loc[idx, l_col_lvl0] = df1_sel



                 A               B               C               D           
               qux quux corge  qux quux corge  qux quux corge  qux quux corge
               one  two three  one  two three  one  two three  one  two three
A qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
B qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
C qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
D qux   one    0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0  0.0  0.0   0.0
  quux  two    1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0
  corge three  1.0  1.0   1.0  1.0  1.0   1.0  0.0  0.0   0.0  0.0  0.0   0.0

In particular, the answer helped me realize that I should first explicitly create the MultiIndex with all levels, after which I can make the selection as intended.
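
The nested loop that builds the full tuples could probably be shortened with `itertools.product`. A minimal sketch, reusing the list names from the code above; the loop variables `lvl0` and `lvl12` are chosen only for illustration:

```python
from itertools import product

import pandas as pd

l_idx_lvl0 = ["C", "D"]
l_idx_lvl12 = [("quux", "two"), ("corge", "three")]

# Pair every level-0 label with every (level 1, level 2) tuple and
# flatten each pair into one three-element tuple.
l_idx = [(lvl0, *lvl12) for lvl0, lvl12 in product(l_idx_lvl0, l_idx_lvl12)]
idx = pd.MultiIndex.from_tuples(l_idx)
print(idx)
```

The resulting `idx` can then be passed to `df1.loc[idx, l_col_lvl0]` exactly as in the code above.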
