How do I assign to a Pandas DataFrame when selecting by MultiIndex name?

Question

Main question: How do I select/slice a multi-indexed DataFrame, using the name of the MultiIndex level, in a way that allows me to assign to that slice?

Test Data

import io
import pandas as pd

data = io.StringIO('''Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,9999,$0.39
''')
df_fruit = pd.read_csv(data, index_col=['Fruit', 'Color'])

new_green_data = io.StringIO('''Fruit,Count,Price
Apple,2,$0.96
Lime,9993,$0.40
Pear,12,$2.90
''')
df_new_green = pd.read_csv(new_green_data, index_col='Fruit')

This sets up two DataFrames:

df_fruit:

| Fruit   | Color   |   Count | Price   |
|:--------|:--------|--------:|:--------|
| Apple   | Red     |       3 | $1.29   |
| Apple   | Green   |       9 | $0.99   |
| Pear    | Red     |      25 | $2.59   |
| Pear    | Green   |      26 | $2.79   |
| Lime    | Green   |    9999 | $0.39   |

df_new_green:

| Fruit   |   Count | Price   |
|:--------|--------:|:--------|
| Apple   |       2 | $0.96   |
| Lime    |    9993 | $0.40   |
| Pear    |      12 | $2.90   |

The Want

I want to update the rows in df_fruit in which Color is Green so that they match the values in the incoming df_new_green data. The final output should be:

| Fruit   | Color   |   Count | Price   |
|:--------|:--------|--------:|:--------|
| Apple   | Red     |       3 | $1.29   |
| Apple   | Green   |       2 | $0.96   |
| Pear    | Red     |      25 | $2.59   |
| Pear    | Green   |      12 | $2.90   |
| Lime    | Green   |    9993 | $0.40   |

Note that the order of the fruits in df_new_green differs from df_fruit. Thus, when performing the assignment, I need to preserve the indices of both sides so that the rows are matched up correctly.

What I Know

I know several ways to select what I want to update in the DataFrame:

df_fruit.xs(key='Green', level='Color')

This produces the right view of the data, but I can't assign to it. Similarly close:

df_fruit[df_fruit.index.get_level_values('Color') == 'Green']

and

idx = pd.IndexSlice
df_fruit.loc[idx[:, 'Green'], :]

both give me the same view, but they still include the Color level of the MultiIndex:

| Fruit   | Color   |   Count | Price   |
|:--------|:--------|--------:|:--------|
| Apple   | Green   |       9 | $0.99   |
| Pear    | Green   |      26 | $2.79   |
| Lime    | Green   |    9999 | $0.39   |

I can assign to this view using df_new_green, but this yields NaNs because df_new_green does not include the Color level in its index. The second choice (using IndexSlice) is also not great because I'm selecting the level by its position in the MultiIndex rather than by its name. If I run droplevel('Color') on either one, I again get the right view, but I can't assign to it.

I could drop the index on the new values, but this leads to the wrong values being used:

df_fruit.loc[idx[:, 'Green'], :] = df_new_green.to_numpy()

This yields:

| Fruit   | Color   |   Count | Price   |
|:--------|:--------|--------:|:--------|
| Apple   | Red     |       3 | $1.29   |
| Apple   | Green   |       2 | $0.96   |
| Pear    | Red     |      25 | $2.59   |
| Pear    | Green   |    9993 | $0.40   |
| Lime    | Green   |      12 | $2.90   |

...but this is wrong because the Pear and Lime values got swapped. I need to preserve the index(es) on the update DataFrame.

The Ugly Way
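One way to keep positional assignment while avoiding the swap is to reorder the incoming rows first. The following is only a sketch under the assumption that every green fruit in df_fruit also appears in df_new_green; the names is_green, green_fruits and aligned are introduced here for illustration and are not from the question.

```python
import io
import pandas as pd

df_fruit = pd.read_csv(io.StringIO('''Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,9999,$0.39
'''), index_col=['Fruit', 'Color'])

df_new_green = pd.read_csv(io.StringIO('''Fruit,Count,Price
Apple,2,$0.96
Lime,9993,$0.40
Pear,12,$2.90
'''), index_col='Fruit')

# Select the green rows by level name, not by level position.
is_green = df_fruit.index.get_level_values('Color') == 'Green'

# Fruit labels of the green rows, in df_fruit's own row order.
green_fruits = df_fruit.index[is_green].get_level_values('Fruit')

# Reorder the incoming rows to that same order (assumes no fruit is missing).
aligned = df_new_green.reindex(green_fruits)

# Assign one column at a time so each column keeps its own dtype.
for col in aligned.columns:
    df_fruit.loc[is_green, col] = aligned[col].to_numpy()

print(df_fruit)
```

If a green fruit were missing from df_new_green, reindex would fill that row with NaN, so this approach probably needs a check before assigning in that case.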

df_fruit[df_fruit.index.get_level_values('Color') == 'Green'] = df_new_green.assign(Color='Green').set_index('Color', append=True)

...ugh. This yields the right answer and meets the requirements, but holy cow, that's ugly.

Answer 1

I'd use assign and set_index, then combine_first:

(df_new_green.assign(Color='Green')
             .set_index('Color', append=True)
             .combine_first(df_fruit))

Output:

| Fruit   | Color   |   Count | Price   |
|:--------|:--------|--------:|:--------|
| Apple   | Green   |       2 | $0.96   |
| Apple   | Red     |       3 | $1.29   |
| Lime    | Green   |    9993 | $0.40   |
| Pear    | Green   |      12 | $2.90   |
| Pear    | Red     |      25 | $2.59   |
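As the output shows, combine_first returns a new DataFrame whose rows come back sorted by index rather than in df_fruit's original order, and df_fruit itself is not modified. A minimal sketch of restoring the original order, assuming that matters for your use case (the names keyed and result are illustrative):

```python
import io
import pandas as pd

df_fruit = pd.read_csv(io.StringIO('''Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,9999,$0.39
'''), index_col=['Fruit', 'Color'])

df_new_green = pd.read_csv(io.StringIO('''Fruit,Count,Price
Apple,2,$0.96
Lime,9993,$0.40
Pear,12,$2.90
'''), index_col='Fruit')

# Give the incoming rows a (Fruit, Color) index so they line up with df_fruit.
keyed = (df_new_green.assign(Color='Green')
                     .set_index('Color', append=True))

# New values win where present; df_fruit fills in the remaining rows.
# reindex then puts the rows back into df_fruit's original order.
result = keyed.combine_first(df_fruit).reindex(df_fruit.index)

print(result)
```

Because the result is a new object, you would need to rebind it (for example `df_fruit = result`) if you want the update to stick.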

Answer 2

The solution is to:

1. Add Green as the second level of the index in df_new_green, setting its name to Color.
2. Update df_fruit (in-place) with this (temporary) DataFrame.

The code to do it is:

df_fruit.update(df_new_green.set_index(pd.Index(
    ['Green'] * df_new_green.index.size, name='Color'), append=True))
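The same two steps can also be written with pd.concat to add the constant level, which some readers may find easier to follow. This is a sketch of an alternative spelling, not the answer's own code; the variable name green is illustrative, and reorder_levels is used so that the level order is taken from df_fruit's level names rather than hard-coded positions.

```python
import io
import pandas as pd

df_fruit = pd.read_csv(io.StringIO('''Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,9999,$0.39
'''), index_col=['Fruit', 'Color'])

df_new_green = pd.read_csv(io.StringIO('''Fruit,Count,Price
Apple,2,$0.96
Lime,9993,$0.40
Pear,12,$2.90
'''), index_col='Fruit')

# Step 1: prepend a constant 'Color' level; the index becomes (Color, Fruit).
green = pd.concat({'Green': df_new_green}, names=['Color'])

# Put the levels in the same order as df_fruit, by name: (Fruit, Color).
green = green.reorder_levels(list(df_fruit.index.names))

# Step 2: update() aligns on the index and modifies df_fruit in place.
df_fruit.update(green)

print(df_fruit)
```

Note that update returns None, and depending on the pandas version the Count column may come back as float rather than int after the update.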

Answer 3

Not very nice, but it does the job. Note that it only updates Price; Count is left unchanged, as the output shows.

new_prices = []
for (fruit, color), row in df_fruit.iterrows():
    if color == 'Green':
        # Look up the replacement price by fruit name
        new_prices.append(df_new_green.loc[fruit, 'Price'])
    else:
        # Keep the existing price for non-green rows
        new_prices.append(row['Price'])

df_fruit['Price'] = new_prices

Output:

             Count  Price
Fruit Color              
Apple Red        3  $1.29
      Green      9  $0.96
Pear  Red       25  $2.59
      Green     26  $2.90
Lime  Green   9999  $0.40
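The loop above could probably be replaced by a vectorized lookup that also covers Count. The sketch below is one possible way to do that, assuming every green fruit has an entry in df_new_green; fruit, is_green and looked_up are names made up for this example.

```python
import io
import pandas as pd

df_fruit = pd.read_csv(io.StringIO('''Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,9999,$0.39
'''), index_col=['Fruit', 'Color'])

df_new_green = pd.read_csv(io.StringIO('''Fruit,Count,Price
Apple,2,$0.96
Lime,9993,$0.40
Pear,12,$2.90
'''), index_col='Fruit')

# Both levels are picked by name, so their position in the index is irrelevant.
fruit = df_fruit.index.get_level_values('Fruit')
is_green = df_fruit.index.get_level_values('Color') == 'Green'

for col in df_new_green.columns:
    # For every row, look up the new value for that row's fruit.
    looked_up = pd.Series(fruit.map(df_new_green[col]), index=df_fruit.index)
    # Keep the old value on non-green rows, take the looked-up one on green rows.
    df_fruit[col] = df_fruit[col].where(~is_green, looked_up)

print(df_fruit)
```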

3 answers

Answer 1: 2 votes, accepted, 2020-01-31 21:11:31

Answer 2: 1 vote, 2020-01-31 20:44:51

Answer 3: 0 votes, 2020-01-31 20:53:50