How do I assign to a Pandas DataFrame while selecting by MultiIndex name?
Main question: How do I select/slice a multi-indexed DataFrame, using the name of the MultiIndex level, in a way that allows me to assign to that slice?
```python
import io

import pandas as pd

data = io.StringIO('''Fruit,Color,Count,Price
Apple,Red,3,$1.29
Apple,Green,9,$0.99
Pear,Red,25,$2.59
Pear,Green,26,$2.79
Lime,Green,9999,$0.39
''')
df_fruit = pd.read_csv(data, index_col=['Fruit', 'Color'])

new_green_data = io.StringIO('''Fruit,Count,Price
Apple,2,$0.96
Lime,9993,$0.40
Pear,12,$2.90
''')
df_new_green = pd.read_csv(new_green_data, index_col='Fruit')
```
This sets up two DataFrames:

`df_fruit`:

| Fruit | Color | Count | Price |
|:--------|:--------|--------:|:--------|
| Apple | Red | 3 | $1.29 |
| Apple | Green | 9 | $0.99 |
| Pear | Red | 25 | $2.59 |
| Pear | Green | 26 | $2.79 |
| Lime | Green | 9999 | $0.39 |
`df_new_green`:

| Fruit | Count | Price |
|:--------|--------:|:--------|
| Apple | 2 | $0.96 |
| Lime | 9993 | $0.40 |
| Pear | 12 | $2.90 |
I want to update the rows in `df_fruit` in which `Color` is `Green`, so that they match the values in the incoming `df_new_green` data. The final output should be:

| Fruit | Color | Count | Price |
|:--------|:--------|--------:|:--------|
| Apple | Red | 3 | $1.29 |
| Apple | Green | 2 | $0.96 |
| Pear | Red | 25 | $2.59 |
| Pear | Green | 12 | $2.90 |
| Lime | Green | 9993 | $0.40 |
Note that the order of the fruits in `df_new_green` differs from `df_fruit`. Thus, when performing the assignment, I need to preserve the indices of both sides so that the rows are matched correctly.
I know several ways to select what I want to update in the DataFrame:

```python
df_fruit.xs(key='Green', level='Color')
```
This produces the right view of the data, but I can't assign to it. Similarly close:

```python
df_fruit[df_fruit.index.get_level_values('Color') == 'Green']
```

and

```python
idx = pd.IndexSlice
df_fruit.loc[idx[:, 'Green'], :]
```
Both give me the same view, but they still include the `Color` level of the MultiIndex:

| Fruit | Color | Count | Price |
|:--------|:--------|--------:|:--------|
| Apple | Green | 9 | $0.99 |
| Pear | Green | 26 | $2.79 |
| Lime | Green | 9999 | $0.39 |
I can assign to this view using `df_new_green`, but this yields `NaN`s because `df_new_green` does not include the `Color` level in its index. The second choice (using `IndexSlice`) is also not great because I'm not selecting the level by its name, but by its position in the MultiIndex. If I run `droplevel('Color')` on either one, again I get the right view, but I can't assign to it.
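Because `.loc` accepts a plain tuple wherever `pd.IndexSlice` is used, the key can be assembled from the level names instead of from positions. A minimal sketch; `level_key` is a hypothetical helper (not a pandas function) that places the given label at the named level and `slice(None)` at every other level:

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv
df_fruit = pd.DataFrame(
    {'Count': [3, 9, 25, 26, 9999],
     'Price': ['$1.29', '$0.99', '$2.59', '$2.79', '$0.39']},
    index=pd.MultiIndex.from_tuples(
        [('Apple', 'Red'), ('Apple', 'Green'), ('Pear', 'Red'),
         ('Pear', 'Green'), ('Lime', 'Green')],
        names=['Fruit', 'Color']))


def level_key(index, **labels):
    """Hypothetical helper: build a .loc row key from level names.

    Levels named in `labels` get that label; all other levels get slice(None).
    """
    return tuple(labels.get(name, slice(None)) for name in index.names)


# Resolves to (slice(None), 'Green') because 'Color' is the second level
key = level_key(df_fruit.index, Color='Green')
green = df_fruit.loc[key, :]
print(green)
```

The key is only built by name; the `.loc` call itself still sees a positional tuple, so it works the same way as `idx[:, 'Green']` but no longer breaks if the level order changes.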
I could drop the index on the new values, but this leads to the wrong values being used:

```python
df_fruit.loc[idx[:, 'Green'], :] = df_new_green.to_numpy()
```
This yields:

| Fruit | Color | Count | Price |
|:--------|:--------|--------:|:--------|
| Apple | Red | 3 | $1.29 |
| Apple | Green | 2 | $0.96 |
| Pear | Red | 25 | $2.59 |
| Pear | Green | 9993 | $0.40 |
| Lime | Green | 12 | $2.90 |
...but this is wrong because the Pear and Lime values got swapped. I need to preserve the index(es) on the update DataFrame.
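One way to keep the positional assignment but still respect the labels is to reindex the new data into the row order of the target selection before the labels are dropped. A minimal sketch, assuming the target order is read from the `Fruit` level of the `Green` rows (selected by level name), and that values are assigned one column at a time so each column keeps its own dtype:

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv
df_fruit = pd.DataFrame(
    {'Count': [3, 9, 25, 26, 9999],
     'Price': ['$1.29', '$0.99', '$2.59', '$2.79', '$0.39']},
    index=pd.MultiIndex.from_tuples(
        [('Apple', 'Red'), ('Apple', 'Green'), ('Pear', 'Red'),
         ('Pear', 'Green'), ('Lime', 'Green')],
        names=['Fruit', 'Color']))
df_new_green = pd.DataFrame(
    {'Count': [2, 9993, 12], 'Price': ['$0.96', '$0.40', '$2.90']},
    index=pd.Index(['Apple', 'Lime', 'Pear'], name='Fruit'))

# Boolean mask built from the level *name*, not its position
mask = df_fruit.index.get_level_values('Color') == 'Green'
# Fruit labels of the Green rows in df_fruit's own order: Apple, Pear, Lime
fruits = df_fruit.index[mask].get_level_values('Fruit')

for col in df_new_green.columns:
    # reindex puts the new values into the target order, so dropping the
    # labels with to_numpy() afterwards no longer swaps Pear and Lime
    df_fruit.loc[mask, col] = df_new_green[col].reindex(fruits).to_numpy()

print(df_fruit)
```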
```python
df_fruit[df_fruit.index.get_level_values('Color') == 'Green'] = (
    df_new_green.assign(Color='Green').set_index('Color', append=True))
```
...guh. This yields the right answer and meets the requirements, but holy cow, that's ugly.
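The one-liner above can be wrapped so that the level is always addressed by name. A minimal sketch; `assign_by_level` is a hypothetical helper, and it assumes `new` is indexed by the remaining level(s) of `df` (here `Fruit`). It relies on `.loc` aligning a DataFrame value on its index labels, which is why the missing level has to be added first:

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv
df_fruit = pd.DataFrame(
    {'Count': [3, 9, 25, 26, 9999],
     'Price': ['$1.29', '$0.99', '$2.59', '$2.79', '$0.39']},
    index=pd.MultiIndex.from_tuples(
        [('Apple', 'Red'), ('Apple', 'Green'), ('Pear', 'Red'),
         ('Pear', 'Green'), ('Lime', 'Green')],
        names=['Fruit', 'Color']))
df_new_green = pd.DataFrame(
    {'Count': [2, 9993, 12], 'Price': ['$0.96', '$0.40', '$2.90']},
    index=pd.Index(['Apple', 'Lime', 'Pear'], name='Fruit'))


def assign_by_level(df, new, level, key):
    """Hypothetical helper: overwrite the rows of `df` whose `level` equals
    `key` with the rows of `new`, matched on the remaining index level(s)."""
    # Add the missing level as a constant column, move it into the index,
    # then order the levels by *name* so they match df's MultiIndex
    aligned = (new.assign(**{level: key})
                  .set_index(level, append=True)
                  .reorder_levels(df.index.names))
    # Select target rows by level name; .loc aligns `aligned` on index labels
    mask = df.index.get_level_values(level) == key
    df.loc[mask, aligned.columns] = aligned


assign_by_level(df_fruit, df_new_green, level='Color', key='Green')
print(df_fruit)
```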
I'd use `assign` and `set_index`, then `combine_first`:
```python
(df_new_green.assign(Color='Green')
    .set_index('Color', append=True)
    .combine_first(df_fruit))
```
Output:

| Fruit | Color | Count | Price |
|:--------|:--------|--------:|:--------|
| Apple | Green | 2 | $0.96 |
| Apple | Red | 3 | $1.29 |
| Lime | Green | 9993 | $0.40 |
| Pear | Green | 12 | $2.90 |
| Pear | Red | 25 | $2.59 |
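`combine_first` returns a new DataFrame instead of modifying `df_fruit`, and, as the output above shows, its rows come back sorted. A minimal sketch, assuming the original row order should be kept, reindexes the result to `df_fruit.index` and rebinds the name:

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv
df_fruit = pd.DataFrame(
    {'Count': [3, 9, 25, 26, 9999],
     'Price': ['$1.29', '$0.99', '$2.59', '$2.79', '$0.39']},
    index=pd.MultiIndex.from_tuples(
        [('Apple', 'Red'), ('Apple', 'Green'), ('Pear', 'Red'),
         ('Pear', 'Green'), ('Lime', 'Green')],
        names=['Fruit', 'Color']))
df_new_green = pd.DataFrame(
    {'Count': [2, 9993, 12], 'Price': ['$0.96', '$0.40', '$2.90']},
    index=pd.Index(['Apple', 'Lime', 'Pear'], name='Fruit'))

original_order = list(df_fruit.index)

df_fruit = (df_new_green.assign(Color='Green')
            .set_index('Color', append=True)
            # values from the new data win; df_fruit fills the remaining rows
            .combine_first(df_fruit)
            # the combined index comes back sorted; restore the original order
            .reindex(df_fruit.index))

print(df_fruit)
```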
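The solution is to add a `Color` level (set to `Green` for every row) to the index of `df_new_green`, and then pass the result to `DataFrame.update`, which aligns on the index and modifies `df_fruit` in place. The code to do it is: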
```python
df_fruit.update(df_new_green.set_index(pd.Index(
    ['Green'] * df_new_green.index.size, name='Color'), append=True))
```
Not very nice, but it does the job.
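The index with a repeated `'Green'` label can also be built with `pd.concat`, which turns dict keys into a new outer index level. A minimal sketch, assuming the new level is named through `names=` and then moved into place by name with `reorder_levels` before calling `update`:

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv
df_fruit = pd.DataFrame(
    {'Count': [3, 9, 25, 26, 9999],
     'Price': ['$1.29', '$0.99', '$2.59', '$2.79', '$0.39']},
    index=pd.MultiIndex.from_tuples(
        [('Apple', 'Red'), ('Apple', 'Green'), ('Pear', 'Red'),
         ('Pear', 'Green'), ('Lime', 'Green')],
        names=['Fruit', 'Color']))
df_new_green = pd.DataFrame(
    {'Count': [2, 9993, 12], 'Price': ['$0.96', '$0.40', '$2.90']},
    index=pd.Index(['Apple', 'Lime', 'Pear'], name='Fruit'))

# The dict key becomes a new outer index level, named 'Color' via names=
new_green = pd.concat({'Green': df_new_green}, names=['Color'])
# Levels are now (Color, Fruit); reorder them by name to match df_fruit
new_green = new_green.reorder_levels(df_fruit.index.names)
# update aligns on index and columns and modifies df_fruit in place
df_fruit.update(new_green)

print(df_fruit)
```

`update` only copies non-NA values from its argument, so a missing value in the new data leaves the existing value in `df_fruit` untouched.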
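```python
# Note: this loop updates only the Price column; Count is left unchanged
new_prices = []
for (fruit, color), row in df_fruit.iterrows():
    if color == 'Green':
        # Look up the new price by the Fruit label
        new_prices.append(df_new_green.loc[fruit, 'Price'])
    else:
        # Keep the existing price for non-Green rows
        new_prices.append(row['Price'])
df_fruit['Price'] = new_prices
```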
Output:
```
             Count  Price
Fruit Color
Apple Red        3  $1.29
      Green      9  $0.96
Pear  Red       25  $2.59
      Green     26  $2.90
Lime  Green   9999  $0.40
```
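The same Price-only update can be written without `iterrows`. A minimal sketch, assuming the new prices are looked up from the `Fruit` level with `Index.map`, using the `Price` column of `df_new_green` as the mapping; like the loop, it leaves `Count` unchanged:

```python
import pandas as pd

# Same data as in the question, built directly instead of via read_csv
df_fruit = pd.DataFrame(
    {'Count': [3, 9, 25, 26, 9999],
     'Price': ['$1.29', '$0.99', '$2.59', '$2.79', '$0.39']},
    index=pd.MultiIndex.from_tuples(
        [('Apple', 'Red'), ('Apple', 'Green'), ('Pear', 'Red'),
         ('Pear', 'Green'), ('Lime', 'Green')],
        names=['Fruit', 'Color']))
df_new_green = pd.DataFrame(
    {'Count': [2, 9993, 12], 'Price': ['$0.96', '$0.40', '$2.90']},
    index=pd.Index(['Apple', 'Lime', 'Pear'], name='Fruit'))

is_green = df_fruit.index.get_level_values('Color') == 'Green'
fruit = df_fruit.index.get_level_values('Fruit')

# Index.map with a Series looks each fruit label up in df_new_green['Price'],
# so the order of df_new_green does not matter
new_prices = fruit[is_green].map(df_new_green['Price'])
df_fruit.loc[is_green, 'Price'] = new_prices.to_numpy()

print(df_fruit)
```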