Pandas: how to concatenate a MultiIndex DataFrame with a single-index DataFrame, with custom ordering

Question

I have a MultiIndex pandas DataFrame df_multi like:

import pandas as pd

df_multi = pd.DataFrame([['A', 'A1', 0,234,2002],['A', 'A1', 1,324,2550],
['A', 'A1', 2,345,3207],['A', 'A1', 3,458,4560],['A', 'A2', 0,569,1980],
['A', 'A2', 1,657,2314],['A', 'A2', 2,768,4568],['A', 'A2', 3,823,5761]], 
columns=['Product','Scenario','Time','Quantity','Price']).set_index(
['Product', 'Scenario'])

and a single-index DataFrame df_single like:

df_single = pd.DataFrame([['A', -3,100],['A', -2,100], ['A', -1,100]],
columns=['Product','Time','Quantity']).set_index(['Product'])

For every 'Product' in the first index level of df_multi, and for every 'Scenario' in its second level, I would like to append/concatenate the rows of df_single, which contain negative 'Time' values, so that they come before the non-negative 'Time' values of df_multi.

Furthermore, I would like the resulting DataFrame to be MultiIndexed by ['Product','Scenario'] (just like df_multi), with the rows within each group ordered by ascending 'Time'. In other words, the desired result is:

nan = float('nan')  # a real missing value, not the string 'NaN'

df_result = pd.DataFrame([['A', 'A1', -3,100,nan],['A', 'A1', -2,100,nan],
['A', 'A1', -1,100,nan],['A', 'A1', 0,234,2002],['A', 'A1', 1,324,2550],
['A', 'A1', 2,345,3207],['A', 'A1', 3,458,4560],['A','A2', -3,100,nan],
['A', 'A2', -2,100,nan],['A', 'A2', -1,100,nan],['A', 'A2', 0,569,1980],
['A', 'A2', 1,657,2314],['A', 'A2', 2,768,4568],['A', 'A2', 3,823,5761]],
columns=['Product','Scenario','Time','Quantity','Price']).set_index(
['Product', 'Scenario'])

EDIT:

df_single has no 'Scenario' values, which can be confusing. As long as 'Product' matches, the same rows of df_single are to be appended to every scenario in df_multi, and they simply "inherit" that scenario's value.
The actual DataFrames I'm working with are rather large (a few thousand 'Product' values, a few thousand 'Scenario' values per product, and a few hundred 'Time' steps per scenario, plus extra columns that I left out of the example), so I need to do this in a fully automated (and hopefully fast) way.

I tried to implement this with each of join, concat and merge, and I did not succeed. What would be the best way of achieving the desired result?

Answer 1

Consider resetting the indexes to columns for a merge on 'Product', followed by a groupby aggregation that keeps a single row per ('Product', 'Scenario', 'Time') group, since the merge repeats each df_single row once for every df_multi row of the same product. Afterwards, stack the result under the original rows with concat, sort by 'Product', 'Scenario' and 'Time', and set the multi-index back.

# MERGE AND AGGREGATION
df_temp = df_multi.reset_index().merge(df_single.reset_index(), on='Product', suffixes=['','_'])\
                                .groupby(['Product', 'Scenario', 'Time_'])['Quantity_'].max()\
                                .reset_index().rename(columns={'Time_':'Time','Quantity_':'Quantity'})

# ROW BIND CONCATENATION
df_final = pd.concat([df_multi.reset_index(), df_temp])\
                    .sort_values(['Product','Scenario', 'Time'])\
                    .set_index(['Product', 'Scenario'])[['Time', 'Quantity', 'Price']]
print(df_final)
#                   Time  Quantity   Price
# Product Scenario                        
# A       A1          -3       100     NaN
#         A1          -2       100     NaN
#         A1          -1       100     NaN
#         A1           0       234  2002.0
#         A1           1       324  2550.0
#         A1           2       345  3207.0
#         A1           3       458  4560.0
#         A2          -3       100     NaN
#         A2          -2       100     NaN
#         A2          -1       100     NaN
#         A2           0       569  1980.0
#         A2           1       657  2314.0
#         A2           2       768  4568.0
#         A2           3       823  5761.0
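
For the large frames mentioned in the question, a possible variation is to merge df_single against only the distinct ('Product', 'Scenario') pairs, so that no duplicate rows are created and no groupby is needed. The following is a minimal sketch under that assumption, not a benchmarked solution; the names pairs, prefix and df_alt are illustrative, and the sample data is a shortened version of the question's frames.

```python
import pandas as pd

# Shortened versions of the question's frames (two time steps per scenario).
df_multi = pd.DataFrame(
    [['A', 'A1', 0, 234, 2002], ['A', 'A1', 1, 324, 2550],
     ['A', 'A2', 0, 569, 1980], ['A', 'A2', 1, 657, 2314]],
    columns=['Product', 'Scenario', 'Time', 'Quantity', 'Price'],
).set_index(['Product', 'Scenario'])

df_single = pd.DataFrame(
    [['A', -2, 100], ['A', -1, 100]],
    columns=['Product', 'Time', 'Quantity'],
).set_index('Product')

# One row per (Product, Scenario) pair that occurs in df_multi.
pairs = df_multi.index.to_frame(index=False).drop_duplicates()

# Merging on Product gives every scenario all df_single rows of its product,
# so the rows "inherit" the Scenario value without any duplicates.
prefix = pairs.merge(df_single.reset_index(), on='Product', how='inner')

# Stack the new rows under the original ones, order them, restore the index.
# Columns missing from df_single (here Price) are filled with NaN by concat.
df_alt = (
    pd.concat([df_multi.reset_index(), prefix], ignore_index=True)
    .sort_values(['Product', 'Scenario', 'Time'])
    .set_index(['Product', 'Scenario'])
)
print(df_alt)
```

Because the merge only touches one row per scenario, the intermediate frame has (number of scenarios × rows of df_single per product) rows rather than one row per matching pair of df_multi and df_single rows. Any extra columns in df_single are also carried through, whereas the groupby above keeps only 'Quantity'.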

Answer 1 was accepted on 2017-11-30 16:45:16.
