
Combining 2 pandas dataframes into 1 multi-dimensional one

Dataframe 1 looks like this:

import pandas as pd

df1 = pd.DataFrame(
    {
        "Farm ID": ["1", "2", "2", "3", "3"],
        "Crop": ["Type A", "Type A", "Type B", "Type B", "Type B"],
        "Area": [8, 4, 2, 3, 5],
        "Diesel": [101, 215, 3, 0.6, 42],
    }
)

df1 = df1.set_index(['Farm ID', 'Crop'])
df1

Dataframe 2 looks like this:

df2 = pd.DataFrame(
    {
        "Name": ["Area", "Diesel"],
        "GHG": [690, 8.5],
        "LU": [2.2, 0.3],
    }
)

df2 = df2.set_index('Name')
df2

I now need to combine both so that I get the following result:

                       GHG       LU
Farm ID Crop    Name    
1       Type A  Area   8*690     8*2.2
                Diesel 101*8.5   101*0.3
2       Type A  Area   4*690     4*2.2
                Diesel 215*8.5   215*0.3
        Type B  Area   ....

Any suggestions are welcome, as I am completely clueless. I am also open to ideas if there are better ways to structure this. I will need to do further analysis on the resulting dataframe (e.g. aggregation by crop type or by name, and similar), so I may be overcomplicating things... Thanks a lot!
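On the structure question: one option, offered here as a sketch and not taken from the original thread, is to keep the data in long ("tidy") form, with one row per farm, crop and input, and to attach the per-unit factors from df2 with a merge. The column names Name and value, and the variable name tidy, are our own choices for illustration. With plain columns, later aggregation by Crop or Name is an ordinary groupby.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame(
    {
        "Farm ID": ["1", "2", "2", "3", "3"],
        "Crop": ["Type A", "Type A", "Type B", "Type B", "Type B"],
        "Area": [8, 4, 2, 3, 5],
        "Diesel": [101, 215, 3, 0.6, 42],
    }
).set_index(["Farm ID", "Crop"])
df2 = pd.DataFrame(
    {"Name": ["Area", "Diesel"], "GHG": [690, 8.5], "LU": [2.2, 0.3]}
).set_index("Name")

# One row per (Farm ID, Crop, Name), with the input amount in a 'value' column
tidy = (
    df1.rename_axis(columns="Name")
    .stack()
    .rename("value")
    .reset_index()
)

# Attach the per-unit factors by joining on the input name
tidy = tidy.merge(df2.reset_index(), on="Name", how="left")

# Scale each factor by the input amount
tidy["GHG"] = tidy["value"] * tidy["GHG"]
tidy["LU"] = tidy["value"] * tidy["LU"]
print(tidy)
```

With this layout, tidy.groupby("Crop")[["GHG", "LU"]].sum() would give per-crop totals, and grouping on "Name" works the same way.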

We can use stack:

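s = df1.stack()  # Area/Diesel become a third index level, one value per row
# repeat the matching df2 row for each stacked value, then scale row by row
out = df2.reindex(s.index.get_level_values(2)).mul(s.values, axis=0)
out.index = s.index  # restore the stacked three-level index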
out
                          GHG     LU
Farm ID Crop                        
1       Type A Area    5520.0  17.60
               Diesel   858.5  30.30
2       Type A Area    2760.0   8.80
               Diesel  1827.5  64.50
        Type B Area    1380.0   4.40
               Diesel    25.5   0.90
3       Type B Area    2070.0   6.60
               Diesel     5.1   0.18
               Area    3450.0  11.00
               Diesel   357.0  12.60
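To see what each step of this answer does, here is a sketch that rebuilds the question's data and keeps the intermediate results; the variable names names and factors are only illustrative. The key point is that reindex repeats the df2 row matching each stacked label, so a positional, row-wise mul(..., axis=0) lines each factor up with its amount.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame(
    {
        "Farm ID": ["1", "2", "2", "3", "3"],
        "Crop": ["Type A", "Type A", "Type B", "Type B", "Type B"],
        "Area": [8, 4, 2, 3, 5],
        "Diesel": [101, 215, 3, 0.6, 42],
    }
).set_index(["Farm ID", "Crop"])
df2 = pd.DataFrame(
    {"Name": ["Area", "Diesel"], "GHG": [690, 8.5], "LU": [2.2, 0.3]}
).set_index("Name")

# Step 1: move the Area/Diesel columns into a third index level
s = df1.stack()
names = s.index.get_level_values(2)  # Area, Diesel, Area, Diesel, ...

# Step 2: one df2 row per stacked value (duplicates in the target are fine)
factors = df2.reindex(names)

# Step 3: multiply each factor row by its amount; .values makes it positional
out = factors.mul(s.values, axis=0)
out.index = s.index
print(out)
```

Using s.values rather than s matters: factors is indexed by Name only, so multiplying by the Series itself would try to align on labels instead of position.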

You can stack the dataframe and let pandas broadcast on the common index level:

df1.rename_axis('Name', axis=1).stack().mul(df2.T).T

Output:

                          GHG     LU
Farm ID Crop   Name                 
1       Type A Area    5520.0  17.60
               Diesel   858.5  30.30
2       Type A Area    2760.0   8.80
               Diesel  1827.5  64.50
        Type B Area    1380.0   4.40
               Diesel    25.5   0.90
3       Type B Area    2070.0   6.60
               Diesel     5.1   0.18
               Area    3450.0  11.00
               Diesel   357.0  12.60
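For the follow-up aggregation mentioned in the question, naming the stacked level makes it easy to group on. This sketch builds the result with the reindex approach from the first answer, but first names the column axis Name (as the second answer does with rename_axis); it then sums by crop, by input name, and by both, using groupby(level=...). The variable names by_crop, by_name and by_crop_name are only illustrative.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame(
    {
        "Farm ID": ["1", "2", "2", "3", "3"],
        "Crop": ["Type A", "Type A", "Type B", "Type B", "Type B"],
        "Area": [8, 4, 2, 3, 5],
        "Diesel": [101, 215, 3, 0.6, 42],
    }
).set_index(["Farm ID", "Crop"])
df2 = pd.DataFrame(
    {"Name": ["Area", "Diesel"], "GHG": [690, 8.5], "LU": [2.2, 0.3]}
).set_index("Name")

# Name the column axis so the stacked level is called 'Name'
s = df1.rename_axis(columns="Name").stack()
out = df2.reindex(s.index.get_level_values("Name")).mul(s.values, axis=0)
out.index = s.index

# Totals per crop type, per input name, and per (crop, input) pair
by_crop = out.groupby(level="Crop").sum()
by_name = out.groupby(level="Name").sum()
by_crop_name = out.groupby(level=["Crop", "Name"]).sum()
print(by_crop)
print(by_name)
print(by_crop_name)
```

Grouping on index levels this way does not require unique index entries, so the two (3, Type B) rows are simply summed together.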

Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM