
How to find the overlapping row count between two keys of a MultiIndex DataFrame?

Two DataFrames that share the same index (dates) have been concatenated under different keys, giving a DataFrame with MultiIndex columns. Each DataFrame has different products as its column names, and the values are their prices. I need the correlation between the two DataFrames and the count of overlapping periods. The correlation is done, but how do I count the overlapping rows for each pair of products, one from each DataFrame? The result should be a matrix: a DataFrame with the products from DataFrame 1 as column names, the products from DataFrame 2 as row names, and each value the number of overlapping rows over the same period.


For example:

import pandas as pd

# Dates are quoted as strings; unquoted, 1/12/2020 would be evaluated as a division
df1 = pd.DataFrame(data={'col1': ['1/12/2020', '2/12/2020', '3/12/2020'],
                         'col2': [10, 11, 12], 'col3': [13, 14, 10]})
df2 = pd.DataFrame(data={'col1': ['1/12/2020', '2/12/2020', '3/12/2020'],
                         'A': [10, 9, 12], 'B': [4, 14, 2]})

df1 = df1.set_index('col1')
df2 = df2.set_index('col1')

concat_data1 = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
concat_data1

           df1       df2
          col2 col3    A   B
col1
1/12/2020   10   13   10   4
2/12/2020   11   14    9  14
3/12/2020   12   10   12   2

Needed output (overlapping period count):

   col2  col3
A     2     0
B     0     1
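
To make the expected numbers concrete, a single cell can be checked by hand. The snippet below is only a sketch: it assumes that a row "overlaps" when the two products have the same value on the same date, which is the reading that reproduces the matrix above. The name `pair_count` is introduced here for illustration.

```python
import pandas as pd

dates = ['1/12/2020', '2/12/2020', '3/12/2020']
df1 = pd.DataFrame({'col2': [10, 11, 12], 'col3': [13, 14, 10]}, index=dates)
df2 = pd.DataFrame({'A': [10, 9, 12], 'B': [4, 14, 2]}, index=dates)

# Assumption: two products overlap on a date when their values are equal.
# Comparing the two Series row by row and summing the booleans gives the count.
pair_count = int((df1['col2'] == df2['A']).sum())
print(pair_count)
```

Under that assumption, the cell for `col2` and `A` counts the dates 1/12/2020 and 3/12/2020, where both hold 10 and 12 respectively.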

This is a way of doing it:

import itertools
import pandas as pd

data1 = {
    'col1': ['1/12/2020', '2/12/2020', '3/12/2020', '4/12/2020'], 
    'col2': [10, 11, 12, 14], 
    'col3': [13, 14, 10, 6],
    'col4': [10, 9, 15, 10], 
    'col5': [10, 9, 15, 5], 
}

data2 = {
    'col1': ['1/12/2020', '2/12/2020', '3/12/2020', '4/12/2020'], 
    'A': [10, 9, 12, 14],
    'B' :[4, 14, 2, 9],
    'C': [6, 9, 1, 3], 
    'D': [6, 9, 1, 8]
}

df1 = pd.DataFrame(data1).set_index('col1')
df2 = pd.DataFrame(data2).set_index('col1')

concat_data = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])

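# Map each top-level key ('df1', 'df2') to the list of its column names
columns = {df: list(concat_data[df].columns) for df in set(concat_data.columns.get_level_values(0))}

# Result matrix of zeros: df1 products as columns, df2 products as rows
matrix = pd.DataFrame(data=0, columns=columns['df1'], index=columns['df2'])

# For each date, add 1 to every (df2 product, df1 product) pair whose values are equal
for _, row in concat_data.iterrows():
    for c1, c2 in itertools.product(columns['df1'], columns['df2']):
        matrix.loc[c2, c1] += int(row['df1'][c1] == row['df2'][c2])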

print(matrix)
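
For larger frames, the row-by-row loop can become slow. One possible alternative, sketched here with the question's smaller data, is to compare all column pairs at once with NumPy broadcasting. It rests on the same assumption as the loop (a row counts when the two values are equal), and the names `left`, `right`, `equal`, and `overlap` are illustrative only.

```python
import pandas as pd

dates = ['1/12/2020', '2/12/2020', '3/12/2020']
df1 = pd.DataFrame({'col2': [10, 11, 12], 'col3': [13, 14, 10]}, index=dates)
df2 = pd.DataFrame({'A': [10, 9, 12], 'B': [4, 14, 2]}, index=dates)

# Keep only the dates present in both frames; columns are left untouched.
left, right = df1.align(df2, join='inner', axis=0)

# Broadcast to shape (n_dates, n_df2_columns, n_df1_columns):
# equal[d, i, j] is True when df2 column i equals df1 column j on date d.
equal = right.to_numpy()[:, :, None] == left.to_numpy()[:, None, :]

# Summing over the date axis counts the matching rows for each pair.
overlap = pd.DataFrame(equal.sum(axis=0), index=right.columns, columns=left.columns)
print(overlap)
```

The intermediate boolean array holds one entry per date and column pair, so this trades memory for speed; with very many products the loop or a chunked variant may be the safer choice.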
