How to find the number of overlapping rows between two keys of a multiindex dataframe?

Question

Two dataframes that share the same index have been concatenated with different keys, producing a MultiIndex dataframe. The index holds dates. Each dataframe has different products as its column names, with their prices as values. I basically had to find the correlation between these two dataframes and the count of overlapping periods. The correlation is done, but how do I count the overlapping rows for each product in one dataframe against each product in the other, and produce the result as a dataframe with the products from dataframe 1 as column names, the products from dataframe 2 as row names, and the number of overlapping rows for the same period as values? It should be a matrix.
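The question pairs this count with a correlation. If the intended count is the number of dates on which both products have a price (the observations a pairwise correlation could use), a minimal sketch is to turn each side into a 0/1 presence matrix and take a matrix product. The expected output in the example below counts equal prices rather than shared dates, so this reading is an assumption, and the NaN gaps in the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN marks a date on which a product has no price.
dates = ["1/12/2020", "2/12/2020", "3/12/2020"]
df1 = pd.DataFrame({"col2": [10, 11, np.nan], "col3": [13, np.nan, 10]}, index=dates)
df2 = pd.DataFrame({"A": [10, 9, 12], "B": [np.nan, 14, 2]}, index=dates)
concat_data = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

# Assumption: "overlap" = both products have a price on the same date.
present1 = concat_data["df1"].notna().astype(int)  # dates x df1 products
present2 = concat_data["df2"].notna().astype(int)  # dates x df2 products

# (df2 products x dates) dot (dates x df1 products): each cell is the number
# of dates on which both products are present.
overlap = present2.T.dot(present1)
print(overlap)
```

Here `A` has a price on all three dates and `col2` on two of them, so their cell is 2.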


For example:

import pandas as pd

df1 = pd.DataFrame(data={'col1': ['1/12/2020', '2/12/2020', '3/12/2020'],
                         'col2': [10, 11, 12], 'col3': [13, 14, 10]})
df2 = pd.DataFrame(data={'col1': ['1/12/2020', '2/12/2020', '3/12/2020'],
                         'A': [10, 9, 12], 'B': [4, 14, 2]})

df1 = df1.set_index('col1')
df2 = df2.set_index('col1')

concat_data1 = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
concat_data1

            df1         df2
           col2  col3    A   B
col1
1/12/2020    10    13   10   4
2/12/2020    11    14    9  14
3/12/2020    12    10   12   2

The result I need (overlapping period counts):

   col2  col3
A     2     0
B     0     1
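Reading the expected output, a cell counts the dates on which the two products have equal prices (A matches col2 on 1/12/2020 and 3/12/2020, giving 2). A minimal pandas sketch under that assumption compares each pair of columns as `Series`, which line up on the shared date index, and sums the matches; `left`, `right` and `overlap` are illustrative names.

```python
import pandas as pd

# The question's example, with the dates quoted as strings.
df1 = pd.DataFrame({"col1": ["1/12/2020", "2/12/2020", "3/12/2020"],
                    "col2": [10, 11, 12], "col3": [13, 14, 10]}).set_index("col1")
df2 = pd.DataFrame({"col1": ["1/12/2020", "2/12/2020", "3/12/2020"],
                    "A": [10, 9, 12], "B": [4, 14, 2]}).set_index("col1")
concat_data1 = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

left = concat_data1["df1"]   # df1 products become the result's columns
right = concat_data1["df2"]  # df2 products become the result's rows

# Assumption: "overlap" = equal price on the same date.
# Outer dict keys become columns, inner dict keys become the row index.
overlap = pd.DataFrame(
    {c1: {c2: int((left[c1] == right[c2]).sum()) for c2 in right.columns}
     for c1 in left.columns}
)
print(overlap)
```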

Answer 1

This is a way of doing it:

import itertools
import pandas as pd

data1 = {
    'col1': ['1/12/2020', '2/12/2020', '3/12/2020', '4/12/2020'], 
    'col2': [10, 11, 12, 14], 
    'col3': [13, 14, 10, 6],
    'col4': [10, 9, 15, 10], 
    'col5': [10, 9, 15, 5], 
}

data2 = {
    'col1': ['1/12/2020', '2/12/2020', '3/12/2020', '4/12/2020'], 
    'A': [10, 9, 12, 14],
    'B' :[4, 14, 2, 9],
    'C': [6, 9, 1, 3], 
    'D': [6, 9, 1, 8]
}

df1 = pd.DataFrame(data1).set_index('col1')
df2 = pd.DataFrame(data2).set_index('col1')

concat_data = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])

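# Map each top-level key ('df1', 'df2') to the list of its product columns
columns = {df: list(concat_data[df].columns) for df in set(concat_data.columns.get_level_values(0))}

# Result matrix: df1 products as columns, df2 products as rows, all counts start at 0
matrix = pd.DataFrame(data=0, columns=columns['df1'], index=columns['df2'])

# For every date, compare each (df1 product, df2 product) pair and add 1 where the prices are equal
for _, row in concat_data.iterrows():
    for p1, p2 in itertools.product(columns['df1'], columns['df2']):
        matrix.loc[p2, p1] += int(row['df1'][p1] == row['df2'][p2])

print(matrix)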
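The nested loop above performs one `.loc` update per date and product pair, which gets slow as the data grows. A vectorized sketch (an alternative technique, not the answer's own method) uses NumPy broadcasting to compare every df2 product with every df1 product on every date at once; `a`, `b` and `counts` are illustrative names, and the data is copied from the answer.

```python
import numpy as np
import pandas as pd

# Same data as the answer above.
data1 = {
    'col1': ['1/12/2020', '2/12/2020', '3/12/2020', '4/12/2020'],
    'col2': [10, 11, 12, 14],
    'col3': [13, 14, 10, 6],
    'col4': [10, 9, 15, 10],
    'col5': [10, 9, 15, 5],
}
data2 = {
    'col1': ['1/12/2020', '2/12/2020', '3/12/2020', '4/12/2020'],
    'A': [10, 9, 12, 14],
    'B': [4, 14, 2, 9],
    'C': [6, 9, 1, 3],
    'D': [6, 9, 1, 8],
}
df1 = pd.DataFrame(data1).set_index('col1')
df2 = pd.DataFrame(data2).set_index('col1')
concat_data = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])

a = concat_data['df1'].to_numpy()  # shape (dates, df1 products)
b = concat_data['df2'].to_numpy()  # shape (dates, df2 products)

# (dates, df2, 1) == (dates, 1, df1) broadcasts to (dates, df2, df1);
# summing over the date axis counts equal prices for every product pair.
counts = (b[:, :, None] == a[:, None, :]).sum(axis=0)

matrix = pd.DataFrame(counts, index=concat_data['df2'].columns,
                      columns=concat_data['df1'].columns)
print(matrix)
```

The intermediate boolean array has dates × df2 products × df1 products elements, so for long histories with many products the pairwise `Series` comparison may be the more memory-friendly choice.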

Answer 1: score 0, accepted, 2023-01-27 00:52:43
