
Match rows based on closest index

I have two different dataframes:

Dataframe 1:

import pandas as pd

data = {'Server Name': ['PhysicalWindows1', 'PhysicalWindows2', 'PhysicalLinux1', 'PhysicalLinux2'],
        'Chips1': [1, 1, 2, 2], 
        'pCpu Cores': [8, 8, 32, 32],
        'Cpu Clock': [3400, 3400, 2600, 2600]}

# Create DataFrame
df1 = pd.DataFrame(data)

Dataframe 2:

data = {'Chips': [1, 1, 1, 2, 2],
        'Cores': [8, 8, 8, 11, 11],
        'Clock Speed': [3300, 3500, 2900, 900, 100], 
        'Avg Watts Idle': [58.5, 63, 25, 83.8, 65]
}

df2 = pd.DataFrame(data)

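Now I am trying to match these two dataframes on an exact match of two keys (chips and cores) plus the closest match on clock speed, in order to get the 'Avg Watts Idle' column. For example, the first row of dataframe 1 is ['PhysicalWindows1', 1, 8, 3400], and it matches three different rows of dataframe 2: [1, 8, 3300, 58.5], [1, 8, 3500, 63] and [1, 8, 2900, 25]. I only want to average the first two (the clock speeds closest to 3400), not the third. Ideally, my dataframe would look like: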

'Server Name': ['PhysicalWindows1', 'PhysicalWindows2', 'PhysicalLinux1', 'PhysicalLinux2'],
'Chips1': [1, 1, 2, 2], 
'pCpu Cores': [8, 8, 32, 32],
'Cpu Clock': [3400, 3400, 2600, 2600],
'Avg Watts Idle': [(58.5+63)/2, (58.5+63)/2, N/A, N/A]
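For the first server the selection rule can be checked by hand: the three Chips/Cores matches are 100, 100 and 500 away from 3400, so only the two rows that are 100 away are averaged, giving (58.5 + 63) / 2 = 60.75. A minimal sketch of that single-row check, with the three candidate rows copied from dataframe 2 (the names `candidates` and `server_clock` are illustrative only):

```python
import pandas as pd

# Rows of dataframe 2 that match Chips=1 and Cores=8 exactly
candidates = pd.DataFrame({'Clock Speed': [3300, 3500, 2900],
                           'Avg Watts Idle': [58.5, 63, 25]})
server_clock = 3400  # 'Cpu Clock' of PhysicalWindows1

# Absolute distance of each candidate's clock speed from the server's clock
candidates['diff'] = (candidates['Clock Speed'] - server_clock).abs()  # 100, 100, 500

# Keep the two closest candidates and average their idle wattage
closest = candidates.nsmallest(2, 'diff')
avg_idle = closest['Avg Watts Idle'].mean()
print(avg_idle)  # 60.75
```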

Answer:

import pandas as pd
data = {'Chips': [1, 1, 1, 2, 2],
        'Cores': [8, 8, 8, 11, 11],
        'Clock Speed': [3300, 3500, 2900, 900, 100], 
        'Avg Watts Idle': [58.5, 63, 25, 83.8, 65]
}

df2 = pd.DataFrame(data)

data = {'Server Name': ['PhysicalWindows1', 'PhysicalWindows2', 'PhysicalLinux1', 'PhysicalLinux2'],
        'Chips1': [1, 1, 2, 2], 
        'pCpu Cores': [8, 8, 32, 32],
        'Cpu Clock': ['3400', '3400', '2600', '2600']}
  
# Create DataFrame
df1 = pd.DataFrame(data)


merged = df1.merge(df2, left_on=['Chips1','pCpu Cores'], right_on=['Chips','Cores'], how='left')


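# Distance between each candidate's clock speed and the server's clock
# (astype(int) because 'Cpu Clock' is stored as strings here)
merged['diff'] = (merged['Clock Speed'] - merged['Cpu Clock'].astype(int)).abs()
# For each server, average 'Avg Watts Idle' over the 2 closest clock speeds;
# mean() also yields NaN for servers with no Chips/Cores match
merged['Avg Watts Idle'] = merged.groupby('Server Name')['Avg Watts Idle'].transform(
    lambda x: x.loc[merged.loc[x.index, 'diff'].nsmallest(2).index].mean())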

# Drop the extra columns and keep only required columns
merged = merged[['Server Name', 'Chips1', 'pCpu Cores', 'Cpu Clock', 'Avg Watts Idle']]
# Each server appears once per matching row; drop the duplicates to get the output from the question
merged = merged.drop_duplicates().reset_index(drop=True)

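You can simply merge the two dataframes on the columns you want to match exactly, then find the rows with the closest clock speeds, average their 'Avg Watts Idle', and drop the extra columns to get the result.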
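The same idea can also be written without `transform` and `drop_duplicates`. The sketch below is an alternative, not the answer's code: it inner-merges, sorts by clock-speed distance, keeps the two nearest rows per server with `groupby().head(2)`, averages them, and maps the result back onto dataframe 1 so that servers without any Chips/Cores match get NaN. It assumes `'Server Name'` is unique in dataframe 1 and that "closest" means the two smallest absolute differences.

```python
import pandas as pd

df1 = pd.DataFrame({'Server Name': ['PhysicalWindows1', 'PhysicalWindows2', 'PhysicalLinux1', 'PhysicalLinux2'],
                    'Chips1': [1, 1, 2, 2],
                    'pCpu Cores': [8, 8, 32, 32],
                    'Cpu Clock': [3400, 3400, 2600, 2600]})
df2 = pd.DataFrame({'Chips': [1, 1, 1, 2, 2],
                    'Cores': [8, 8, 8, 11, 11],
                    'Clock Speed': [3300, 3500, 2900, 900, 100],
                    'Avg Watts Idle': [58.5, 63, 25, 83.8, 65]})

# Inner merge: only servers with at least one exact Chips/Cores match survive
candidates = df1.merge(df2, left_on=['Chips1', 'pCpu Cores'], right_on=['Chips', 'Cores'])
candidates['diff'] = (candidates['Clock Speed'] - candidates['Cpu Clock']).abs()

# Two nearest clock speeds per server (assumes 'Server Name' is unique), then their mean
nearest = candidates.sort_values('diff', kind='stable').groupby('Server Name').head(2)
avg_idle = nearest.groupby('Server Name')['Avg Watts Idle'].mean()

# Map back onto df1; servers without any match get NaN
result = df1.copy()
result['Avg Watts Idle'] = df1['Server Name'].map(avg_idle)
print(result)
```

Mapping onto `df1` at the end keeps one row per server by construction, so no deduplication step is needed afterwards.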


