
Find corresponding values in two DataFrames

I want to find corresponding values in two pandas DataFrames.

Input: df1:

   server    system     directions msgTYPE    msgID      count
0     1       sys1_in       in      ADT       MSG0001      1
1     1       sys1_in       in      ADT       MSG0002      1
2     1       sys1_in       in      ADT       MSG0003      1
3     1       sys1_in       in      ADT       MSG0004      1

df2:

   server    system     directions msgTYPE    msgID      count
0     1       sys2_out       in      ADT       MSG0001      1
1     1       sys2_out       in      ADT       MSG0001      1
2     1       sys3_out       in      ADT       MSG0003      1
3     1       sys4_out       in      ADT       MSG0004      1
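
For anyone who wants to reproduce the problem, the two sample frames can be rebuilt roughly as follows. This is a sketch that assumes `server` and `count` are integers and the remaining columns are strings; the question does not state the dtypes.

```python
import pandas as pd

# Sample input frame df1, copied from the table in the question.
df1 = pd.DataFrame({
    "server": [1, 1, 1, 1],
    "system": ["sys1_in"] * 4,
    "directions": ["in"] * 4,
    "msgTYPE": ["ADT"] * 4,
    "msgID": ["MSG0001", "MSG0002", "MSG0003", "MSG0004"],
    "count": [1, 1, 1, 1],
})

# Sample input frame df2; note that MSG0001 appears twice and MSG0002 is absent.
df2 = pd.DataFrame({
    "server": [1, 1, 1, 1],
    "system": ["sys2_out", "sys2_out", "sys3_out", "sys4_out"],
    "directions": ["in"] * 4,
    "msgTYPE": ["ADT"] * 4,
    "msgID": ["MSG0001", "MSG0001", "MSG0003", "MSG0004"],
    "count": [1, 1, 1, 1],
})

print(df1)
print(df2)
```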

The output should be:

     system_in       system_out        count
0     sys1_in         sys2_out           2
1     sys1_in         sys3_out           1
2     sys1_in         sys4_out           1


So I have to build one DataFrame from the two, with columns for the input and output systems, matched on msgID.

I am using df.itertuples and df.groupby to build it:

model = pd.DataFrame(columns=['in', 'out', 'count'])
for item in ins.itertuples(index=True, name='Pandas'):
    selected = outs.query('msgID == "%s"' % (getattr(item, "msgID")))
    for row in selected.itertuples(index=True, name='Pandas2'):
        model = model.append({'in': getattr(item, "system"), 'out': getattr(row, "system"), 'count': 1},
                             ignore_index=True)
result = model.groupby(['in', 'out'])['count'].sum().reset_index()



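It works, but it is extremely inefficient: the input frames (df1, df2) have about 2 million rows each. Does anyone know a more efficient way to build this with what pandas provides?

Cheers.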

You can achieve this by first merging the DataFrames on the corresponding columns and then using GroupBy with named aggregation (new in pandas >= 0.25.0):

columns = [col for col in df1.columns if col != 'system']

mrg = df1.merge(df2, on=columns, suffixes=['_in', '_out'])

mrg.groupby(columns).agg(
    system_in=('system_in', 'first'),
    system_out=('system_out', 'first'),
    count=('system_in', 'size')
).reset_index(drop=True)

Output

  system_in system_out  count
0   sys1_in   sys2_out      2
1   sys1_in   sys3_out      1
2   sys1_in   sys4_out      1
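
The named-aggregation version groups on the message columns, so it returns one row per msgID. If several message IDs can share the same pair of systems and the goal is one total per pair (which is what the loop in the question computes), a possible variation is to group on the system columns directly. The sketch below is not the answerer's code; it assumes msgID alone is enough to match rows and uses frames trimmed to the two columns it needs.

```python
import pandas as pd

# Trimmed copies of the sample frames: only the columns used for matching.
df1 = pd.DataFrame({
    "msgID": ["MSG0001", "MSG0002", "MSG0003", "MSG0004"],
    "system": ["sys1_in"] * 4,
})
df2 = pd.DataFrame({
    "msgID": ["MSG0001", "MSG0001", "MSG0003", "MSG0004"],
    "system": ["sys2_out", "sys2_out", "sys3_out", "sys4_out"],
})

# Inner merge on msgID; the overlapping "system" column gets the suffixes.
pairs = df1.merge(df2, on="msgID", suffixes=("_in", "_out"))

# Count matched rows per (input system, output system) pair.
result = (
    pairs.groupby(["system_in", "system_out"])
         .size()
         .reset_index(name="count")
)
print(result)
```

Grouping on the system pair avoids relying on `'first'` to pick a representative system name for each group.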

If you want to keep the other columns as information, just use merge followed by GroupBy.count:

df1.merge(df2, on=columns, suffixes=['_in', '_out'])\
   .groupby(columns, as_index=False).count()

Output

   server directions msgTYPE    msgID  count  system_in  system_out
0       1         in     ADT  MSG0001      1          2           2
1       1         in     ADT  MSG0003      1          1           1
2       1         in     ADT  MSG0004      1          1           1
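
Because `merge` performs an inner join by default, a message that appears in only one frame (MSG0002 in the sample data) drops out of these results without any notice. If such rows need to be inspected, one option is a left merge with `indicator=True`. This is a hedged sketch on trimmed copies of the sample frames, assuming msgID is the matching key.

```python
import pandas as pd

# Trimmed copies of the sample frames: only msgID and system.
df1 = pd.DataFrame({
    "msgID": ["MSG0001", "MSG0002", "MSG0003", "MSG0004"],
    "system": ["sys1_in"] * 4,
})
df2 = pd.DataFrame({
    "msgID": ["MSG0001", "MSG0001", "MSG0003", "MSG0004"],
    "system": ["sys2_out", "sys2_out", "sys3_out", "sys4_out"],
})

# De-duplicate the right-hand keys so each df1 row appears once in the check.
out_ids = df2[["msgID"]].drop_duplicates()

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
flagged = df1.merge(out_ids, on="msgID", how="left", indicator=True)

unmatched = flagged.loc[flagged["_merge"] == "left_only", "msgID"].tolist()
print(unmatched)
```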

Use DataFrame.merge to join both DataFrames on their common columns. Then use DataFrame.groupby with GroupBy.count to count the rows in each group, and DataFrame.reindex to put the columns in the desired order:

(df1.merge(df2, on=['server', 'msgID', 'directions', 'msgTYPE', 'count'],
           suffixes=['_in', '_out'])
    .groupby(['server', 'msgID', 'directions', 'msgTYPE']).count()
    .reset_index(drop=True)
    .reindex(columns=['system_in', 'system_out', 'count']))

   system_in  system_out  count
0          2           2      2
1          1           1      1
2          1           1      1 

Calling DataFrame.reset_index with drop=False (the default) keeps the rest of the columns:

(df1.merge(df2, on=['server', 'msgID', 'directions', 'msgTYPE', 'count'],
           suffixes=['_in', '_out'])
    .groupby(['server', 'msgID', 'directions', 'msgTYPE']).count()
    .reindex(columns=['system_in', 'system_out', 'count'])
    .reset_index())

   server    msgID directions msgTYPE  system_in  system_out  count
0       1  MSG0001         in     ADT          2           2      2
1       1  MSG0003         in     ADT          1           1      1
2       1  MSG0004         in     ADT          1           1      1
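
The difference between the two `reset_index` calls can be seen on a much smaller frame. The data below is a hypothetical three-row example made up for illustration, not taken from the question.

```python
import pandas as pd

# Hypothetical miniature frame, for illustration only.
df = pd.DataFrame({
    "msgID": ["MSG0001", "MSG0001", "MSG0003"],
    "system": ["a", "b", "c"],
})

# After groupby(...).count(), the group key msgID lives in the index.
counts = df.groupby("msgID").count()

kept = counts.reset_index()              # msgID becomes a regular column again
dropped = counts.reset_index(drop=True)  # msgID is discarded entirely

print(kept)
print(dropped)
```

With `drop=True` the result can no longer be traced back to a message ID, so it only suits cases where the key columns are not needed afterwards.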

