
How do I compute the correlation between multiple dataframes? Python Pandas

I have multiple dataframes like this:

df1:

date_time               Value(0 or 1)
2020-08-28 07:33:52     0
2020-08-28 08:08:51     0
2020-08-28 08:31:31     0
2020-08-28 08:31:59     0
2020-08-28 08:34:44     0
2020-12-24 10:10:08     1


df2:

date_time             rpm
2020-08-27 20:42:02   0.000000
2020-08-28 07:31:12   0.000000
2020-08-28 07:33:04   0.000000
2020-08-28 08:28:53   -0.001589
2020-08-28 08:29:51   -0.001589
2020-08-28 18:21:42   104.971931


df3:

date_time               Step
2020-08-28 07:33:52     1
2020-08-28 08:08:51     5
2020-08-28 08:31:59     10
2020-08-28 08:34:44     15
2020-08-28 08:36:26     20
2020-12-07 16:49:22     25

I would like to study the correlation between these dataframes, but I have a technical question: do I have to merge the dataframes and then compute the correlation between columns, or is there another way? And how do I do it?

As you can see, the second column of each df is completely different (different units).

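If you want to merge on timestamps (by default, merge_asof pairs each row of df1 with the latest row of the other frame at or before its timestamp; pass direction='nearest' to match the closest timestamp in either direction):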

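import pandas as pd

# Parse the timestamps so they are compared as datetimes, not strings
df1['date_time'] = pd.to_datetime(df1['date_time'])
df2['date_time'] = pd.to_datetime(df2['date_time'])
df3['date_time'] = pd.to_datetime(df3['date_time'])

# merge_asof requires every frame to be sorted by date_time
out = pd.merge_asof(df1, df2, on='date_time')
out = pd.merge_asof(out, df3, on='date_time')
# Correlate only the value columns, not the timestamps
out.drop(columns='date_time').corr()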
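
To see what the matching actually does, here is a small sketch built from the sample rows of df1 and df2 in the question. The column name Value is an assumption (the question shows it as "Value(0 or 1)"), and the one-hour tolerance is an arbitrary illustrative choice, not something the question specifies.

```python
import pandas as pd

# Sample rows copied from the question; the column name "Value" is an assumption.
df1 = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2020-08-28 07:33:52", "2020-08-28 08:08:51", "2020-08-28 08:31:31",
        "2020-08-28 08:31:59", "2020-08-28 08:34:44", "2020-12-24 10:10:08",
    ]),
    "Value": [0, 0, 0, 0, 0, 1],
})
df2 = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2020-08-27 20:42:02", "2020-08-28 07:31:12", "2020-08-28 07:33:04",
        "2020-08-28 08:28:53", "2020-08-28 08:29:51", "2020-08-28 18:21:42",
    ]),
    "rpm": [0.0, 0.0, 0.0, -0.001589, -0.001589, 104.971931],
})

# Default (direction="backward"): each df1 row takes the last df2 row
# whose timestamp is at or before its own.
backward = pd.merge_asof(df1, df2, on="date_time")

# direction="nearest": take the df2 row closest in time, before or after.
nearest = pd.merge_asof(df1, df2, on="date_time", direction="nearest")

# tolerance (1 hour here, an illustrative value): leave rpm as NaN
# when no df2 row is close enough.
bounded = pd.merge_asof(
    df1, df2, on="date_time",
    direction="nearest", tolerance=pd.Timedelta("1h"),
)

print(backward[["date_time", "rpm"]])
print(nearest[["date_time", "rpm"]])
print(bounded[["date_time", "rpm"]])
```

On this sample the 08:08:51 row is matched to a different df2 row depending on the direction, and without a tolerance the 2020-12-24 row is paired with an rpm reading from 2020-08-28, which may or may not be what you want before computing a correlation.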

If you want to ignore the timestamps and simply concatenate the columns side by side:

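# concat with axis=1 pairs rows by index label (row position for a default index)
pd.concat([df1.drop(columns='date_time'), df2['rpm'], df3['Step']], axis=1).corr()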
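
One thing to watch with this approach: concat with axis=1 aligns on index labels, not on row position. The sketch below uses small made-up series (only the names rpm and Step come from the question) to show how a filtered frame can end up misaligned, and how resetting the index restores positional pairing.

```python
import pandas as pd

# Hypothetical miniature data; only the column names come from the question.
rpm = pd.Series([0.0, -0.001589, 104.971931], name="rpm")
step = pd.Series([1, 5, 10, 15], name="Step")

# After filtering, the surviving rows keep their old index labels 1, 2, 3.
step_filtered = step[step > 1]

# Aligned on labels: label 0 has no Step and label 3 has no rpm, so NaN appears.
by_label = pd.concat([rpm, step_filtered], axis=1)

# Resetting the index pairs row 0 with row 0, row 1 with row 1, and so on.
by_position = pd.concat([rpm, step_filtered.reset_index(drop=True)], axis=1)

print(by_label)
print(by_position)
print(by_position.corr())
```

If the frames all carry a default 0..n-1 index, as freshly loaded data usually does, the two results coincide and no reset is needed.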

These will yield different results: merge_asof looks at the timestamps and pairs each row of df1 with the row of the other frame whose timestamp is the latest one at or before it (the default direction), whereas concat ignores the timestamps and pairs rows by their index.
