Calculating the column-wise difference of two time series DataFrames with different dimensions

I have two time series DataFrames, df1 (m x n) and df2 (m x 1), and I want to calculate the difference between each column of df1 and the single column of df2. The result should look like df3.

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[1.2, 0.4, 0.2, -0.4], 
    '02K W':[3.5, 3.2, 'nan', 'nan'], 
    '03K W':[-1, -2.3, 0.3, 2.4], 
    '04K W':[1.5, 2.6, 3.2, 4.2]})

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'K W':[1, 1.5, 1.2, 0.8]})

df3 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[0.2, 1.1, 1, 1.2], 
    '02K W':[2.5, 1.7, 'nan', 'nan'], 
    '03K W':[2, 3.8, 0.9, 1.6], 
    '04K W':[0.5, 1.1, 2, 3.4]})

Is there an easy way to compute the difference column-wise?

You can set Date as the index and use the .sub method:

df1.set_index('Date').sub(df2.set_index('Date')['K W'], axis='rows')

Output:

            01K W  02K W  03K W  04K W
Date                                  
2021-01-01    0.2    2.5   -2.0    0.5
2021-01-02   -1.1    1.7   -3.8    1.1
2021-01-03   -1.0    NaN   -0.9    2.0
2021-01-04   -1.2    NaN    1.6    3.4

Note: you might want to add astype(float) after set_index('Date') to correct your data type, because the '02K W' column in the question holds the string 'nan' and is therefore stored as object dtype rather than float.
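As a rough sketch of that note, assuming the question's df1 and df2 exactly as posted (including the string 'nan' values), the cast can go right after set_index. The names left, right and diff are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W': [1.2, 0.4, 0.2, -0.4],
    '02K W': [3.5, 3.2, 'nan', 'nan'],  # strings, so this column is object dtype
    '03K W': [-1, -2.3, 0.3, 2.4],
    '04K W': [1.5, 2.6, 3.2, 4.2]})

df2 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'K W': [1, 1.5, 1.2, 0.8]})

# Move Date into the index, then cast; the string 'nan' becomes a float NaN.
left = df1.set_index('Date').astype(float)
right = df2.set_index('Date')['K W'].astype(float)

# axis=0 matches the Series against the row index (Date),
# so it is subtracted from every column of df1.
diff = left.sub(right, axis=0)

print(diff.dtypes)
print(diff)
```

The result holds signed differences. The df3 in the question appears to contain absolute values, so chaining .abs() onto the result may be closer to what was asked for.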

First, you will need to use numeric values, not strings (the 'nan' entries in your example are strings).

It looks like your 'Date' field represents your index. Pandas Series are added and subtracted element-wise based on their shared index, so it is worth setting Date as the index. Then you can simply iterate through the df1 columns and subtract df2 from each one.

from numpy import nan
import pandas as pd

df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[1.2, 0.4, 0.2, -0.4], 
    '02K W':[3.5, 3.2, nan, nan], 
    '03K W':[-1, -2.3, 0.3, 2.4], 
    '04K W':[1.5, 2.6, 3.2, 4.2]})

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'K W':[1, 1.5, 1.2, 0.8]})

df1 = df1.set_index('Date')
df2 = df2.set_index('Date')

df3 = df1.copy()

for c in df1.columns:
    df3[c] = df1[c] - df2['K W']
    
df3

Yields:

            01K W  02K W  03K W  04K W
Date                                  
2021-01-01    0.2    2.5   -2.0    0.5
2021-01-02   -1.1    1.7   -3.8    1.1
2021-01-03   -1.0    NaN   -0.9    2.0
2021-01-04   -1.2    NaN    1.6    3.4
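To illustrate why setting the index matters, here is a small sketch with made-up data (the Series a and b are hypothetical and not part of the question). It shows that subtraction pairs values by index label rather than by row position.

```python
import pandas as pd

# Hypothetical data: the same dates, but stored in a different row order.
a = pd.Series([1.2, 0.4, 0.2],
              index=['2021-01-01', '2021-01-02', '2021-01-03'])
b = pd.Series([1.2, 1.0, 1.5],
              index=['2021-01-03', '2021-01-01', '2021-01-02'])

# With dates as the index, each value is matched to the same date in b.
aligned = a - b

# With the index dropped, values are matched by position instead.
positional = a.reset_index(drop=True) - b.reset_index(drop=True)

print(aligned)
print(positional)
```

The two results differ, which is why the Date column is turned into the index before subtracting.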

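Another way to do it, taking the absolute difference and keeping Date as a regular column (this uses the original df1 and df2 from the question, without setting an index):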

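cols = ['01K W', '02K W', '03K W', '04K W']

# Cast to float, subtract df2's column row by row, take the absolute value,
# then re-attach Date and put it first.
df4 = (
    df1[cols].astype(float)
    .subtract(df2['K W'].astype(float), axis=0)
    .abs()
    .join(df1['Date'])[['Date'] + cols]
)

print(df4)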
