[英]Merging 2 dataframes and sorting by datetime Pandas Python
我想生成一个代码,它为数据框data
创建一个附加表。 新的数据框data2
将有以下变化:
New
的而不是Old
的col1
的最后一个索引将被删除col2
的第一个索引将被删除date
将是第一个索引将被删除,所有日期值将减去 1 分钟然后我想连接这两个数据框以制作一个称为merge
的数据框我想按dates
对数据框进行排序。 由于删除了data2
的第一个索引,因此merge
的顺序应该按照label
的顺序: New, Old, New, Old
。 如何从date_mod
中减去 1 分钟并按日期顺序合并两个数据框?
import pandas as pd
d = {'col1': [4, 5, 2, 2, 3, 5, 1, 1, 6], 'col2': [6, 2, 1, 7, 3, 5, 3, 3, 9],
'label':['Old','Old','Old','Old','Old','Old','Old','Old','Old'],
'date': ['2022-01-24 10:07:02', '2022-01-27 01:55:03', '2022-01-30 19:09:03', '2022-02-02 14:34:06',
'2022-02-08 12:37:03', '2022-02-10 03:07:02', '2022-02-10 14:02:03', '2022-02-11 00:32:25',
'2022-02-12 21:42:03']}
data = pd.DataFrame(d)
'''
Additional Dataframe
label will have New
'col1'`s last index will be deleted
'col2'`s first index will be deleted
'date' will be first index will be deleted and all date values will be subtracted by 1 minute
'''
a = data['col1'].drop(data['col1'].index[-1])
b = data['col2'].drop(data['col2'].index[0])
# subtract the date_mod by 1 minute
date_mod = pd.to_datetime(data['date'][1:])
data2 = pd.DataFrame({'col1':a,'col2':b,
'label':['New','New','New','New','New','New','New','New'],
'date': date_mod})
'''
Merging data and data2
Sort by 'date'
Should go in order as Old, New, Old, New ...
The length of the columns are 1 less than of data bc of the dropped indexes
'''
merge=pd.merge(data,displayer)
我认为最简单的方法, - 将所有调整放入函数中并应用于原始数据帧的副本,稍后简单地连接和排序:
data.date = pd.to_datetime(data.date) # converting column date str values to datetime to deduct 1minute later
def adjust_data(df):
df['col1'] = df['col1'].drop(df['col1'].index[-1])
df['col2'] = df['col2'].drop(df['col2'].index[0])
df.date = df.date - pd.Timedelta(minutes=1) # subtract the datetime by 1 minute
df.label = df.label.replace('Old','New') # change values in the column "label"
data2 = data.copy()
adjust_data(data2) # apply function to data2
# concat both dataframes and sort by column "date"
merge = pd.concat([data,data2], axis=0).sort_values(by=['date']).reset_index(drop=True)
print(merge)
出去:
col1 col2 label date
0 4.0 NaN New 2022-01-24 10:06:02
1 4.0 6.0 Old 2022-01-24 10:07:02
2 5.0 2.0 New 2022-01-27 01:54:03
3 5.0 2.0 Old 2022-01-27 01:55:03
4 2.0 1.0 New 2022-01-30 19:08:03
5 2.0 1.0 Old 2022-01-30 19:09:03
6 2.0 7.0 New 2022-02-02 14:33:06
7 2.0 7.0 Old 2022-02-02 14:34:06
8 3.0 3.0 New 2022-02-08 12:36:03
9 3.0 3.0 Old 2022-02-08 12:37:03
10 5.0 5.0 New 2022-02-10 03:06:02
11 5.0 5.0 Old 2022-02-10 03:07:02
12 1.0 3.0 New 2022-02-10 14:01:03
13 1.0 3.0 Old 2022-02-10 14:02:03
14 1.0 3.0 New 2022-02-11 00:31:25
15 1.0 3.0 Old 2022-02-11 00:32:25
16 NaN 9.0 New 2022-02-12 21:41:03
17 6.0 9.0 Old 2022-02-12 21:42:03
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.