Merging 2 dataframes and sorting by datetime Pandas Python

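I want to write code that creates an additional table for the dataframe data. The new dataframe data2 will differ from it as follows:

  • label will be New instead of Old
  • the last index of col1 will be deleted
  • the first index of col2 will be deleted
  • the first index of date will be deleted, and 1 minute will be subtracted from all date values

Then I want to concatenate the two dataframes into one called merge and sort it by date. Since the first index of data2 is deleted, the rows of merge should alternate by label: New, Old, New, Old. How can I subtract 1 minute from date_mod and merge the two dataframes in date order?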

import pandas as pd 

d = {'col1': [4, 5, 2, 2, 3, 5, 1, 1, 6], 'col2': [6, 2, 1, 7, 3, 5, 3, 3, 9], 
     'label':['Old','Old','Old','Old','Old','Old','Old','Old','Old'],
     'date': ['2022-01-24 10:07:02', '2022-01-27 01:55:03', '2022-01-30 19:09:03', '2022-02-02 14:34:06',
              '2022-02-08 12:37:03', '2022-02-10 03:07:02', '2022-02-10 14:02:03', '2022-02-11 00:32:25',
              '2022-02-12 21:42:03']}

data = pd.DataFrame(d)

'''
Additional dataframe
'label' will be New
The last index of 'col1' will be deleted
The first index of 'col2' will be deleted
The first index of 'date' will be deleted and 1 minute will be subtracted from all date values
'''

a = data['col1'].drop(data['col1'].index[-1])
b = data['col2'].drop(data['col2'].index[0])
# subtract the date_mod by 1 minute 
date_mod = pd.to_datetime(data['date'][1:])


data2 = pd.DataFrame({'col1':a,'col2':b,
'label':['New','New','New','New','New','New','New','New'],
'date': date_mod})
'''
Merging data and data2
Sort by 'date'
Should go in order as Old, New, Old, New ...
The columns of data2 are 1 shorter than those of data because of the dropped indexes
'''
merge = pd.merge(data, data2)

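I think the easiest way is to put all the adjustments into a function, apply it to a copy of the original dataframe, and then simply concatenate and sort: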

data.date = pd.to_datetime(data.date) # convert the date strings to datetime so that 1 minute can be subtracted later

def adjust_data(df):
    df['col1'] = df['col1'].drop(df['col1'].index[-1])
    df['col2'] = df['col2'].drop(df['col2'].index[0])
    df.date = df.date - pd.Timedelta(minutes=1)  # subtract the datetime by 1 minute
    df.label = df.label.replace('Old','New') # change values in the column "label"


data2 = data.copy()
adjust_data(data2) # apply function to data2

# concat both dataframes and sort by column "date"
merge = pd.concat([data,data2], axis=0).sort_values(by=['date']).reset_index(drop=True)

print(merge)

Out:

    col1  col2 label                date
0    4.0   NaN   New 2022-01-24 10:06:02
1    4.0   6.0   Old 2022-01-24 10:07:02
2    5.0   2.0   New 2022-01-27 01:54:03
3    5.0   2.0   Old 2022-01-27 01:55:03
4    2.0   1.0   New 2022-01-30 19:08:03
5    2.0   1.0   Old 2022-01-30 19:09:03
6    2.0   7.0   New 2022-02-02 14:33:06
7    2.0   7.0   Old 2022-02-02 14:34:06
8    3.0   3.0   New 2022-02-08 12:36:03
9    3.0   3.0   Old 2022-02-08 12:37:03
10   5.0   5.0   New 2022-02-10 03:06:02
11   5.0   5.0   Old 2022-02-10 03:07:02
12   1.0   3.0   New 2022-02-10 14:01:03
13   1.0   3.0   Old 2022-02-10 14:02:03
14   1.0   3.0   New 2022-02-11 00:31:25
15   1.0   3.0   Old 2022-02-11 00:32:25
16   NaN   9.0   New 2022-02-12 21:41:03
17   6.0   9.0   Old 2022-02-12 21:42:03
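
Note that the output above keeps all 9 rows in data2 and leaves NaN where a value was dropped. If the intent is instead for data2 to be one row shorter, with the remaining values shifted together by position, a minimal sketch could look like the following. This is only one reading of the question; the name merged and the use of to_numpy() to discard the original index are choices made for this illustration, not part of the original answer.

```python
import pandas as pd

d = {
    "col1": [4, 5, 2, 2, 3, 5, 1, 1, 6],
    "col2": [6, 2, 1, 7, 3, 5, 3, 3, 9],
    "label": ["Old"] * 9,
    "date": [
        "2022-01-24 10:07:02", "2022-01-27 01:55:03", "2022-01-30 19:09:03",
        "2022-02-02 14:34:06", "2022-02-08 12:37:03", "2022-02-10 03:07:02",
        "2022-02-10 14:02:03", "2022-02-11 00:32:25", "2022-02-12 21:42:03",
    ],
}
data = pd.DataFrame(d)
data["date"] = pd.to_datetime(data["date"])

# Build data2 by position: to_numpy() drops the old index labels, so the
# three shortened columns line up on a fresh 0..7 index without NaN.
data2 = pd.DataFrame({
    "col1": data["col1"].iloc[:-1].to_numpy(),  # last value removed
    "col2": data["col2"].iloc[1:].to_numpy(),   # first value removed
    "label": "New",                             # scalar is broadcast to every row
    "date": (data["date"].iloc[1:] - pd.Timedelta(minutes=1)).to_numpy(),
})

# Stack both frames, then order the rows by date.
merged = (
    pd.concat([data, data2], ignore_index=True)
    .sort_values("date")
    .reset_index(drop=True)
)
print(merged)
```

Under this reading merged has 17 rows and starts with the first Old row, after which New and Old alternate, because every New date is exactly 1 minute before the Old date that follows it.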

