
Merging columns with Python and Pandas

I'm using Python 3.0 and Pandas to clean some data.

I have the following table:

#   Item_ID          Date_1                    Date_2
0    1857      2020-11-05 00:00:00      2020-12-05 00:00:00
1    1569      2020-12-09 00:00:00      2021-01-07 00:00:00
2    2569      2020-12-09 00:00:00      NaN
3    6587      2020-12-09 00:00:00      2021-10-08 00:00:00
4    5236      2020-12-09 00:00:00      -

Here is the code to create the dataframe, to make it easy to reproduce:

import pandas as pd

d = {'Item_ID': [1857, 1569, 2569, 6587, 5236], 'Date_1': ['2020-11-05 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00'], 'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN', '2021-10-08 00:00:00', '-']}
df = pd.DataFrame(data=d)

I would like to merge the columns 'Date_1' and 'Date_2' using an efficient method, because the dataframe can be really big. The result should be the following:

#   Item_ID          Date_3
0    1857      2020-12-05 00:00:00
1    1569      2021-01-07 00:00:00
2    2569      2020-12-09 00:00:00
3    6587      2021-10-08 00:00:00
4    5236      2020-12-09 00:00:00

The date in 'Date_1' should only be replaced by the content of 'Date_2' if the content of 'Date_2' is a date (regardless of whether it is earlier or later than the date in 'Date_1').

Can this be done with a merge?

You can build a mask with the help of pandas.to_datetime to check which values of 'Date_2' are dates:

mask = pd.to_datetime(df['Date_2'], errors='coerce').isna()
df['Date_3'] = df['Date_1'].where(mask, df['Date_2'])

Output:

   Item_ID               Date_1               Date_2               Date_3
0     1857  2020-11-05 00:00:00  2020-12-05 00:00:00  2020-12-05 00:00:00
1     1569  2020-12-09 00:00:00  2021-01-07 00:00:00  2021-01-07 00:00:00
2     2569  2020-12-09 00:00:00                  NaN  2020-12-09 00:00:00
3     6587  2020-12-09 00:00:00  2021-10-08 00:00:00  2021-10-08 00:00:00
4     5236  2020-12-09 00:00:00                    -  2020-12-09 00:00:00
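To make the intermediate steps visible, here is a minimal sketch of the same mask approach on the sample data from the question. The variable name `parsed` is introduced here only for illustration. It shows that `errors='coerce'` turns both the string 'NaN' and '-' into missing values, so the mask is True exactly where 'Date_1' has to be kept.

```python
import pandas as pd

df = pd.DataFrame({
    'Item_ID': [1857, 1569, 2569, 6587, 5236],
    'Date_1': ['2020-11-05 00:00:00'] + ['2020-12-09 00:00:00'] * 4,
    'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN',
               '2021-10-08 00:00:00', '-'],
})

# Anything that cannot be parsed as a date becomes NaT (missing).
parsed = pd.to_datetime(df['Date_2'], errors='coerce')

# True where Date_2 is NOT a usable date, i.e. where Date_1 must be kept.
mask = parsed.isna()
print(mask.tolist())  # [False, False, True, False, True]

# where() keeps Date_1 where the mask is True and takes Date_2 elsewhere.
df['Date_3'] = df['Date_1'].where(mask, df['Date_2'])
print(df[['Item_ID', 'Date_3']])
```

Because only the mask is computed from the parsed values, 'Date_3' keeps the original strings and stays an object column.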

Or, if you want the result to have a datetime type:

# Parse Date_1 as well so that the fill values are datetimes rather than strings
df['Date_3'] = pd.to_datetime(df['Date_2'], errors='coerce').fillna(pd.to_datetime(df['Date_1']))

Output:

   Item_ID               Date_1               Date_2     Date_3
0     1857  2020-11-05 00:00:00  2020-12-05 00:00:00 2020-12-05
1     1569  2020-12-09 00:00:00  2021-01-07 00:00:00 2021-01-07
2     2569  2020-12-09 00:00:00                  NaN 2020-12-09
3     6587  2020-12-09 00:00:00  2021-10-08 00:00:00 2021-10-08
4     5236  2020-12-09 00:00:00                    - 2020-12-09
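One practical reason to prefer the datetime variant is that the merged column then supports date arithmetic directly. The sketch below is only an illustration: the `Days_Shift` column is a hypothetical name that is not part of the question, and 'Date_1' is parsed explicitly so that both inputs are datetimes.

```python
import pandas as pd

df = pd.DataFrame({
    'Item_ID': [1857, 1569, 2569, 6587, 5236],
    'Date_1': ['2020-11-05 00:00:00'] + ['2020-12-09 00:00:00'] * 4,
    'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN',
               '2021-10-08 00:00:00', '-'],
})

# Parse both columns; invalid Date_2 entries become NaT.
date_1 = pd.to_datetime(df['Date_1'])
date_2 = pd.to_datetime(df['Date_2'], errors='coerce')

# Take Date_2 where it is a valid date, otherwise fall back to Date_1.
df['Date_3'] = date_2.fillna(date_1)

# The result is a real datetime column, not a column of strings.
print(pd.api.types.is_datetime64_any_dtype(df['Date_3']))  # True

# Hypothetical follow-up: how many days the merged date differs from Date_1.
df['Days_Shift'] = (df['Date_3'] - date_1).dt.days
print(df[['Item_ID', 'Date_3', 'Days_Shift']])
```

Rows where 'Date_2' was not a date get a shift of 0, since 'Date_3' simply repeats 'Date_1' there.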

Alternative output, keeping only the selected columns:

keep = ['Item_ID']
df[keep].join(pd.to_datetime(df['Date_2'], errors='coerce')
                .fillna(df['Date_1'])
                .rename('Date_3')
              )

   Item_ID     Date_3
0     1857 2020-12-05
1     1569 2021-01-07
2     2569 2020-12-09
3     6587 2021-10-08
4     5236 2020-12-09

You can do something like the following:

from datetime import datetime
import pandas as pd

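def datetimeChecker(date):
    # Return True only if the value matches the expected date format
    try:
        datetime.strptime(date, "%Y-%m-%d %H:%M:%S")
        return True
    except (ValueError, TypeError):
        # ValueError: wrong format; TypeError: not a string (e.g. a real NaN)
        return False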

d = {'Item_ID': [1857, 1569, 2569, 6587, 5236], 'Date_1': ['2020-11-05 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00', '2020-12-09 00:00:00'], 'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN', '2021-10-08 00:00:00', '-']}
df = pd.DataFrame(data=d)

df["final_date"] = df.apply(lambda x: x['Date_2'] if datetimeChecker(x['Date_2']) else x['Date_1'], axis=1)

The output looks like this:

 Item_ID               Date_1               Date_2           final_date
0     1857  2020-11-05 00:00:00  2020-12-05 00:00:00  2020-12-05 00:00:00
1     1569  2020-12-09 00:00:00  2021-01-07 00:00:00  2021-01-07 00:00:00
2     2569  2020-12-09 00:00:00                  NaN  2020-12-09 00:00:00
3     6587  2020-12-09 00:00:00  2021-10-08 00:00:00  2021-10-08 00:00:00
4     5236  2020-12-09 00:00:00                    -  2020-12-09 00:00:00
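Since the question stresses efficiency on a large dataframe, note that apply with axis=1 calls the Python function once per row. A possible vectorized sketch of the same strict format check is shown below. It assumes every valid date uses the "%Y-%m-%d %H:%M:%S" layout, and the names `FMT` and `is_date` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Item_ID': [1857, 1569, 2569, 6587, 5236],
    'Date_1': ['2020-11-05 00:00:00'] + ['2020-12-09 00:00:00'] * 4,
    'Date_2': ['2020-12-05 00:00:00', '2021-01-07 00:00:00', 'NaN',
               '2021-10-08 00:00:00', '-'],
})

# Assumed layout of every valid date string in the data.
FMT = "%Y-%m-%d %H:%M:%S"

# Parse the whole column at once; values that do not match become NaT.
is_date = pd.to_datetime(df['Date_2'], format=FMT, errors='coerce').notna()

# Keep Date_2 where it is a valid date, otherwise fall back to Date_1.
df['final_date'] = df['Date_2'].where(is_date, df['Date_1'])
print(df)
```

As in the apply version, 'final_date' holds the original strings. Whether this is noticeably faster depends on the data size, so it is worth timing on a realistic sample.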
