How to create a pandas DataFrame column based on two other columns that hold dates?
I have a pandas DataFrame with two date columns (A and B), and I would like to create a third column (C) whose dates take the month and year from column A and the day from column B. When that day does not exist in the target month, it needs to be adjusted: for example, an attempt to build 31 Feb 2020 should give 29 Feb 2020 instead.
For example:
import pandas as pd
df = pd.DataFrame({'A': ['2020-02-21', '2020-03-21', '2020-03-21'],
                   'B': ['2020-01-31', '2020-02-11', '2020-02-01']})
# convert both string columns to datetime64
for c in df.columns:
    df[c] = pd.to_datetime(df[c])
Then I want to create a new datetime column C built from:
year = df.A.dt.year
month = df.A.dt.month
day = df.B.dt.day
I don't know how to create this column. Can you please help?
Here is one way to do it, using pandas' time series functionality:
import pandas as pd
# your example data
df = pd.DataFrame({'A': ['2020-02-21', '2020-03-21', '2020-03-21'],
                   'B': ['2020-01-31', '2020-02-11', '2020-02-01']})
for c in df.columns:
    # keep using the same dataframe here
    df[c] = pd.to_datetime(df[c])
# set back every date from A to the end of the previous month,
# then add the number of days from the date in B
# (pd.to_timedelta is used because the unit argument of
# pd.TimedeltaIndex is deprecated in recent pandas versions)
df['C'] = df.A - pd.offsets.MonthEnd() + pd.to_timedelta(df.B.dt.day, unit='D')
# print works outside Jupyter; display(df) is notebook-only
print(df)
Result:
           A          B          C
0 2020-02-21 2020-01-31 2020-03-02
1 2020-03-21 2020-02-11 2020-03-11
2 2020-03-21 2020-02-01 2020-03-01
As you can see in row 0, this does not handle "February 31st" quite as you suggested: instead of stopping at 2020-02-29, the extra days roll over into March and give 2020-03-02. It is still a logical result, since it is 31 days after the end of January.
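If the day should instead be capped at the last day of A's month, as the question asks, one possible variation is to limit the day from B with `Series.dt.days_in_month` and then assemble the date from its parts. This is only a sketch on the same example data; the intermediate name `day` is just for illustration, and it assumes neither column contains missing values.

```python
import pandas as pd

df = pd.DataFrame({'A': ['2020-02-21', '2020-03-21', '2020-03-21'],
                   'B': ['2020-01-31', '2020-02-11', '2020-02-01']})
for c in df.columns:
    df[c] = pd.to_datetime(df[c])

# cap the day taken from B at the number of days in A's month,
# so day 31 in February 2020 becomes day 29
day = df.B.dt.day.clip(upper=df.A.dt.days_in_month)

# pd.to_datetime can assemble dates from columns named year, month, day
df['C'] = pd.to_datetime(pd.DataFrame({'year': df.A.dt.year,
                                       'month': df.A.dt.month,
                                       'day': day}))
print(df)
```

With this approach row 0 ends up on the last day of February 2020 rather than spilling into March, while the other two rows are unchanged.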