How to add a date 3 years older than the current date column in pandas
I have the following dataframe in pandas:
code fat_version bat_version from_date to_date
102 1.7 2.5 2019-01-02 2019-04-16
102 3.5 7.1.5 2019-04-16 2020-04-16
347 6.55 6.55 2019-06-04 2020-04-16
107 6.55 6.55 2019-01-18 2019-04-05
107 6.55 6.55 2019-04-05 2020-04-16
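What I want to do is add, for each code, a row whose from_date is 3 years older than the earliest from_date of that code, with the corresponding fat_version and bat_version set to NaN. My desired dataframe is as follows: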
code fat_version bat_version from_date to_date
102 nan nan 2016-01-02 2019-01-01
102 1.7 2.5 2019-01-02 2019-04-16
102 3.5 7.1.5 2019-04-16 2020-04-16
347 nan nan 2016-06-04 2019-06-03
347 6.55 6.55 2019-06-04 2020-04-16
107 nan nan 2016-01-18 2019-01-17
107 6.55 6.55 2019-01-18 2019-04-05
107 6.55 6.55 2019-04-05 2020-04-16
How can I do this in pandas?
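Get the first row of each group with DataFrame.drop_duplicates, change its values with DataFrame.assign (subtracting 3 years with offsets.DateOffset), then join it to the original with concat and sort: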
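import numpy as np
import pandas as pd

df['from_date'] = pd.to_datetime(df['from_date'])
df['to_date'] = pd.to_datetime(df['to_date'])

# first row of each code (assumes rows are ordered by from_date within a code);
# to_date takes the original from_date before from_date is shifted back 3 years
df1 = (df.drop_duplicates('code')
         .assign(to_date = lambda x: x['from_date'],
                 from_date = lambda x: x['from_date'] - pd.offsets.DateOffset(years=3),
                 fat_version = np.nan,
                 bat_version = np.nan))
print(df1)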
code fat_version bat_version from_date to_date
0 102 NaN NaN 2016-01-02 2019-01-02
2 347 NaN NaN 2016-06-04 2019-06-04
3 107 NaN NaN 2016-01-18 2019-01-18
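The keyword order in the assign call above matters. assign evaluates its arguments in the order given, and a callable sees the columns already changed by earlier arguments, so listing to_date first makes it copy the original from_date before from_date is shifted back. A minimal sketch on a one-row frame taken from the question's data (the names ordered and reversed_order are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'code': [102], 'from_date': pd.to_datetime(['2019-01-02'])})

# to_date is computed first from the original from_date,
# then from_date is shifted back 3 years.
ordered = df.assign(
    to_date=lambda x: x['from_date'],
    from_date=lambda x: x['from_date'] - pd.offsets.DateOffset(years=3),
)

# Reversed order: to_date now sees the already-shifted from_date.
reversed_order = df.assign(
    from_date=lambda x: x['from_date'] - pd.offsets.DateOffset(years=3),
    to_date=lambda x: x['from_date'],
)

print(ordered[['from_date', 'to_date']])
print(reversed_order[['from_date', 'to_date']])
```

In ordered the row spans 2016-01-02 to 2019-01-02, while in reversed_order both dates become 2016-01-02.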
# sort by code, then from_date, so each new row comes first in its group
df = pd.concat([df1, df], ignore_index=True).sort_values(['code', 'from_date'])
print(df)
code fat_version bat_version from_date to_date
0 102 NaN NaN 2016-01-02 2019-01-02
3 102 1.70 2.5 2019-01-02 2019-04-16
4 102 3.50 7.1.5 2019-04-16 2020-04-16
2 107 NaN NaN 2016-01-18 2019-01-18
6 107 6.55 6.55 2019-01-18 2019-04-05
7 107 6.55 6.55 2019-04-05 2020-04-16
1 347 NaN NaN 2016-06-04 2019-06-04
5 347 6.55 6.55 2019-06-04 2020-04-16
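The desired output in the question differs from this result in two details: each new to_date is the day before the group's earliest from_date (2019-01-01 rather than 2019-01-02), and the codes keep their original order (102, 347, 107). A sketch of a variant that reproduces that table, assuming the version columns are strings (values like 7.1.5 are not numbers). It swaps drop_duplicates for groupby('code', sort=False)['from_date'].min(), so it does not rely on the rows already being sorted by from_date; the helper names first, order, new_rows and _group are hypothetical:

```python
import numpy as np
import pandas as pd

# Sample data from the question; version columns kept as strings.
df = pd.DataFrame({
    'code': [102, 102, 347, 107, 107],
    'fat_version': ['1.7', '3.5', '6.55', '6.55', '6.55'],
    'bat_version': ['2.5', '7.1.5', '6.55', '6.55', '6.55'],
    'from_date': ['2019-01-02', '2019-04-16', '2019-06-04', '2019-01-18', '2019-04-05'],
    'to_date': ['2019-04-16', '2020-04-16', '2020-04-16', '2019-04-05', '2020-04-16'],
})
df['from_date'] = pd.to_datetime(df['from_date'])
df['to_date'] = pd.to_datetime(df['to_date'])

# Earliest from_date per code, codes in order of first appearance.
first = df.groupby('code', sort=False)['from_date'].min()

# One new row per code: starts 3 years before the earliest from_date,
# ends the day before it, with NaN versions.
new_rows = pd.DataFrame({
    'code': first.index,
    'fat_version': np.nan,
    'bat_version': np.nan,
    'from_date': (first - pd.DateOffset(years=3)).to_numpy(),
    'to_date': (first - pd.Timedelta(days=1)).to_numpy(),
})

# Sort by each code's first-appearance position, then by from_date,
# so the groups keep the original order and the new row leads each group.
order = {code: i for i, code in enumerate(first.index)}
out = (pd.concat([new_rows, df], ignore_index=True)
         .assign(_group=lambda x: x['code'].map(order))
         .sort_values(['_group', 'from_date'])
         .drop(columns='_group')
         .reset_index(drop=True))
print(out)
```

Using the group minimum instead of the first row costs one groupby, but the result no longer depends on how the input happens to be ordered.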