Create new column based on conditions of other columns
I have a df like the one below:
   Year  IndexDate   WorkDate       ID   Name
0  2018        NaT 2018-12-12  9265299     FV
1  2019 2019-01-09 2019-01-09  9265299     OM
2  2020 2020-11-27 2020-11-27  9962241     PM
3  2020        NaT 2020-11-27  9962241  Other
4  2021        NaT 2021-01-19  9962241  Other
df.dtypes
Out[50]:
Year                  int64
IndexDate    datetime64[ns]
WorkDate     datetime64[ns]
ID                    int64
Name                 object
dtype: object
df.to_dict()
{'Year': {0: 2018, 1: 2019, 2: 2020, 3: 2020, 4: 2021}, 'IndexDate': {0: NaT, 1: Timestamp('2019-01-09 00:00:00'), 2: Timestamp('2020-11-27 00:00:00'), 3: NaT, 4: NaT}, 'WorkDate': {0: Timestamp('2018-12-12 00:00:00'), 1: Timestamp('2019-01-09 00:00:00'), 2: Timestamp('2020-11-27 00:00:00'), 3: Timestamp('2020-11-27 00:00:00'), 4: Timestamp('2021-01-19 00:00:00')}, 'ID': {0: 9265299, 1: 9265299, 2: 9962241, 3: 9962241, 4: 9962241}, 'Name': {0: 'FV', 1: 'OM', 2: 'PM', 3: 'Other', 4: 'Other'}}
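Each ID has one IndexDate. I want to create a new Year column: if Name = OM or PM, the new Year column keeps the existing Year value; if Name = FV or Other, the new Year column takes the year of that ID's IndexDate instead of the year of WorkDate.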
My expected result:
   Year  IndexDate   WorkDate       ID   Name
0  2019        NaT 2018-12-12  9265299     FV
1  2019 2019-01-09 2019-01-09  9265299     OM
2  2020 2020-11-27 2020-11-27  9962241     PM
3  2020        NaT 2020-11-27  9962241  Other
4  2020        NaT 2021-01-19  9962241  Other
Thanks a lot for any suggestions!
If IndexDate is populated only for the rows that have OM/PM in Name, extract the years and take the first non-missing value per ID:
df['Year'] = df['IndexDate'].dt.year.groupby(df['ID']).transform('first')
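To make the one-liner above easier to try, here is a minimal sketch that rebuilds the sample frame from the question's `df.to_dict()` output and applies it. The final cast to the nullable `Int64` dtype is an addition not present in the answer; it is there because `.dt.year` returns floats when the column contains `NaT`.

```python
import pandas as pd

# Sample data rebuilt from the df.to_dict() output in the question.
df = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2020, 2021],
    "IndexDate": pd.to_datetime([None, "2019-01-09", "2020-11-27", None, None]),
    "WorkDate": pd.to_datetime(
        ["2018-12-12", "2019-01-09", "2020-11-27", "2020-11-27", "2021-01-19"]
    ),
    "ID": [9265299, 9265299, 9962241, 9962241, 9962241],
    "Name": ["FV", "OM", "PM", "Other", "Other"],
})

# Year of IndexDate; NaT rows become NaN, so the result is float.
index_year = df["IndexDate"].dt.year

# 'first' skips missing values, so every row in an ID group receives
# the first non-missing IndexDate year found in that group.
per_id_year = index_year.groupby(df["ID"]).transform("first")

# Assumption (not in the original answer): cast to nullable Int64 so the
# column shows 2019 instead of 2019.0 and tolerates IDs with no IndexDate.
df["Year"] = per_id_year.astype("Int64")

print(df)
```

On the sample data this should reproduce the Year column of the expected result. An ID with no IndexDate at all would end up with a missing Year.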
For a general solution, add Series.where to set missing values for the rows whose Name is not OM/PM:
df['Year'] = (df['IndexDate'].dt.year.where(df['Name'].isin(['OM','PM']))
.groupby(df['ID']).transform('first'))
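The difference between the two variants only shows up when a non-OM/PM row also carries an IndexDate. The sketch below uses a hypothetical variation of the sample data (row 0, the FV row, is given an IndexDate of 2018-12-12, which is not in the question) to compare the result with and without the `Series.where` mask.

```python
import pandas as pd

# Hypothetical variation of the sample: the FV row now has its own
# IndexDate, so taking 'first' without a mask would pick it up.
df = pd.DataFrame({
    "IndexDate": pd.to_datetime(
        ["2018-12-12", "2019-01-09", "2020-11-27", None, None]
    ),
    "ID": [9265299, 9265299, 9962241, 9962241, 9962241],
    "Name": ["FV", "OM", "PM", "Other", "Other"],
})

is_om_pm = df["Name"].isin(["OM", "PM"])
index_year = df["IndexDate"].dt.year

# Without the mask: the FV row comes first for ID 9265299, so its year wins.
unmasked = index_year.groupby(df["ID"]).transform("first")

# With the mask: years on non-OM/PM rows are blanked to NaN first, so only
# OM/PM rows can supply the per-ID year.
masked = index_year.where(is_om_pm).groupby(df["ID"]).transform("first")

print(pd.DataFrame({"unmasked": unmasked, "masked": masked}))
```

For ID 9265299 the unmasked version yields 2018 while the masked version yields 2019, which is why the mask is the safer choice when it is not certain that only OM/PM rows have an IndexDate.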
IIUC, you need to assign the year by condition:
import numpy as np

# OM/PM rows take the IndexDate year; all other rows take the WorkDate year
df['Year'] = np.where(df['Name'].isin(['OM','PM']),
                      df['IndexDate'].dt.year, df['WorkDate'].dt.year)
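As a rough check of what this row-wise reading produces, the sketch below runs the `np.where` version on the sample data from the question. The `astype("int64")` cast is an addition, needed only because mixing a float year column (IndexDate has `NaT`) with an integer one gives a float array.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the df.to_dict() output in the question.
df = pd.DataFrame({
    "IndexDate": pd.to_datetime([None, "2019-01-09", "2020-11-27", None, None]),
    "WorkDate": pd.to_datetime(
        ["2018-12-12", "2019-01-09", "2020-11-27", "2020-11-27", "2021-01-19"]
    ),
    "ID": [9265299, 9265299, 9962241, 9962241, 9962241],
    "Name": ["FV", "OM", "PM", "Other", "Other"],
})

is_om_pm = df["Name"].isin(["OM", "PM"])

# Row-wise choice: IndexDate year for OM/PM rows, WorkDate year otherwise.
row_wise = np.where(is_om_pm, df["IndexDate"].dt.year, df["WorkDate"].dt.year)

# Assumption (not in the original answer): cast back to int; this is safe
# here because every OM/PM row in the sample has an IndexDate.
df["Year"] = row_wise.astype("int64")

print(df)
```

On this sample the FV and Other rows receive the year of their own WorkDate (2018 and 2021), so this variant fits the per-row interpretation. The expected table in the question, where those rows inherit the ID's IndexDate year, corresponds to the groupby-based versions instead.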