Pandas dataframe, get recent date in each row
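I am trying to iterate over each row (iterrows?), find the most recent date (a sort function?), and put it in column "G".
I am having trouble combining the iteration with the sorting.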
   A         B         C         D         E       F  G
0  1  20171018  20171019  20171001  20171002  id_123
1  2       NaN  20171005  20171006  20171003  id_234
2  3       NaN       NaN  20171019  20171020  id_345
3  4       NaN       NaN       NaN  20171021  id_456
Expected output:
   A         B         C         D         E       F         G
0  1  20171018  20171019  20171001  20171002  id_123  20171019
1  2       NaN  20171005  20171006  20171003  id_234  20171006
2  3       NaN       NaN  20171019  20171020  id_345  20171020
3  4       NaN       NaN       NaN  20171021  id_456  20171021
Here is the code that generates the dataframe:
import pandas as pd

data2 = {'A': [1, 2, 3, 4],
         'B': ['20171018', '', '', ''],
         'C': ['20171019', '20171005', '', ''],
         'D': ['20171001', '20171006', '20171019', ''],
         'E': ['20171002', '20171003', '20171020', '20171021'],
         'F': ['id_123', 'id_234', 'id_345', 'id_456'],
         'G': ['', '', '', '']
         }
df3 = pd.DataFrame(data2)
Edit: I have already converted the date columns using datetime.
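You can use the .max() method on the dataframe to get the most recent date. You need to pass the argument axis=1 so that the maximum is computed along each row. Missing dates (NaT) are skipped by default.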
import pandas as pd

data = {'A': [1, 2, 3, 4],
        'B': ['20171018', '', '', ''],
        'C': ['20171019', '20171005', '', ''],
        'D': ['20171001', '20171006', '20171019', ''],
        'E': ['20171002', '20171003', '20171020', '20171021'],
        'F': ['id_123', 'id_234', 'id_345', 'id_456']
        }
df = pd.DataFrame(data)

# convert the date columns to datetimes (empty strings become NaT)
for c in 'BCDE':
    df[c] = pd.to_datetime(df[c])

# create a new column holding the row-wise maximum, i.e. the most recent date
df['G'] = df[['B', 'C', 'D', 'E']].max(axis=1)
print(df)
   A          B          C          D          E       F          G
0  1 2017-10-18 2017-10-19 2017-10-01 2017-10-02  id_123 2017-10-19
1  2        NaT 2017-10-05 2017-10-06 2017-10-03  id_234 2017-10-06
2  3        NaT        NaT 2017-10-19 2017-10-20  id_345 2017-10-20
3  4        NaT        NaT        NaT 2017-10-21  id_456 2017-10-21
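The expected output in the question shows column G in the original YYYYMMDD form rather than as a timestamp. If that is what is wanted, a minimal sketch along the following lines might work. It assumes that B through E are the only date columns and that every non-empty value follows the %Y%m%d pattern. The names date_cols and dates are introduced here for illustration and do not appear in the original posts.

```python
import pandas as pd

# Same sample data as in the question; empty strings stand for missing dates.
data = {'A': [1, 2, 3, 4],
        'B': ['20171018', '', '', ''],
        'C': ['20171019', '20171005', '', ''],
        'D': ['20171001', '20171006', '20171019', ''],
        'E': ['20171002', '20171003', '20171020', '20171021'],
        'F': ['id_123', 'id_234', 'id_345', 'id_456']}
df = pd.DataFrame(data)

# Assumption: these are the only columns that hold dates.
date_cols = ['B', 'C', 'D', 'E']

# Parse all date columns in one step without overwriting the originals.
# errors='coerce' turns anything that does not match the format into NaT.
dates = df[date_cols].apply(pd.to_datetime, format='%Y%m%d', errors='coerce')

# The row-wise maximum skips NaT by default; strftime converts the
# result back to YYYYMMDD strings like those in the expected output.
df['G'] = dates.max(axis=1).dt.strftime('%Y%m%d')

print(df)
```

Because the parsed values live in a separate frame, columns B through E keep their original string contents and only G is added.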