
Pandas dataframe, get recent date in each row

I'm trying to go through each row (iterrows?), find the most recent date (a sort function?), and put it in column 'G'.

I'm having trouble combining the iteration and the sorting.

    A       B           C           D           E           F           G
0   1       20171018    20171019    20171001    20171002    id_123      
1   2       NaN         20171005    20171006    20171003    id_234      
2   3       NaN         NaN         20171019    20171020    id_345      
3   4       NaN         NaN         NaN         20171021    id_456      

Desired output:

    A       B           C           D           E           F           G
0   1       20171018    20171019    20171001    20171002    id_123      20171019
1   2       NaN         20171005    20171006    20171003    id_234      20171006
2   3       NaN         NaN         20171019    20171020    id_345      20171020
3   4       NaN         NaN         NaN         20171021    id_456      20171021

Here is the code to generate the dataframe:

import pandas as pd

data2 = {'A': [1, 2, 3, 4],
        'B': ['20171018', '', '', ''],
        'C': ['20171019', '20171005', '', ''],
        'D': ['20171001', '20171006', '20171019', ''],
        'E': ['20171002', '20171003', '20171020', '20171021'],
        'F': ['id_123', 'id_234', 'id_345', 'id_456'],
        'G': ['', '', '', '']
        }
df3 = pd.DataFrame(data2)

Edit: I have already converted the date columns to datetime.

You can use the .max() method on the dataframe to get the most recent date. Pass the parameter axis=1 so that the maximum is calculated along each row rather than down each column.

import pandas as pd

data = {'A': [1, 2, 3, 4],
        'B': ['20171018', '', '', ''],
        'C': ['20171019', '20171005', '', ''],
        'D': ['20171001', '20171006', '20171019', ''],
        'E': ['20171002', '20171003', '20171020', '20171021'],
        'F': ['id_123','id_234','id_345','id_456']
        }
df = pd.DataFrame(data)

# convert to datetimes
for c in 'BCDE':
    df[c] = pd.to_datetime(df[c])

# create a new column
df['G'] = df[['B','C','D','E']].max(axis=1)
print(df)

   A          B          C          D          E       F          G
0  1 2017-10-18 2017-10-19 2017-10-01 2017-10-02  id_123 2017-10-19
1  2        NaT 2017-10-05 2017-10-06 2017-10-03  id_234 2017-10-06
2  3        NaT        NaT 2017-10-19 2017-10-20  id_345 2017-10-20
3  4        NaT        NaT        NaT 2017-10-21  id_456 2017-10-21
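
The desired output in the question shows column G in the same YYYYMMDD form as the inputs, whereas the result above holds datetimes. A minimal sketch of one way to get that form, assuming the dates start out as YYYYMMDD strings with empty strings for missing values (the names date_cols and dates are illustrative, not from the original post):

```python
import pandas as pd

# Same sample data as above; empty strings mark missing dates.
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': ['20171018', '', '', ''],
    'C': ['20171019', '20171005', '', ''],
    'D': ['20171001', '20171006', '20171019', ''],
    'E': ['20171002', '20171003', '20171020', '20171021'],
    'F': ['id_123', 'id_234', 'id_345', 'id_456'],
})

date_cols = ['B', 'C', 'D', 'E']  # illustrative name for the date columns

# Parse every date column with an explicit format.
# errors='coerce' turns anything unparseable into NaT.
dates = df[date_cols].apply(pd.to_datetime, format='%Y%m%d', errors='coerce')

# max(axis=1) skips NaT by default (skipna=True), so rows with
# missing dates still get the latest date that is present.
# strftime then converts the result back to a YYYYMMDD string.
df['G'] = dates.max(axis=1).dt.strftime('%Y%m%d')

print(df[['A', 'F', 'G']])
```

Working on a separate dates frame leaves the original columns untouched, which may or may not be what you want; assigning the parsed columns back to df, as in the answer above, works just as well.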


 