Pandas Groupby Row with Multiple Columns

I've read about quite a few different methods of joining and still haven't found a solution that I can wrap my head around. I was hoping for some input or guidance.

I have a dataframe with a set of columns that looks like the following:

In [1]: df_old
Out[1]: 
CID     time_a     time_b     time_c     time_d
dc12    4:14pm     NaN        NaN        NaN
dc12    NaN        4:18pm     NaN        NaN
dc12    NaN        NaN        4:44pm     NaN
ab14    2:14pm     NaN        NaN        NaN
ab14    NaN        3:18pm     NaN        NaN
ab14    NaN        NaN        3:27pm     NaN
ab14    NaN        NaN        NaN        4:15pm

What I want would be the following:

In [2]: df_new
Out[2]: 

CID     time_a     time_b     time_c     time_d
dc12    4:14pm     4:18pm     4:44pm     NaN
ab14    2:14pm     3:18pm     3:27pm     4:15pm
...

I think there's a method of doing it with df.groupby() but I wasn't able to get any results and was wondering if anybody could point me in the right direction.

Thanks so much in advance for your help!

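You could use groupby and then call .first(), which gives you the first non-NaN value seen in each column for each group. That matches what you want as long as each CID has at most one non-NaN value per column; if there were several, only the first would be kept: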

>>> df.groupby("CID", as_index=False).first()
    CID  time_a  time_b  time_c  time_d
0  ab14  2:14pm  3:18pm  3:27pm  4:15pm
1  dc12  4:14pm  4:18pm  4:44pm     NaN
>>> df.groupby("CID", as_index=False, sort=False).first()
    CID  time_a  time_b  time_c  time_d
0  dc12  4:14pm  4:18pm  4:44pm     NaN
1  ab14  2:14pm  3:18pm  3:27pm  4:15pm
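
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same call. The literal construction of df is my own reconstruction of the printed table (the times are kept as plain strings), so treat it as an assumption about the real data rather than the asker's actual code.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's frame: one non-NaN time per row.
df = pd.DataFrame({
    "CID": ["dc12", "dc12", "dc12", "ab14", "ab14", "ab14", "ab14"],
    "time_a": ["4:14pm", np.nan, np.nan, "2:14pm", np.nan, np.nan, np.nan],
    "time_b": [np.nan, "4:18pm", np.nan, np.nan, "3:18pm", np.nan, np.nan],
    "time_c": [np.nan, np.nan, "4:44pm", np.nan, np.nan, "3:27pm", np.nan],
    "time_d": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "4:15pm"],
})

# sort=False keeps the groups in order of first appearance (dc12, then ab14);
# as_index=False keeps CID as a regular column in the result.
df_new = df.groupby("CID", as_index=False, sort=False).first()
print(df_new)
```

The time_d cell for dc12 stays missing because that group has no non-NaN value in that column.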

This assumes CID is a column and not the index. If it's the index, either call reset_index() first or use df.groupby(level=0).first() instead.
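
A small sketch of the index case, in case it helps. The frame named indexed is a made-up, cut-down version of the question's data with CID as the index; the names indexed, by_level and by_column are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down frame where CID is the index instead of a column.
indexed = pd.DataFrame(
    {
        "time_a": ["4:14pm", np.nan, "2:14pm", np.nan],
        "time_b": [np.nan, "4:18pm", np.nan, "3:18pm"],
    },
    index=pd.Index(["dc12", "dc12", "ab14", "ab14"], name="CID"),
)

# Option 1: group on the index level directly; CID stays as the index.
by_level = indexed.groupby(level=0, sort=False).first()

# Option 2: turn the index back into a column, then group as before.
by_column = indexed.reset_index().groupby("CID", as_index=False, sort=False).first()

print(by_level)
print(by_column)
```

Both give the same collapsed rows; the only difference is whether CID ends up as the index or as an ordinary column.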
