I've read quite a few different methods on joining and still haven't really found a solution that I can wrap my head around. Was hoping for some input or guidance.
I have a dataframe with a set of columns that looks like the following:
In [1]: df_old
Out[1]:
CID time_a time_b time_c time_d
dc12 4:14pm NaN NaN NaN
dc12 NaN 4:18pm NaN NaN
dc12 NaN NaN 4:44pm NaN
ab14 2:14pm NaN NaN NaN
ab14 NaN 3:18pm NaN NaN
ab14 NaN NaN 3:27pm NaN
ab14 NaN NaN NaN 4:15pm
What I want would be the following:
In [2]: df_new
Out[2]:
CID time_a time_b time_c time_d
dc12 4:14pm 4:18pm 4:44pm NaN
ab14 2:14pm 3:18pm 3:27pm 4:15pm
...
I think there's a method of doing it with df.groupby() but I wasn't able to get any results and was wondering if anybody could point me in the right direction.
Thanks so much in advance for your help!
You could use groupby
and then call .first()
, which will give you the first non-nan value seen (which is why I was wondering whether there was only one):
>>> df.groupby("CID", as_index=False).first()
CID time_a time_b time_c time_d
0 ab14 2:14pm 3:18pm 3:27pm 4:15pm
1 dc12 4:14pm 4:18pm 4:44pm NaN
>>> df.groupby("CID", as_index=False, sort=False).first()
CID time_a time_b time_c time_d
0 dc12 4:14pm 4:18pm 4:44pm NaN
1 ab14 2:14pm 3:18pm 3:27pm 4:15pm
This assumes CID is a column and not an index. If it's an index, either call reset_index
or use df.groupby(level=0).first()
instead.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.