
Aggregate pandas dataframe with string entries

I have a dataframe of the following form

import pandas as pd

df = pd.DataFrame({'Start':['47q2',None, None,'49q1',None,None],
              'Threshold':[None, '47q3', None,None, '49q2', None],
              'End':[None, None, '48q1',None, None, '50q2'],
              'Series':['S1','S1','S1','S2','S2','S2']})

    End Series Start Threshold
0  None     S1  47q2      None
1  None     S1  None      47q3
2  48q1     S1  None      None
3  None     S2  49q1      None
4  None     S2  None      49q2
5  50q2     S2  None      None

I want to reshape the dataframe into the following form

df_wanted = pd.DataFrame({'Start':['47q2','49q1'],
              'Threshold':['47q3','49q2'],
              'End':['48q1','50q2'],
              'Series':['S1','S2']})

    End Series Start Threshold
0  48q1     S1  47q2      47q3
1  50q2     S2  49q1      49q2

That is, I'd like each Series to take up just one row, and have the information about start, end and threshold in the other columns.

I tried using groupby and agg, but because the values are strings I couldn't get it working. I'm not sure what kind of aggregation function would achieve this.

I'm not sure whether it makes a difference, but this dataframe is constructed from another one that has None entries, and in the real dataframe those entries show up as NaN (I don't know how to reproduce that in an example).

Option 1
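Use groupby + first. Within each group, first returns the first non-null value of each column, so it works on string columns as well.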

df.groupby('Series', as_index=False).first()

  Series   End Start Threshold
0     S1  48q1  47q2      47q3
1     S2  50q2  49q1      49q2
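
On the None-versus-NaN concern raised in the question: first skips missing values of either kind, so the same call should carry over to the real data. A minimal sketch, assuming the real frame holds np.nan where the example holds None (the frame df_nan below is made up for illustration and is not from the original post):

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's frame, with NaN in place of None.
df_nan = pd.DataFrame({'Start': ['47q2', np.nan, np.nan, '49q1', np.nan, np.nan],
                       'Threshold': [np.nan, '47q3', np.nan, np.nan, '49q2', np.nan],
                       'End': [np.nan, np.nan, '48q1', np.nan, np.nan, '50q2'],
                       'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

# first() picks the first non-missing value of each column within each group.
result = df_nan.groupby('Series', as_index=False).first()
print(result)
```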

Option 2
A slower solution using groupby + apply.

df.groupby('Series').apply(lambda x: x.bfill().ffill()).drop_duplicates()

    End Series Start Threshold
0  48q1     S1  47q2      47q3
3  50q2     S2  49q1      49q2

The apply logic fills holes, and the final drop_duplicates call drops redundant rows.
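
To see what the lambda does, it can help to run the fill on a single group by hand. This is only an illustrative sketch on the question's example data; the variable names s1, filled and deduped are made up here.

```python
import pandas as pd

df = pd.DataFrame({'Start': ['47q2', None, None, '49q1', None, None],
                   'Threshold': [None, '47q3', None, None, '49q2', None],
                   'End': [None, None, '48q1', None, None, '50q2'],
                   'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

# Pick out one group manually to mimic what apply passes to the lambda.
s1 = df[df['Series'] == 'S1']

# bfill pulls later values upward and ffill pushes earlier values downward,
# so every row of the group ends up holding the same complete record.
filled = s1.bfill().ffill()
print(filled)

# The rows are now identical, so drop_duplicates keeps only the first one.
deduped = filled.drop_duplicates()
print(deduped)
```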

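Use set_index + stack + unstack. Here stack discards the missing entries, leaving one value per Series and column, and unstack pivots those values back into columns.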

df.set_index('Series').stack().unstack().reset_index()
Out[790]: 
  Series   End Start Threshold
0     S1  48q1  47q2      47q3
1     S2  50q2  49q1      49q2
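
The chain above depends on stack dropping missing values, which may not hold in every pandas version (an assumption worth checking against your installed release). A sketch of the same reshape that drops the missing entries explicitly, using melt and pivot instead of the answer's stack and unstack; the names long and wide are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Start': ['47q2', None, None, '49q1', None, None],
                   'Threshold': [None, '47q3', None, None, '49q2', None],
                   'End': [None, None, '48q1', None, None, '50q2'],
                   'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

# Long format: one row per (Series, column name, value) combination.
# melt names the two new columns 'variable' and 'value' by default.
long = df.melt(id_vars='Series')

# Drop the missing entries explicitly, then pivot back to one row per Series.
wide = (long.dropna(subset=['value'])
            .pivot(index='Series', columns='variable', values='value')
            .reset_index())
print(wide)
```

This only works when each Series has at most one non-missing value per column, as in the example; pivot raises an error on duplicates.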
