Aggregating a pandas DataFrame with string entries

Question

I have a dataframe of the following form:

import pandas as pd

df = pd.DataFrame({'Start':['47q2',None, None,'49q1',None,None],
              'Threshold':[None, '47q3', None,None, '49q2', None],
              'End':[None, None, '48q1',None, None, '50q2'],
              'Series':['S1','S1','S1','S2','S2','S2']})

    End Series Start Threshold
0  None     S1  47q2      None
1  None     S1  None      47q3
2  48q1     S1  None      None
3  None     S2  49q1      None
4  None     S2  None      49q2
5  50q2     S2  None      None

I want to reshape the dataframe so that the information is laid out like this:

df_wanted = pd.DataFrame({'Start':['47q2','49q1'],
              'Threshold':['47q3','49q2'],
              'End':['48q1','50q2'],
              'Series':['S1','S2']})

    End Series Start Threshold
0  48q1     S1  47q2      47q3
1  50q2     S2  49q1      49q2

That is, I'd like each Series to take up just one row, with the start, threshold and end information in the other columns.

I tried using groupby and agg, but because the entries are strings I couldn't get it to work. I'm not sure what kind of function would achieve this.

I'm not sure whether it makes any difference, but this dataframe is constructed from another one that has None entries; in my actual dataframe those entries show up as NaN (I don't know how to reproduce that in an example).

Answer 1

Option 1
Use groupby + first.

df.groupby('Series', as_index=False).first()

  Series   End Start Threshold
0     S1  48q1  47q2      47q3
1     S2  50q2  49q1      49q2
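
To address the None vs. NaN concern in the question, here is a minimal sketch. It relies on groupby's first() taking the first non-null value of each column within a group, which is why it works on string columns. The frame df_nan is a hypothetical stand-in for the asker's real data, built here only to show NaN in place of None; column order in the printed result may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Start': ['47q2', None, None, '49q1', None, None],
                   'Threshold': [None, '47q3', None, None, '49q2', None],
                   'End': [None, None, '48q1', None, None, '50q2'],
                   'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

# Hypothetical variant of the same frame with NaN where the original has None,
# mimicking how the asker's real data is displayed.
df_nan = df.where(df.notna(), np.nan)

# first() keeps the first non-null value per column within each group,
# so missing entries (None or NaN) are skipped.
result = df.groupby('Series', as_index=False).first()
result_nan = df_nan.groupby('Series', as_index=False).first()

print(result)
print(result_nan)
```

Both calls should collapse each Series to a single fully populated row, so the None/NaN distinction should not matter for this approach.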

Option 2
A slower solution using groupby + apply.

df.groupby('Series').apply(lambda x: x.bfill().ffill()).drop_duplicates()

    End Series Start Threshold
0  48q1     S1  47q2      47q3
3  50q2     S2  49q1      49q2

The apply step fills the holes within each group (bfill, then ffill), so every row of a group ends up identical; the final drop_duplicates call then drops the redundant rows.
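
The same fill-then-deduplicate idea can also be sketched with transform instead of apply. This is a variant, not the answer's original code: how groupby.apply treats the grouping column has, as far as I know, changed across pandas versions, so this version sidesteps it by filling only the value columns. The names cols and filled are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Start': ['47q2', None, None, '49q1', None, None],
                   'Threshold': [None, '47q3', None, None, '49q2', None],
                   'End': [None, None, '48q1', None, None, '50q2'],
                   'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

cols = ['Start', 'Threshold', 'End']  # value columns to fill (illustrative name)

filled = df.copy()
# Within each Series, back-fill then forward-fill so every row of the group
# carries the group's only non-null value in each column.
filled[cols] = df.groupby('Series')[cols].transform(lambda s: s.bfill().ffill())

# All rows of a group are now identical, so keep one of each.
result = filled.drop_duplicates().reset_index(drop=True)
print(result)
```

This relies on each group having exactly one non-null value per column; with several distinct values per group, the rows would not become identical and drop_duplicates would keep more than one.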

Answer 2

set_index + stack

df.set_index('Series').stack().unstack().reset_index()
Out[790]: 
  Series   End Start Threshold
0     S1  48q1  47q2      47q3
1     S2  50q2  49q1      49q2
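
A sketch of why this works, split into steps: stack turns the frame into a long Series indexed by (Series, column name), and once missing values are gone there is exactly one entry per pair, so unstack can pivot it back into one row per Series. The explicit dropna() is an addition not in the answer above; the one-liner depends on stack dropping missing values itself, which I believe is no longer the case in newer pandas versions, so treat that as an assumption to check against your installed version.

```python
import pandas as pd

df = pd.DataFrame({'Start': ['47q2', None, None, '49q1', None, None],
                   'Threshold': [None, '47q3', None, None, '49q2', None],
                   'End': [None, None, '48q1', None, None, '50q2'],
                   'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

# Long format: one entry per (Series, column) pair.
# dropna() removes missing entries explicitly instead of relying on stack().
stacked = df.set_index('Series').stack().dropna()

# Pivot the inner index level back into columns: one row per Series.
result = stacked.unstack().reset_index()
print(result)
```

If a Series had two non-null values in the same column, the (Series, column) pairs would no longer be unique and unstack would raise an error about duplicate entries.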
