Aggregate pandas dataframe with string entries
I have a dataframe of the following form:
import pandas as pd

df = pd.DataFrame({'Start':['47q2',None, None,'49q1',None,None],
                   'Threshold':[None, '47q3', None,None, '49q2', None],
                   'End':[None, None, '48q1',None, None, '50q2'],
                   'Series':['S1','S1','S1','S2','S2','S2']})
End Series Start Threshold
0 None S1 47q2 None
1 None S1 None 47q3
2 48q1 S1 None None
3 None S2 49q1 None
4 None S2 None 49q2
5 50q2 S2 None None
I want to reshape the dataframe so that it holds the information like this:
df_wanted = pd.DataFrame({'Start':['47q2','49q1'],
'Threshold':['47q3','49q2'],
'End':['48q1','50q2'],
'Series':['S1','S2']})
End Series Start Threshold
0 48q1 S1 47q2 47q3
1 50q2 S2 49q1 49q2
That is, I'd like each Series to take up just one row, and have the information about start, end and threshold in the other columns.
I tried using groupby and agg; however, as the entries are strings I couldn't get this working. I'm unsure what sort of function could achieve this.
I am unsure if it makes any difference, but this dataframe is constructed from another one, which has None entries; in this dataframe, however, they show as NaN (but I don't know how to reproduce that in an example).
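On the None versus NaN point, a frame with NaN entries can be built directly with np.nan. The following is a minimal sketch, assuming the NaN values in the real data behave like np.nan; the names df_none and df_nan are made up for illustration. It suggests the difference should not matter here, since pandas flags both as missing.

```python
import numpy as np
import pandas as pd

# Two small frames that differ only in how the missing entries are written.
# df_none and df_nan are illustrative names, not part of the original data.
df_none = pd.DataFrame({'Start': ['47q2', None, None],
                        'Series': ['S1', 'S1', 'S1']})
df_nan = pd.DataFrame({'Start': ['47q2', np.nan, np.nan],
                       'Series': ['S1', 'S1', 'S1']})

# isna() marks both None and NaN as missing, so the masks should agree.
same_missing_mask = df_none.isna().equals(df_nan.isna())
print(same_missing_mask)
```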
Option 1
Use groupby + first. GroupBy.first returns the first non-null entry of each column within each group.
df.groupby('Series', as_index=False).first()
Series End Start Threshold
0 S1 48q1 47q2 47q3
1 S2 50q2 49q1 49q2
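A self-contained version of Option 1, as a sketch against the sample data from the question. The variable name result is only for illustration, and the column order of the output may differ from the listing above depending on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'Start': ['47q2', None, None, '49q1', None, None],
                   'Threshold': [None, '47q3', None, None, '49q2', None],
                   'End': [None, None, '48q1', None, None, '50q2'],
                   'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

# first() skips missing values, so each column contributes its single
# non-null entry per Series, which also works for string data.
result = df.groupby('Series', as_index=False).first()
print(result)
```

This relies on each column having at most one non-null value per Series; if a group held several, only the first would be kept.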
Option 2
A slower solution using groupby + apply.
df.groupby('Series').apply(lambda x: x.bfill().ffill()).drop_duplicates()
End Series Start Threshold
0 48q1 S1 47q2 47q3
3 50q2 S2 49q1 49q2
The apply logic fills holes within each group (bfill, then ffill), and the final drop_duplicates call drops the redundant rows.
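The fill-then-deduplicate idea can also be sketched with transform instead of apply. This is a variant, not the code above: it sidesteps the way apply treats the grouping column, which has changed across pandas releases. The names value_cols, filled and result are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Start': ['47q2', None, None, '49q1', None, None],
                   'Threshold': [None, '47q3', None, None, '49q2', None],
                   'End': [None, None, '48q1', None, None, '50q2'],
                   'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

value_cols = ['Start', 'Threshold', 'End']  # columns that contain the holes

# Fill holes inside each group: backward fill first, then forward fill,
# so every row of a group ends up carrying the same complete values.
filled = df.copy()
filled[value_cols] = (df.groupby('Series')[value_cols]
                        .transform(lambda s: s.bfill().ffill()))

# All rows within a group are now identical, so keep one row per group.
result = filled.drop_duplicates().reset_index(drop=True)
print(result)
```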
Another approach: set_index + stack, followed by unstack.
df.set_index('Series').stack().unstack().reset_index()
Series End Start Threshold
0 S1 48q1 47q2 47q3
1 S2 50q2 49q1 49q2
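The stack approach works because the missing entries are dropped before unstack, leaving one value per (Series, column) pair. Whether stack drops them by default depends on the pandas version, so the sketch below adds an explicit dropna() as a precaution; that call is an addition for this example, not part of the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({'Start': ['47q2', None, None, '49q1', None, None],
                   'Threshold': [None, '47q3', None, None, '49q2', None],
                   'End': [None, None, '48q1', None, None, '50q2'],
                   'Series': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2']})

# stack() turns the frame into a long Series indexed by (Series, column).
# dropna() removes any missing entries that stack kept, so each
# (Series, column) pair is unique and unstack() can pivot it back.
long_form = df.set_index('Series').stack().dropna()
result = long_form.unstack().reset_index()
print(result)
```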