Group by and aggregate the values in pandas dataframe
I have the following dataframe in Python:
    meddra_id  meddra_label           soc       cross_ref                        soc_term
2   10000081   Abdominal pain         10017947  http://snomed.info/id/21522001   Gastrointestinal disorders
3   10017999   Gastrointestinal pain  10017947  http://snomed.info/id/21522001   Gastrointestinal disorders
15  10000340   Abstains from alcohol  10041244  http://snomed.info/id/105542008  Social circumstances
35  10001022   Acute psychosis        10037175  http://snomed.info/id/69322001   Psychiatric disorders
36  10061920   Psychotic disorder     10037175  http://snomed.info/id/69322001   Psychiatric disorders
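I want to aggregate the values in the columns 'meddra_id', 'meddra_label', 'soc' and 'soc_term' by grouping on another column, 'cross_ref' (and to exclude rows where only a single 'meddra_id' is associated with a 'cross_ref').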
The expected output is:
meddra_id          meddra_label                          soc       cross_ref                        soc_term
10000081,10017999  Abdominal pain,Gastrointestinal pain  10017947  http://snomed.info/id/21522001   Gastrointestinal disorders
10001022,10061920  Acute psychosis,Psychotic disorder    10037175  http://snomed.info/id/69322001   Psychiatric disorders
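The filtering requirement (drop every cross_ref that has only one meddra_id) can be checked on its own before any aggregation. The sketch below assumes the sample rows above and keeps only the two columns that the filter looks at:

```python
import pandas as pd

# Minimal copy of the question's data: only the two columns the filter needs.
df = pd.DataFrame(
    {
        "meddra_id": [10000081, 10017999, 10000340, 10001022, 10061920],
        "cross_ref": [
            "http://snomed.info/id/21522001",
            "http://snomed.info/id/21522001",
            "http://snomed.info/id/105542008",
            "http://snomed.info/id/69322001",
            "http://snomed.info/id/69322001",
        ],
    },
    index=[2, 3, 15, 35, 36],
)

# Keep only the cross_ref groups that contain more than one distinct meddra_id.
# nunique() counts distinct values, so a cross_ref whose rows merely repeat the
# same meddra_id is dropped too (len(g) > 1 would keep it).
multi = df.groupby("cross_ref").filter(lambda g: g["meddra_id"].nunique() > 1)

print(multi.index.tolist())  # [2, 3, 35, 36]
```

A vectorized alternative that gives the same rows is df[df.groupby("cross_ref")["meddra_id"].transform("nunique") > 1].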
I am trying the following lines of code:
df_terms = df.groupby('cross_ref').filter(lambda g: len(g) > 1).drop_duplicates(subset=['meddra_id', 'meddra_label', 'soc', 'soc_term'], keep="first")
#aggregate the values
df_terms = df_terms.groupby('cross_ref')['meddra_id', 'meddra_label', 'soc', 'soc_term'].agg(' , '.join).reset_index()
When I try to aggregate the values, the 'soc_term' column does not show up in the new dataframe (df_terms).
Any help is much appreciated.
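Two details in the attempt above are worth checking, separately from the missing column. First, str.join only accepts strings, while meddra_id and soc hold integers in the sample, so joining them directly raises TypeError. Second, when selecting several columns after groupby you should pass a list (double brackets). Passing a tuple, as the single-bracket form does, was deprecated in older pandas and is rejected by pandas 2.x. Here is a small sketch with made-up rows (the "x" key and "GI" labels are placeholders):

```python
import pandas as pd

# Made-up rows: meddra_id is an integer column, soc_term a string column.
df = pd.DataFrame(
    {
        "cross_ref": ["x", "x"],
        "meddra_id": [10000081, 10017999],
        "soc_term": ["GI", "GI"],
    }
)

# Joining integers directly fails: str.join only accepts str items.
try:
    ",".join(df["meddra_id"])
    join_failed = False
except TypeError:
    join_failed = True

# Select the columns with a list, and convert every value with str() before joining.
out = df.groupby("cross_ref")[["meddra_id", "soc_term"]].agg(
    lambda x: ",".join(map(str, x))
)
print(out)
```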
Use agg to join the values in the different columns:
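df_grouped = df.groupby('cross_ref')  # group as you did
# keep only the cross_ref groups that have more than one distinct meddra_id
df_filtered = df_grouped.filter(lambda g: g['meddra_id'].nunique() > 1)
# join the values per cross_ref; map(str, ...) is needed because meddra_id and soc
# are integers, and str.join only accepts strings
df_aggregated = df_filtered.groupby('cross_ref').agg({
    'meddra_id': lambda x: ','.join(map(str, x)),
    'meddra_label': ','.join,
    'soc': lambda x: ','.join(map(str, x.unique())),  # same in every row of a group, keep one copy
    'soc_term': lambda x: ','.join(x.unique()),  # same in every row of a group, keep one copy
}).reset_index()
# reset_index puts cross_ref first; restore the original column order
df_aggregated = df_aggregated[df.columns]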
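For reference, here is a self-contained sketch that rebuilds the sample data from the question and produces the expected output in one chain. It assumes that soc and soc_term are constant within each cross_ref (true for the sample rows), so only their distinct values are joined. The helper names join_values and join_unique are hypothetical, and named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "meddra_id": [10000081, 10017999, 10000340, 10001022, 10061920],
        "meddra_label": ["Abdominal pain", "Gastrointestinal pain",
                         "Abstains from alcohol", "Acute psychosis",
                         "Psychotic disorder"],
        "soc": [10017947, 10017947, 10041244, 10037175, 10037175],
        "cross_ref": ["http://snomed.info/id/21522001",
                      "http://snomed.info/id/21522001",
                      "http://snomed.info/id/105542008",
                      "http://snomed.info/id/69322001",
                      "http://snomed.info/id/69322001"],
        "soc_term": ["Gastrointestinal disorders", "Gastrointestinal disorders",
                     "Social circumstances", "Psychiatric disorders",
                     "Psychiatric disorders"],
    },
    index=[2, 3, 15, 35, 36],
)


def join_values(values):
    """Join all values of a group with commas; str() handles integer columns."""
    return ",".join(map(str, values))


def join_unique(values):
    """Join only the distinct values (soc/soc_term repeat within a cross_ref)."""
    return ",".join(map(str, values.unique()))


result = (
    df.groupby("cross_ref")
    # drop cross_ref groups that have a single distinct meddra_id
    .filter(lambda g: g["meddra_id"].nunique() > 1)
    .groupby("cross_ref")
    .agg(
        meddra_id=("meddra_id", join_values),
        meddra_label=("meddra_label", join_values),
        soc=("soc", join_unique),
        soc_term=("soc_term", join_unique),
    )
    .reset_index()[df.columns]  # back to the original column order
)
print(result.to_string(index=False))
```

The aggregated soc column now holds strings such as "10017947". Convert it back with astype(int) if you need a numeric column.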