Group by and aggregate the values in pandas dataframe
I have the following dataframe in Python:
    meddra_id  meddra_label           soc       cross_ref                        soc_term
2   10000081   Abdominal pain         10017947  http://snomed.info/id/21522001   Gastrointestinal disorders
3   10017999   Gastrointestinal pain  10017947  http://snomed.info/id/21522001   Gastrointestinal disorders
15  10000340   Abstains from alcohol  10041244  http://snomed.info/id/105542008  Social circumstances
35  10001022   Acute psychosis        10037175  http://snomed.info/id/69322001   Psychiatric disorders
36  10061920   Psychotic disorder     10037175  http://snomed.info/id/69322001   Psychiatric disorders
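I want to aggregate the values in the columns "meddra_id", "meddra_label", "soc" and "soc_term", grouped by another column, "cross_ref", and to exclude the rows where a "cross_ref" is associated with only a single "meddra_id".
The expected output is: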
meddra_id          meddra_label                          soc       cross_ref                       soc_term
10000081,10017999  Abdominal pain,Gastrointestinal pain  10017947  http://snomed.info/id/21522001  Gastrointestinal disorders
10001022,10061920  Acute psychosis,Psychotic disorder    10037175  http://snomed.info/id/69322001  Psychiatric disorders
I am trying the following lines of code:
df_terms = df.groupby('cross_ref').filter(lambda g: len(g) > 1).drop_duplicates(subset=['meddra_id', 'meddra_label', 'soc', 'soc_term'], keep="first")
#aggregate the values
df_terms = df_terms.groupby('cross_ref')['meddra_id', 'meddra_label', 'soc', 'soc_term'].agg(' , '.join).reset_index()
When I try to aggregate the values, the "soc_term" column does not show up in the new dataframe (df_terms).
Any help is highly appreciated.
Use agg with one function per column to join the values in the different columns:
df_grouped = df.groupby('cross_ref')  # group by cross_ref, as in the question
# keep only the groups that have more than one distinct meddra_id
df_filtered = df_grouped.filter(lambda g: g['meddra_id'].nunique() > 1)
# join each column's values with a comma; map(str, ...) casts non-string values (e.g. floats) first
df_aggregated = df_filtered.groupby('cross_ref').agg({
    'meddra_id': lambda x: ', '.join(map(str, x)),
    'meddra_label': ', '.join,
    'soc': lambda x: ', '.join(map(str, x)),
    'soc_term': lambda x: ', '.join(map(str, x)),
}).reset_index()
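For reference, here is a self-contained sketch that rebuilds the sample data from the question and applies the same filter-then-agg idea. It rests on a few assumptions that are not confirmed by the post: "meddra_id" and "soc" are taken to be integers, join_as_text is a hypothetical helper name, and "soc" and "soc_term" are reduced with 'first' instead of being joined, because the expected output shows each of them only once per group. If those two columns can differ within a "cross_ref" group, joining them as in the answer above may be the safer choice.

```python
import pandas as pd

# Sample data copied from the question (dtypes are an assumption).
df = pd.DataFrame(
    {
        "meddra_id": [10000081, 10017999, 10000340, 10001022, 10061920],
        "meddra_label": [
            "Abdominal pain",
            "Gastrointestinal pain",
            "Abstains from alcohol",
            "Acute psychosis",
            "Psychotic disorder",
        ],
        "soc": [10017947, 10017947, 10041244, 10037175, 10037175],
        "cross_ref": [
            "http://snomed.info/id/21522001",
            "http://snomed.info/id/21522001",
            "http://snomed.info/id/105542008",
            "http://snomed.info/id/69322001",
            "http://snomed.info/id/69322001",
        ],
        "soc_term": [
            "Gastrointestinal disorders",
            "Gastrointestinal disorders",
            "Social circumstances",
            "Psychiatric disorders",
            "Psychiatric disorders",
        ],
    },
    index=[2, 3, 15, 35, 36],
)


def join_as_text(values):
    """Hypothetical helper: cast every value to str, then join with commas."""
    return ",".join(map(str, values))


# Keep only the cross_ref groups that have more than one distinct meddra_id.
multi = df.groupby("cross_ref").filter(lambda g: g["meddra_id"].nunique() > 1)

# Join the per-row columns; take the first value of the per-group columns.
# sort=False keeps the groups in order of first appearance.
result = (
    multi.groupby("cross_ref", sort=False)
    .agg(
        meddra_id=("meddra_id", join_as_text),
        meddra_label=("meddra_label", join_as_text),
        soc=("soc", "first"),
        soc_term=("soc_term", "first"),
    )
    .reset_index()
)

# Restore the column order used in the question.
result = result[["meddra_id", "meddra_label", "soc", "cross_ref", "soc_term"]]
print(result)
```

Casting to str inside the helper means the join works whether a column holds strings, integers or floats (including NaN), so no column has to be left out of the aggregation for its dtype.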