Group by and aggregate the values in pandas dataframe

I have the following dataframe in Python:

    meddra_id   meddra_label              soc       cross_ref                       soc_term
2   10000081    Abdominal pain            10017947  http://snomed.info/id/21522001  Gastrointestinal disorders
3   10017999    Gastrointestinal pain     10017947  http://snomed.info/id/21522001  Gastrointestinal disorders
15  10000340    Abstains from alcohol     10041244  http://snomed.info/id/105542008 Social circumstances
35  10001022    Acute psychosis           10037175  http://snomed.info/id/69322001  Psychiatric disorders
36  10061920    Psychotic disorder        10037175  http://snomed.info/id/69322001  Psychiatric disorders

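I want to aggregate the values in the columns "meddra_id", "meddra_label", "soc" and "soc_term", grouped by another column, "cross_ref" (and exclude rows where only a single "meddra_id" is associated with a "cross_ref").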

The expected output is:

meddra_id           meddra_label                           soc      cross_ref                       soc_term
10000081,10017999   Abdominal pain,Gastrointestinal pain   10017947 http://snomed.info/id/21522001  Gastrointestinal disorders
10001022,10061920   Acute psychosis,Psychotic disorder     10037175 http://snomed.info/id/69322001  Psychiatric disorders

I am trying the following lines of code:

df_terms = df.groupby('cross_ref').filter(lambda g: len(g) > 1).drop_duplicates(subset=['meddra_id', 'meddra_label', 'soc', 'soc_term'], keep="first")

#aggregate the values
df_terms = df_terms.groupby('cross_ref')['meddra_id', 'meddra_label', 'soc', 'soc_term'].agg(' , '.join).reset_index()

When I try to aggregate the values, the "soc_term" column does not show up in the new dataframe (df_terms).

Any help is highly appreciated.

Use agg with a per-column function to join the values in the different columns:

df_grouped = df.groupby('cross_ref')  # group as you did
# keep only the groups that contain more than one distinct meddra_id
df_filtered = df_grouped.filter(lambda g: len(g['meddra_id'].unique()) > 1)

# join the values of each column with a comma, one row per cross_ref
df_aggregated = df_filtered.groupby('cross_ref').agg({
    'meddra_id': ', '.join,  # works only if the values are already strings
    'meddra_label': ', '.join,
    'soc': lambda x: ', '.join(map(str, x)),  # cast non-string values (e.g. floats) to strings
    'soc_term': lambda x: ', '.join(map(str, x))  # cast non-string values (e.g. floats) to strings
}).reset_index()
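
For reference, here is a self-contained sketch of the same idea run on the sample rows from the question. It rests on a few assumptions that the question does not confirm: the ID columns are stored as integers, the repeated "soc" and "soc_term" values should appear only once (as in the expected output), and the helper names join_all and join_unique are made up for this illustration. Casting to str before joining matters because str.join raises a TypeError on non-string values such as floats or NaN, and some older pandas versions could silently drop a column whose aggregation failed, which may be why "soc_term" went missing.

```python
import pandas as pd

# Sample rows copied from the question; the integer dtype of the ID columns is an assumption.
df = pd.DataFrame({
    "meddra_id": [10000081, 10017999, 10000340, 10001022, 10061920],
    "meddra_label": ["Abdominal pain", "Gastrointestinal pain",
                     "Abstains from alcohol", "Acute psychosis",
                     "Psychotic disorder"],
    "soc": [10017947, 10017947, 10041244, 10037175, 10037175],
    "cross_ref": ["http://snomed.info/id/21522001",
                  "http://snomed.info/id/21522001",
                  "http://snomed.info/id/105542008",
                  "http://snomed.info/id/69322001",
                  "http://snomed.info/id/69322001"],
    "soc_term": ["Gastrointestinal disorders", "Gastrointestinal disorders",
                 "Social circumstances", "Psychiatric disorders",
                 "Psychiatric disorders"],
})


def join_all(values):
    """Join every value with a comma, casting to str so numbers and NaN do not break str.join."""
    return ",".join(map(str, values))


def join_unique(values):
    """Like join_all, but keep each distinct value once, in order of first appearance."""
    return ",".join(dict.fromkeys(map(str, values)))


# Keep only the rows whose cross_ref is linked to more than one distinct meddra_id.
multi = df[df.groupby("cross_ref")["meddra_id"].transform("nunique") > 1]

result = (
    multi.groupby("cross_ref", sort=False)
    .agg(
        meddra_id=("meddra_id", join_all),
        meddra_label=("meddra_label", join_all),
        soc=("soc", join_unique),
        soc_term=("soc_term", join_unique),
    )
    .reset_index()
)

# Restore the column order used in the question.
result = result[["meddra_id", "meddra_label", "soc", "cross_ref", "soc_term"]]
print(result.to_string(index=False))
```

If the duplicates in "soc" and "soc_term" should be kept instead, swap join_unique for join_all in the last two entries.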

