
Pandas dataframe groupby and combine multiple row values

I apologize if the title isn't clear; I had difficulty phrasing the question, so it's probably best if I just show what I would like to do.

Some context: I parsed a document for names and stored each name along with the page number where it appears. I need to transform the DataFrame so that there is a single row per name, with the page-number column combining all the pages where that name appears. I figured this would require groupby, but I'm not entirely sure.

My data currently:

import numpy as np
import pandas as pd

# Sample rows: first name, last name, page number
data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9], ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]])
df = pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])

  FIRST_NM LAST_NM PAGE_NUM
0     John   Smith        1
1     John   Smith        7
2     Eric   Adams        9
3     Jane     Doe       14
4     Jane     Doe       16
5     John   Smith       19
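A quick way to confirm what np.array does with this mixed data: because each row combines strings and integers, NumPy picks a single Unicode string dtype for the whole array, so PAGE_NUM ends up holding strings such as '1' rather than integers. The sketch below rebuilds the sample data and checks this. The ','.join in the answer relies on it.

```python
import numpy as np
import pandas as pd

# Same sample rows as above; np.array chooses one common dtype for every element
data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9],
                 ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]])
df = pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])

print(data.dtype.kind)                          # U = NumPy fixed-width Unicode string dtype
print(isinstance(df.loc[0, 'PAGE_NUM'], str))   # True: page numbers are stored as text
```

If you need the page numbers as integers elsewhere, df['PAGE_NUM'].astype(int) converts them.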

Desired DataFrame:

  FIRST_NM LAST_NM PAGE_NUM
0     John   Smith   1,7,19
1     Eric   Adams        9
2     Jane     Doe    14,16
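A minimal sketch of producing exactly this shape, assuming df is built from the sample data with np.array (so PAGE_NUM holds strings): select the page column, join each group's values with ','.join, keep groups in first-appearance order with sort=False, and turn the group keys back into columns with reset_index(). This uses agg on the single column rather than apply on the whole group; it is one reasonable approach, not the only one.

```python
import numpy as np
import pandas as pd

data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9],
                 ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]])
df = pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])

# sort=False keeps groups in the order each name first appears (John, Eric, Jane)
result = (
    df.groupby(['FIRST_NM', 'LAST_NM'], sort=False)['PAGE_NUM']
      .agg(','.join)    # join each group's page strings with commas
      .reset_index()    # turn FIRST_NM / LAST_NM back into ordinary columns
)
print(result)
```

With the sample data this yields the same three rows as the desired DataFrame: John Smith with 1,7,19, Eric Adams with 9, and Jane Doe with 14,16. Leaving out sort=False returns the groups in alphabetical order instead.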

You can do this with groupby and apply:

df.groupby(['FIRST_NM', 'LAST_NM']).apply(lambda group: ','.join(group['PAGE_NUM']))
Out[23]: 
FIRST_NM  LAST_NM
Eric      Adams           9
Jane      Doe         14,16
John      Smith      1,7,19
dtype: object
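str.join accepts only strings, so this apply works because PAGE_NUM holds strings; with integer page numbers it would raise a TypeError. Assuming the DataFrame were instead built from a plain Python list (so PAGE_NUM is an integer column), here is a minimal sketch of two options: cast each group to strings before joining, or collect the pages into lists. The answer's result is also a Series indexed by the names; reset_index() here turns it back into a regular DataFrame. The variable names rows, as_text and as_lists are illustrative only.

```python
import pandas as pd

# Hypothetical variant: rows from a plain list, so PAGE_NUM stays an integer column
rows = [['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9],
        ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]]
df = pd.DataFrame(rows, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])

grouped = df.groupby(['FIRST_NM', 'LAST_NM'], sort=False)['PAGE_NUM']

# Option 1: convert each group's integers to strings, then join with commas
as_text = grouped.agg(lambda pages: ','.join(pages.astype(str))).reset_index()

# Option 2: keep the page numbers as lists for further numeric processing
as_lists = grouped.agg(list).reset_index()

print(as_text)
print(as_lists)
```

The list form is convenient if you later need to count or sort pages; the joined-string form matches the desired output exactly.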

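Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.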

 