I apologize if the title isn't clear, but I had difficulty phrasing the question. It's probably best if I just show what I would like to do.
Some context: I parsed a document for names and stored each name with the page number where it appears. I need to transform the DataFrame so that there is a single row for each name the page number column combines all the pages where the name appears. I figured that this would require GroupBy, but I'm not entirely sure.
My data currently:
data = np.array([['John', 'Smith', 1], ['John', 'Smith', 7], ['Eric', 'Adams', 9], ['Jane', 'Doe', 14], ['Jane', 'Doe', 16], ['John', 'Smith', 19]])
pd.DataFrame(data, columns=['FIRST_NM', 'LAST_NM', 'PAGE_NUM'])
FIRST_NM LAST_NM PAGE_NUM
0 John Smith 1
1 John Smith 7
2 Eric Adams 9
3 Jane Doe 14
4 Jane Doe 16
5 John Smith 19
Desired dataframe:
FIRST_NM LAST_NM PAGE_NUM
0 John Smith 1,7,19
1 Eric Adams 9
2 Jane Doe 14,16
You can do this with groupby and apply:
df.groupby(['FIRST_NM', 'LAST_NM']).apply(lambda group: ','.join(group['PAGE_NUM']))
Out[23]:
FIRST_NM LAST_NM
Eric Adams 9
Jane Doe 14,16
John Smith 1,7,19
dtype: object
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.