I'm very new to using Python and I've been googling around, but nothing seems to exactly fit my problem.
I have a dataset like the following:
groupID  sentenceID  strings
A        0           'abc'
A        0           'def'
A        1           'ghi'
B        0           'abc'
B        1           'def'
B        2           'ghi'
and I'd like the output to look like:
groupID  sentenceID  strings
A        0           'abc. def'
A        1           'ghi'
B        0           'abc'
B        1           'def'
B        2           'ghi'
Written out in plain English, what I'm trying to accomplish is as follows:
For each unique group in groupID:
    if a sentenceID is duplicated, concatenate its strings
    if a sentenceID is not duplicated, keep the string as it is
I'm sure that it's easy to do with pandas, but I'm having trouble getting it right. Can anyone help? Thank you in advance.
You can use groupby on both key columns with a custom aggregation function that joins the strings in each group:
(
    df.groupby(['groupID', 'sentenceID'])
      .aggregate({'strings': lambda x: '. '.join(x)})
      .reset_index()
)
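To try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same groupby. The variable names df and result are just illustrative, and it assumes the strings column holds plain Python strings with no missing values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "groupID": ["A", "A", "A", "B", "B", "B"],
    "sentenceID": [0, 0, 1, 0, 1, 2],
    "strings": ["abc", "def", "ghi", "abc", "def", "ghi"],
})

# Rows that share the same (groupID, sentenceID) pair collapse into one row,
# with their strings joined by ". ". Pairs that occur once are left unchanged,
# because joining a single string returns that string.
result = (
    df.groupby(["groupID", "sentenceID"])
      .agg({"strings": lambda s: ". ".join(s)})
      .reset_index()
)

print(result)
```

No special case is needed for the non-duplicated sentenceIDs, since a group of one string joins to itself.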
Another way, using groupby() and apply():
df.groupby(['groupID', 'sentenceID'])['strings'].apply(lambda x: '. '.join(x)).reset_index()
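One thing to be aware of: groupby sorts the group keys by default, so the output rows come back ordered by groupID and sentenceID. If you would rather keep the order in which the groups first appear, passing sort=False should do it. The sketch below uses a small reordered sample (hypothetical data, not from the question) to show the effect.

```python
import pandas as pd

# Hypothetical sample where group B appears before group A.
df = pd.DataFrame({
    "groupID": ["B", "A", "A", "A"],
    "sentenceID": [0, 0, 0, 1],
    "strings": ["abc", "abc", "def", "ghi"],
})

# sort=False keeps groups in order of first appearance instead of sorting keys.
joined = (
    df.groupby(["groupID", "sentenceID"], sort=False)["strings"]
      .apply(lambda s: ". ".join(s))
      .reset_index()
)

print(joined)
```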