
How to include the counts for each character while removing the duplicates using itertools.groupby

I have the following code:

df = pd.DataFrame(data=all_r_1.to_dataframe().groupby(['user_id'])['type'].sum()).reset_index()

userid | type
20     | aab
21     | ababb

To collapse consecutive duplicate characters in the strings of the type column, I have this code:

df['type'] = df['type'].apply(lambda x: ''.join(ch for ch, _ in itertools.groupby(x)))
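For illustration, here is what itertools.groupby produces on one of these strings. It yields one (key, group) pair per run of consecutive identical characters, and the line above keeps only the key. This is a minimal pure-Python sketch; dedupe_runs is a hypothetical helper name, not part of the original code.

```python
import itertools


def dedupe_runs(s: str) -> str:
    """Collapse each run of consecutive identical characters to a single character."""
    # groupby yields (key, group) pairs; only the key is needed here
    return ''.join(ch for ch, _ in itertools.groupby(s))


# Each run of identical characters becomes one (key, items) pair
print([(ch, list(group)) for ch, group in itertools.groupby('aab')])
# [('a', ['a', 'a']), ('b', ['b'])]

print(dedupe_runs('aab'), dedupe_runs('ababb'))
# ab abab
```

Because groupby only merges adjacent equal items, non-adjacent repeats survive ('ababb' becomes 'abab', not 'ab'), which is the behaviour the question wants.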

which produces this:

userid | type
20     | ab
21     | abab

This is the input df:

id | userid | type 
1  | 20     | a  
2  | 20     | a
3  | 20     | b
4  | 21     | a  
5  | 21     | b
6  | 21     | a
7  | 21     | b
8  | 21     | b
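The groupby/sum step in the first snippet concatenates each user's type values in row order. Since the original all_r_1 object is not shown, here is a pandas-free sketch of the same aggregation on the sample rows above; the rows list and the concat_types name are hypothetical, and it assumes rows should be ordered by id within each user.

```python
import itertools

# Rows from the input table above as (id, userid, type) tuples
rows = [
    (1, 20, 'a'), (2, 20, 'a'), (3, 20, 'b'),
    (4, 21, 'a'), (5, 21, 'b'), (6, 21, 'a'), (7, 21, 'b'), (8, 21, 'b'),
]


def concat_types(rows):
    """Concatenate 'type' per userid in id order (analogous to groupby(...)['type'].sum())."""
    # itertools.groupby only merges *adjacent* rows, so sort by (userid, id) first
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    return {
        userid: ''.join(r[2] for r in group)
        for userid, group in itertools.groupby(ordered, key=lambda r: r[1])
    }


print(concat_types(rows))  # {20: 'aab', 21: 'ababb'}
```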

However, what I want is to append the length of each run of repeated characters while removing the duplicates:

userid | type
20     | a2b
21     | abab2

Any ideas how I can modify the itertools.groupby code to also include the counts?

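itertools.groupby yields each group (an iterator over one run of identical characters) together with its key, so you can count the group's length like this: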

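import itertools
df['type'] = df['type'].apply(lambda x: ''.join(ch + (str(n) if n > 1 else '') for ch, n in ((k, len(list(g))) for k, g in itertools.groupby(x))))

The count is appended only when a run is longer than one character, so single characters stay bare (a2b rather than a2b1).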
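The same idea written as a named function may be easier to read and test than a one-line lambda. It omits the count for runs of length 1 so the result matches the desired a2b and abab2. This is a sketch; run_length_encode is a hypothetical name, not a library function.

```python
import itertools


def run_length_encode(s: str) -> str:
    """Collapse runs of repeated characters, appending the run length when it is > 1."""
    parts = []
    for ch, group in itertools.groupby(s):
        n = sum(1 for _ in group)  # length of this run; consumes the group iterator
        parts.append(ch if n == 1 else f'{ch}{n}')
    return ''.join(parts)


print(run_length_encode('aab'))    # a2b
print(run_length_encode('ababb'))  # abab2
```

With pandas it can be applied as df['type'] = df['type'].apply(run_length_encode). Runs of ten or more produce multi-digit counts ('a' * 12 becomes 'a12'), so if the type values could themselves contain digits, the encoded strings would be ambiguous.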
