How to include the counts for each character while removing the duplicates using itertools.groupby

Question

I have the following code:

df = pd.DataFrame(data=all_r_1.to_dataframe().groupby(['user_id'])['type'].sum()).reset_index()

userid | type
20     | aab
21     | ababb

To collapse consecutive duplicate characters in the strings in the type column, I have this code:

df['type'] = df['type'].apply(lambda x: ''.join(ch for ch, _ in itertools.groupby(x)))

which produces this:

userid | type
20     | ab
21     | abab

This is the input df:

id | userid | type 
1  | 20     | a  
2  | 20     | a
3  | 20     | b
4  | 21     | a  
5  | 21     | b
6  | 21     | a
7  | 21     | b
8  | 21     | b

However, what I want is to include the count of each repeated character while removing the duplicates:

userid | type
20     | a2b
21     | abab2

Any ideas how I can modify the itertools.groupby code to also include the counts?

Answer 1

itertools.groupby also yields the actual group for each key, so you can access it like this:

df['type'] = df['type'].apply(lambda x: ''.join('{}{}'.format(ch,len(list(group))) for ch, group in itertools.groupby(x)))
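Note that the one-liner above appends a count to every run, including runs of length 1, so 'aab' becomes 'a2b1' and 'ababb' becomes 'a1b1a1b2'. If the goal is the exact output shown in the question (a2b, abab2), one possible variation is to omit the count when it is 1. The following is a minimal sketch; the helper name compress_runs is hypothetical and not part of the original answer.

```python
import itertools


def compress_runs(s):
    """Collapse each run of identical consecutive characters.

    A run longer than 1 becomes the character followed by the run length;
    a run of length 1 is kept as the bare character.
    (compress_runs is an illustrative name, not from the original answer.)
    """
    parts = []
    for ch, group in itertools.groupby(s):
        # groupby yields an iterator per run; count its items
        n = sum(1 for _ in group)
        parts.append(ch if n == 1 else '{}{}'.format(ch, n))
    return ''.join(parts)


print(compress_runs('aab'))    # a2b
print(compress_runs('ababb'))  # abab2
```

Assuming df is the DataFrame from the question, it could be applied with df['type'] = df['type'].apply(compress_runs).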

1 answer

Answer 1 (accepted), 2017-03-07 09:17:53
