I have the following code:
df = pd.DataFrame(data=all_r_1.to_dataframe().groupby(['user_id'])['type'].sum()).reset_index()
userid | type
20 | aab
21 | ababb
To remove consecutive duplicates from the strings in the type column, I have this code:
df['type'] = df['type'].apply(lambda x: ''.join(ch for ch, _ in itertools.groupby(x)))
which produces this:
userid | type
20 | ab
21 | abab
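For reference, itertools.groupby starts a new group every time the value changes, so only adjacent repeats are collapsed. That is why ababb becomes abab rather than ab. A minimal standalone sketch of that behaviour (the helper name dedupe_consecutive is only illustrative):

```python
import itertools


def dedupe_consecutive(s):
    # groupby yields one (key, group) pair per run of identical characters,
    # so keeping only the keys collapses adjacent repeats.
    return "".join(ch for ch, _ in itertools.groupby(s))


print(dedupe_consecutive("aab"))    # ab
print(dedupe_consecutive("ababb"))  # abab
```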
This is the input df:
id | userid | type
1 | 20 | a
2 | 20 | a
3 | 20 | b
4 | 21 | a
5 | 21 | b
6 | 21 | a
7 | 21 | b
8 | 21 | b
However, what I want to do is to include the counts for each character while removing the duplicates:
userid | type
20 | a2b
21 | abab2
Any ideas how I can modify the itertools.groupby code to also include the counts?
itertools.groupby also yields the actual group for each key, so you can access it like this:
df['type'] = df['type'].apply(lambda x: ''.join('{}{}'.format(ch,len(list(group))) for ch, group in itertools.groupby(x)))
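Note that the line above appends a count to every run, including runs of length 1, so it yields a2b1 and a1b1a1b2 rather than the a2b and abab2 shown in the question. A minimal sketch that omits the count for single-character runs, assuming that is the intended format (the helper name compress_runs is hypothetical and not part of the original answer):

```python
import itertools


def compress_runs(s):
    """Collapse adjacent repeats, appending the run length only when it exceeds 1."""
    parts = []
    for ch, group in itertools.groupby(s):
        n = sum(1 for _ in group)  # length of this run of identical characters
        parts.append(ch if n == 1 else "{}{}".format(ch, n))
    return "".join(parts)


print(compress_runs("aab"))    # a2b
print(compress_runs("ababb"))  # abab2
```

With the DataFrame from the question, this could be applied as df['type'] = df['type'].apply(compress_runs).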