I have a dataframe with a different product page for each user id, and I want to group all product pages of a user together, separated by hyphens, so that the end result looks like the one below.
Is it easier to do this in pandas or SQL? My dataset currently has 7.5 million rows, and it will grow to tens of millions of rows as more data is added.
In pandas, can we use a method like series.str.concatenate to collapse the rows and join them with hyphens?
Any suggestions for doing it in SQL?
In pandas, you can use a groupby with an anonymous function:
```python
>>> import pandas as pd
>>> df = pd.DataFrame([(5, 'product'), (5, 'product'), (5, 'home'), (4, 'product'), (4, 'home')], columns=['user_id', 'page_category'])
>>> df
   user_id page_category
0        5       product
1        5       product
2        5          home
3        4       product
4        4          home
>>> df.groupby('user_id')['page_category'].apply(lambda x: '-'.join(x))
user_id
4            product-home
5    product-product-home
Name: page_category, dtype: object
```
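The same aggregation can also be written without the lambda. The sketch below shows two variants on the same toy data: passing `'-'.join` straight to `agg`, and using `Series.str.cat(sep='-')`, which is the closest existing pandas method to the `series.str.concatenate` asked about in the question. Both keep the pages in their original row order within each user.

```python
import pandas as pd

# Same toy data as in the session above.
df = pd.DataFrame(
    [(5, 'product'), (5, 'product'), (5, 'home'), (4, 'product'), (4, 'home')],
    columns=['user_id', 'page_category'],
)

# Variant 1: str.join is already a callable taking an iterable of strings,
# so it can be passed to agg directly instead of wrapping it in a lambda.
joined = df.groupby('user_id')['page_category'].agg('-'.join)

# Variant 2: Series.str.cat with no `others` concatenates every string
# in the group's Series using the given separator.
joined_cat = df.groupby('user_id')['page_category'].agg(lambda s: s.str.cat(sep='-'))

print(joined.to_dict())  # {4: 'product-home', 5: 'product-product-home'}
```

One practical difference: with the default `na_rep=None`, `str.cat` silently skips missing values, whereas `'-'.join` raises a `TypeError` if a group contains `NaN`.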
If by "easier" you mean "faster", keep in mind that SQL is a query language, not the database engine itself. How quickly this operation runs in the database depends on the engine that executes it and on its architecture.
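For the SQL side, most engines offer a string-aggregation function. The sketch below runs one through Python's built-in `sqlite3` module using SQLite's `GROUP_CONCAT(expr, separator)`; the table name `visits` is a hypothetical choice. Note that SQLite does not guarantee the order of the concatenated values within a group, so if page order matters you would need a column that defines it (for example a timestamp, which the sample data here does not have).

```python
import sqlite3

# In-memory database; `visits` is a hypothetical table name.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE visits (user_id INTEGER, page_category TEXT)')
conn.executemany(
    'INSERT INTO visits VALUES (?, ?)',
    [(5, 'product'), (5, 'product'), (5, 'home'), (4, 'product'), (4, 'home')],
)

# GROUP_CONCAT collapses each user's pages into one hyphen-separated string.
# The order of values inside each group is not guaranteed by SQLite.
rows = conn.execute(
    "SELECT user_id, GROUP_CONCAT(page_category, '-') "
    "FROM visits GROUP BY user_id ORDER BY user_id"
).fetchall()
conn.close()

result = dict(rows)  # e.g. {4: 'product-home', 5: 'product-product-home'}
print(result)
```

Other engines spell this differently: MySQL uses `GROUP_CONCAT(col SEPARATOR '-')`, while PostgreSQL and SQL Server 2017+ use `STRING_AGG(col, '-')`.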