
Collapsing rows into one column value in pandas dataframe

I have a dataframe like the one below, with one row per product page visited by each user id, and I want to group all of a user's product pages together, separated by hyphens.

[image: no description provided]

I want the end result to look like this:

[image: no description provided]

Is it easier to do this in pandas or SQL? My dataset is currently 7.5 million rows, and it will grow to tens of millions as more data is added.

In pandas, can we use the series.str.concatenate method to collapse the rows and join them with a hyphen?

Any suggestions for doing it in SQL?

In pandas, you can use a groupby with an anonymous function:

>>> import pandas as pd
>>> df = pd.DataFrame([(5, 'product'), (5, 'product'), (5, 'home'), (4, 'product'), (4, 'home')], columns=['user_id', 'page_category'])
>>> df
   user_id page_category
0        5       product
1        5       product
2        5          home
3        4       product
4        4          home
>>> df.groupby('user_id')['page_category'].apply(lambda x: '-'.join(x))
user_id
4            product-home
5    product-product-home
Name: page_category, dtype: object
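
As for the string method mentioned in the question: pandas has no series.str.concatenate, but Series.str.cat joins all values of a Series into one string when it is called with only a separator. The sketch below reuses the sample frame from above and shows that variant next to a lambda-free form of the same groupby. The variable names (joined, joined_cat, result) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [(5, "product"), (5, "product"), (5, "home"), (4, "product"), (4, "home")],
    columns=["user_id", "page_category"],
)

# str.join can be passed directly as the aggregation function,
# so the lambda wrapper is not required.
joined = df.groupby("user_id")["page_category"].agg("-".join)

# Series.str.cat with only `sep` concatenates every value in the group.
joined_cat = df.groupby("user_id")["page_category"].apply(
    lambda s: s.str.cat(sep="-")
)

# reset_index turns the result back into a two-column DataFrame.
result = joined.reset_index()
print(result)
```

Both forms should give the same strings per user. Which one is faster on millions of rows is worth measuring on your own data rather than assuming.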

If by "easier" you mean "faster", keep in mind that SQL is an interface to a database, not the database itself. How quickly this operation runs depends on the database engine behind it and on that engine's architecture.
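
On the SQL side, most engines offer a string aggregate for this. Here is a minimal sketch using SQLite's GROUP_CONCAT through Python's built-in sqlite3 module. The table name page_views and the alias pages are made up for illustration. Other engines spell the function differently, for example STRING_AGG in PostgreSQL and GROUP_CONCAT(... SEPARATOR '-') in MySQL.

```python
import sqlite3

rows = [(5, "product"), (5, "product"), (5, "home"), (4, "product"), (4, "home")]

# In-memory database; `page_views` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id INTEGER, page_category TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)

# GROUP_CONCAT(value, separator) joins the values of each group.
# SQLite does not guarantee the order of the joined elements.
query = """
    SELECT user_id, GROUP_CONCAT(page_category, '-') AS pages
    FROM page_views
    GROUP BY user_id
    ORDER BY user_id
"""
collapsed = dict(conn.execute(query).fetchall())
conn.close()

for user_id, pages in collapsed.items():
    print(user_id, pages)
```

If the order of pages within each string matters, the query needs an explicit ordering mechanism, and how to express that depends on the engine.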

