
Pandas merge several rows with different columns into one row

I have a dataframe df with the following structure:

store_id  city_id  sales_A    sales_B    sales_C
STORE01   CITY99   100 Item   None       None
STORE01   CITY99   None       200 Order  None
STORE01   CITY99   None       None       300 Client
STORE01   CITY99   150 Order  None       300 Client
...

All rows follow the same pattern, where each combination of store ID and city ID has one or more rows:

  • row 1: sales A has a value, the others are None
  • row 2: sales B has a value, the others are None
  • row 3: sales C has a value, the others are None
  • row 4: sales A has a value (different from row 1), the others are None

Note that the values are strings, not numbers, and must be kept as strings.

The order of the rows may differ, but each store has one or more rows, depending on its sales.

In pandas, how can I merge them into one row, so that the resulting dataset looks something like this:

store_id  city_id  sales_A              sales_B    sales_C
STORE01   CITY99   100 Item, 150 Order  200 Order  300 Client

Thanks

Use a custom lambda function that removes None values and duplicates and then joins the remaining values with ", ", and pass it to GroupBy.agg:

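# If the None values are the string 'None', convert them to real missing values first:
# df = df.mask(df == 'None', None)

# Drop missing values and duplicates, then join what is left with ', '
f = lambda x: ', '.join(x.dropna().unique())
df = df.groupby(['store_id','city_id'], as_index=False).agg(f)
print(df)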
  store_id city_id              sales_A    sales_B     sales_C
0  STORE01  CITY99  100 Item, 150 Order  200 Order  300 Client
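
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is reconstructed from the table in the question, assuming the missing cells are real None values rather than the string 'None'. The helper name join_unique is only illustrative; it does the same thing as the lambda above.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: missing cells are
# real None values, not the string 'None').
df = pd.DataFrame(
    {
        "store_id": ["STORE01"] * 4,
        "city_id": ["CITY99"] * 4,
        "sales_A": ["100 Item", None, None, "150 Order"],
        "sales_B": [None, "200 Order", None, None],
        "sales_C": [None, None, "300 Client", "300 Client"],
    }
)


def join_unique(values: pd.Series) -> str:
    """Drop missing values, keep the first occurrence of each string, join with ', '."""
    return ", ".join(values.dropna().unique())


# One output row per (store_id, city_id) pair; every other column is aggregated.
result = df.groupby(["store_id", "city_id"], as_index=False).agg(join_unique)
print(result)
```

Because the values are joined as text, they stay strings. One thing to be aware of: if a column has no non-missing value at all within a group, the join produces an empty string for that cell rather than None.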
