I have a dataframe in which each row contains one specific product:
import numpy as np
import pandas as pd

data = [['Alpha', '#10','Apple','2020-10-01',4],
['Alpha', '#10','Tomatoes','2020-10-15',1.5],
['Beta', '#12','Banana', '2019-03-06', 2],
['Beta', '#14','Dragonfruit', '2020-04-05', 3],
['Charlie', '#16','Watermelon', '2019-01-02', 5]]
df = pd.DataFrame(data, columns = ['customer_name', 'order_number','product_variant','date','net_sales'])
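I want to merge the rows so that each customer ends up in one row, with the order numbers, products and dates spread across numbered columns and the net sales summed. Expected df: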
data_expected = [['Alpha', '#10',np.nan,'Apple','Tomatoes','2020-10-01','2020-10-15',5.5],
['Beta', '#12','#14','Banana','Dragonfruit','2019-03-06','2020-04-05',5],
['Charlie', '#16',np.nan,'Watermelon',np.nan,'2019-01-02',np.nan,5]]
df_expected = pd.DataFrame(data_expected, columns = ['customer_name','order_number_1', 'order_number_2','product_variant_1','product_variant_2','date_1','date_2','net_sales'])
In the real dataframe, one customer may have more than 2 products within the same order number, and may have more than 2 order numbers, and more than 2 dates as well (as in the real world).
1. Create a `cc` column that takes the cumulative count.
2. Use `groupby` to calculate the sum of net sales, which you will add to the dataframe later.
3. `pivot` the dataframe and rename the multi-index columns as single columns, joining the levels with `_`. Note: `pivot` has a major bug in previous versions of pandas. You can upgrade with `pip install pandas --upgrade`.
4. Create the `net_sales` column by setting it to `s`, the series you created earlier, prior to manipulating the shape of the dataframe.

df['cc'] = (df.groupby('customer_name').cumcount() + 1).astype(str)
s = df.groupby('customer_name')['net_sales'].sum()
df = df.pivot(index=['customer_name'], columns='cc', values=['order_number','product_variant','date'])
df.columns = ['_'.join(col) for col in df.columns]
df['net_sales'] = s
df
Out[1]:
              order_number_1 order_number_2 product_variant_1  \
customer_name
Alpha                    #10            #10             Apple
Beta                     #12            #14            Banana
Charlie                  #16            NaN        Watermelon

              product_variant_2      date_1      date_2  net_sales
customer_name
Alpha                  Tomatoes  2020-10-01  2020-10-15        5.5
Beta                Dragonfruit  2019-03-06  2020-04-05        5.0
Charlie                     NaN  2019-01-02         NaN        5.0
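For the case raised in the question, where a customer has more than two rows, the steps above can be sketched as a small function. This is only a sketch under a few assumptions: `widen_by_customer` is a hypothetical helper name, the third Alpha row is made-up illustration data, and unlike the snippet above the counter is kept as an integer so that the column suffixes stay in numeric order once a customer has ten or more rows.

```python
import pandas as pd


def widen_by_customer(df, key="customer_name", sum_col="net_sales"):
    """Hypothetical helper: one row per `key`, other columns numbered _1, _2, ..."""
    out = df.copy()
    # Number each customer's rows 1, 2, 3, ... in their original order.
    out["cc"] = out.groupby(key).cumcount() + 1
    # Per-customer total, computed before the reshape.
    totals = out.groupby(key)[sum_col].sum()
    # Every remaining column gets spread across numbered columns.
    value_cols = [c for c in out.columns if c not in (key, sum_col, "cc")]
    wide = out.pivot(index=key, columns="cc", values=value_cols)
    # Flatten the (column, counter) MultiIndex into names like "date_2".
    wide.columns = [f"{name}_{i}" for name, i in wide.columns]
    wide[sum_col] = totals
    return wide


data = [
    ["Alpha", "#10", "Apple", "2020-10-01", 4],
    ["Alpha", "#10", "Tomatoes", "2020-10-15", 1.5],
    ["Alpha", "#11", "Pear", "2020-11-02", 2],  # made-up third row for illustration
    ["Beta", "#12", "Banana", "2019-03-06", 2],
]
df = pd.DataFrame(
    data,
    columns=["customer_name", "order_number", "product_variant", "date", "net_sales"],
)

wide = widen_by_customer(df)
print(wide)
```

Customers with fewer rows than the maximum simply get NaN in the higher-numbered columns, as Charlie does in the output above.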
I appreciate that an excellent accepted answer exists, but here is my 'one-liner':
df2 = df.groupby('customer_name').apply(lambda x:pd.DataFrame(x.reset_index().unstack()).transpose())
df2
gives you this
| | ('customer_name', 0) | ('customer_name', 1) | ('date', 0) | ('date', 1) | ('index', 0) | ('index', 1) | ('net_sales', 0) | ('net_sales', 1) | ('order_number', 0) | ('order_number', 1) | ('product_variant', 0) | ('product_variant', 1) |
|:---------------|:-----------------------|:-----------------------|:--------------|:--------------|---------------:|---------------:|-------------------:|-------------------:|:----------------------|:----------------------|:-------------------------|:-------------------------|
| ('Alpha', 0) | Alpha | Alpha | 2020-10-01 | 2020-10-15 | 0 | 1 | 4 | 1.5 | #10 | #10 | Apple | Tomatoes |
| ('Beta', 0) | Beta | Beta | 2019-03-06 | 2020-04-05 | 2 | 3 | 2 | 3 | #12 | #14 | Banana | Dragonfruit |
| ('Charlie', 0) | Charlie | nan | 2019-01-02 | nan | 4 | nan | 5 | nan | #16 | nan | Watermelon | nan |
which is almost as required except for some aggregation and cleanup, along the lines of
del df2['customer_name']
del df2['index']
df2['net_sales_total'] = df2['net_sales'].sum(axis=1)
del df2['net_sales']
df2.columns = [c[0] + '_' + str(c[1]) for c in df2.columns]
df2.rename(columns={'net_sales_total_':'net_sales'}, inplace=True)
so we get
| | date_0 | date_1 | order_number_0 | order_number_1 | product_variant_0 | product_variant_1 | net_sales |
|:---------------|:-----------|:-----------|:-----------------|:-----------------|:--------------------|:--------------------|------------:|
| ('Alpha', 0) | 2020-10-01 | 2020-10-15 | #10 | #10 | Apple | Tomatoes | 5.5 |
| ('Beta', 0) | 2019-03-06 | 2020-04-05 | #12 | #14 | Banana | Dragonfruit | 5 |
| ('Charlie', 0) | 2019-01-02 | nan | #16 | nan | Watermelon | nan | 5 |
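A similar shape can probably be reached without `apply`, which avoids the tuple row index and most of the cleanup. The sketch below is not the code from this answer; it swaps in `set_index` plus `unstack` on a per-customer position, and the names `pos` and `wide` are chosen here for illustration. Suffixes are zero-based to match the table above.

```python
import pandas as pd

data = [
    ["Alpha", "#10", "Apple", "2020-10-01", 4],
    ["Alpha", "#10", "Tomatoes", "2020-10-15", 1.5],
    ["Beta", "#12", "Banana", "2019-03-06", 2],
    ["Beta", "#14", "Dragonfruit", "2020-04-05", 3],
    ["Charlie", "#16", "Watermelon", "2019-01-02", 5],
]
df = pd.DataFrame(
    data,
    columns=["customer_name", "order_number", "product_variant", "date", "net_sales"],
)

# Zero-based position of each row within its customer (0, 1, ...).
pos = df.groupby("customer_name").cumcount()

# Index by (customer, position), then move the position level into the columns.
wide = (
    df.set_index(["customer_name", pos])[["order_number", "product_variant", "date"]]
    .unstack()
)
# Flatten the (column, position) MultiIndex into names like "date_0".
wide.columns = [f"{name}_{i}" for name, i in wide.columns]
# Add the per-customer total; it aligns on the customer_name index.
wide["net_sales"] = df.groupby("customer_name")["net_sales"].sum()
print(wide)
```

Here the result is indexed by `customer_name` alone, so no `del` or `rename` steps are needed afterwards.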