I have a dataframe that looks like this:
My aim is to get at:
Explanation:
Code I'm using:
import time
import pandas as pd

start_time = time.time()
df = pd.DataFrame()
for CustomerName in base_df.CustomerName.unique():
    # Rows for this customer only
    df1 = base_df[(base_df['CustomerName'] == CustomerName)][['CustomerName', 'order_seq', 'Category']]
    # Cartesian product of the categories found at each order_seq
    df2 = pd.DataFrame(index=pd.MultiIndex.from_product(
        [subdf['Category'] for p, subdf in df1.groupby(['order_seq'])],
        names=df1.order_seq.unique())).reset_index()
    df2['CustomerName'] = CustomerName
    df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0
print("--- %s seconds ---" % (time.time() - start_time))
This takes about 10 minutes to run on my dataset, so I'm looking for a faster method.
I am working in pandas right now, but pointers for R or SQL are also welcome! Thank you!
Consider merging three dataframes, one per OrderSequence value, each joined on CustomerName to the list of distinct customers:
import pandas as pd
df = pd.DataFrame({'CustomerName': [1,1,1,1,1,1,1,2,2,2,3,3,3,3],
                   'OrderSequence': [1,2,2,2,3,3,3,1,2,3,1,1,2,3],
                   'Category': ['Food','Food','Clothes','Furniture','Clothes','Food','Toys',
                                'Clothes','Toys','Food','Furniture','Toys','Food','Food']})
finaldf = pd.DataFrame(df['CustomerName'].drop_duplicates())
for i in range(1,4):
    seqdf = df[df['OrderSequence']==i][['CustomerName', 'Category']].\
        rename(columns={'Category':'Category'+str(i)})
    finaldf = pd.merge(finaldf, seqdf, on=['CustomerName'])
print(finaldf)
#     CustomerName  Category1  Category2  Category3
# 0              1       Food       Food    Clothes
# 1              1       Food       Food       Food
# 2              1       Food       Food       Toys
# 3              1       Food    Clothes    Clothes
# 4              1       Food    Clothes       Food
# 5              1       Food    Clothes       Toys
# 6              1       Food  Furniture    Clothes
# 7              1       Food  Furniture       Food
# 8              1       Food  Furniture       Toys
# 9              2    Clothes       Toys       Food
# 10             3  Furniture       Food       Food
# 11             3       Toys       Food       Food
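The loop above hard-codes three order sequences with range(1,4). The same merge idea can probably be generalized by building one frame per distinct OrderSequence value and folding them together with functools.reduce. This is only a minimal sketch under the column names used in the example; the helper name expand_by_sequence is hypothetical and not part of pandas.

```python
from functools import reduce

import pandas as pd


def expand_by_sequence(df):
    """Inner-join one frame per OrderSequence value on CustomerName."""
    # Left-most frame: one row per distinct customer
    customers = df[['CustomerName']].drop_duplicates()
    # One frame per sequence value, with Category renamed to Category<seq>
    seq_frames = [
        df.loc[df['OrderSequence'] == seq, ['CustomerName', 'Category']]
          .rename(columns={'Category': 'Category' + str(seq)})
        for seq in sorted(df['OrderSequence'].unique())
    ]
    # Fold the per-sequence frames into the customer list, one merge at a time
    return reduce(lambda left, right: pd.merge(left, right, on='CustomerName'),
                  seq_frames, customers)


df = pd.DataFrame({'CustomerName': [1,1,1,1,1,1,1,2,2,2,3,3,3,3],
                   'OrderSequence': [1,2,2,2,3,3,3,1,2,3,1,1,2,3],
                   'Category': ['Food','Food','Clothes','Furniture','Clothes','Food','Toys',
                                'Clothes','Toys','Food','Furniture','Toys','Food','Food']})
result = expand_by_sequence(df)
print(result)
```

Because every merge is an inner join, a customer with no rows for some sequence value drops out of the result entirely; passing how='left' to pd.merge would keep such customers with missing values instead.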
Admittedly, the above setup was first thought out in SQL using self joins, then translated to pandas:
SELECT t1.CustomerName, t2.Category AS Category1,
t3.Category AS Category2, t4.Category AS Category3
FROM (SELECT DISTINCT CustomerName FROM DataFrame) AS t1
INNER JOIN DataFrame AS t2
ON t1.CustomerName = t2.CustomerName
INNER JOIN DataFrame AS t3
ON t1.CustomerName = t3.CustomerName
INNER JOIN DataFrame AS t4
ON t1.CustomerName = t4.CustomerName
WHERE (t2.OrderSequence=1) AND (t3.OrderSequence=2) AND (t4.OrderSequence=3);
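To try the query without a database server, it can be run against an in-memory SQLite table from Python. This is a rough sketch: the table is named DataFrame only to match the query above, the column types are an assumption, and other SQL engines may need small syntax changes.

```python
import sqlite3

customers = [1,1,1,1,1,1,1,2,2,2,3,3,3,3]
sequences = [1,2,2,2,3,3,3,1,2,3,1,1,2,3]
categories = ['Food','Food','Clothes','Furniture','Clothes','Food','Toys',
              'Clothes','Toys','Food','Furniture','Toys','Food','Food']

# Load the sample data into a throwaway in-memory table
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE DataFrame '
             '(CustomerName INTEGER, OrderSequence INTEGER, Category TEXT)')
conn.executemany('INSERT INTO DataFrame VALUES (?, ?, ?)',
                 zip(customers, sequences, categories))

query = """
SELECT t1.CustomerName, t2.Category AS Category1,
       t3.Category AS Category2, t4.Category AS Category3
FROM (SELECT DISTINCT CustomerName FROM DataFrame) AS t1
INNER JOIN DataFrame AS t2 ON t1.CustomerName = t2.CustomerName
INNER JOIN DataFrame AS t3 ON t1.CustomerName = t3.CustomerName
INNER JOIN DataFrame AS t4 ON t1.CustomerName = t4.CustomerName
WHERE (t2.OrderSequence=1) AND (t3.OrderSequence=2) AND (t4.OrderSequence=3);
"""
sql_rows = conn.execute(query).fetchall()
conn.close()

for row in sql_rows:
    print(row)
```

The query has no ORDER BY, so the row order is up to the engine; add one if the order matters.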
Okay, this took some work, but I got it going. Hope it helps.
import pandas as pd
import numpy as np

df = pd.DataFrame([], columns=['CustomerName','Order Sequence','Category'])
df['CustomerName'] = [1,1,1,1,1,1,1,2,2,2,3,3,3,3]
df['Order Sequence'] = [1,2,2,2,3,3,3,1,2,3,1,1,2,3]
df['Category'] = ['Food','Food','Clothes','Furniture','Clothes','Food','Toys','Clothes','Toys','Food','Furniture','Toys','Food','Food']

frames = []
for CN in sorted(set(df['CustomerName'])):
    cust = df[df['CustomerName'] == CN]
    # Categories this customer ordered at each of the three order sequences
    cat1 = cust.loc[cust['Order Sequence'] == 1, 'Category'].values
    cat2 = cust.loc[cust['Order Sequence'] == 2, 'Category'].values
    cat3 = cust.loc[cust['Order Sequence'] == 3, 'Category'].values
    n1, n2, n3 = len(cat1), len(cat2), len(cat3)
    # Repeat/tile each list so the three columns together enumerate every
    # combination (cycling all three lists in step would only repeat the
    # same aligned rows)
    df_temp = pd.DataFrame({
        'CustomerName': CN,
        'Category1': np.repeat(cat1, n2 * n3),
        'Category2': np.tile(np.repeat(cat2, n3), n1),
        'Category3': np.tile(cat3, n1 * n2),
    })
    frames.append(df_temp)
df2 = pd.concat(frames, axis=0, ignore_index=True)
print(df2)
output:
    CustomerName  Category1  Category2  Category3
0              1       Food       Food    Clothes
1              1       Food       Food       Food
2              1       Food       Food       Toys
3              1       Food    Clothes    Clothes
4              1       Food    Clothes       Food
5              1       Food    Clothes       Toys
6              1       Food  Furniture    Clothes
7              1       Food  Furniture       Food
8              1       Food  Furniture       Toys
9              2    Clothes       Toys       Food
10             3  Furniture       Food       Food
11             3       Toys       Food       Food
P.S. It's not dynamic: it is hard-coded to exactly three order sequences, so it will break if the data has more or fewer. As long as the data follows the layout you gave me, it should work.
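The "not dynamic" caveat can likely be lifted by letting itertools.product build the combinations for however many order sequences a customer has. The following is a sketch only: the function name category_paths is hypothetical, and it assumes every customer has at least one row for every order sequence, so that all rows end up with the same number of columns.

```python
from itertools import product

import pandas as pd


def category_paths(df, key='CustomerName', seq='Order Sequence', cat='Category'):
    """One row per combination of categories across a customer's order sequences."""
    rows = []
    for customer, cust_df in df.groupby(key, sort=True):
        # One list of categories per sequence value, in ascending sequence order
        per_seq = [grp[cat].tolist() for _, grp in cust_df.groupby(seq, sort=True)]
        # Cartesian product across the sequences
        for combo in product(*per_seq):
            rows.append((customer, *combo))
    # Assumption: every customer has the same number of sequences
    n_seq = df[seq].nunique()
    columns = [key] + ['Category%d' % i for i in range(1, n_seq + 1)]
    return pd.DataFrame(rows, columns=columns)


df = pd.DataFrame({
    'CustomerName': [1,1,1,1,1,1,1,2,2,2,3,3,3,3],
    'Order Sequence': [1,2,2,2,3,3,3,1,2,3,1,1,2,3],
    'Category': ['Food','Food','Clothes','Furniture','Clothes','Food','Toys',
                 'Clothes','Toys','Food','Furniture','Toys','Food','Food'],
})
paths = category_paths(df)
print(paths)
```

Grouping once per customer avoids re-filtering the whole frame for every customer and sequence, which may help on larger data, though that is worth timing on the real dataset.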