Basically, I would like to fill in the column Discount_Sub_Dpt with 'Yes' or 'No' depending on whether there is a Discount in that Sub_Dpt for that week, EXCLUDING the product on which that row lands. For instance, I don't want any of the A rows to consider whether there is a Discount for A in that week, but only whether there is one for the other products in that sub-department (in most cases there is more than one other product).
I have tried using groupby with Sub_Dpt and Week to no avail.
Does anyone know how to solve this issue?
The Yellow column is obviously the desired outcome from the code.
Here is some of the code I have used. I am trying to create the column first and then update the values, but it could all potentially be wrong (also, I intentionally named the data frame df1):
df1['Discount_Sub_Dpt'] = np.where((df1['Discount']=='Yes'),'Yes','No')
grps = []
grps.append(df1.Sub_Dpt.unique())
for x in grps:
x = str(x)
yes_weeks = df1.loc[(df1.Discount_SubDpt == 'Yes') & (df1.Sub_Dpt_Description == x),'Week'].unique()
df1.loc[df1['Week'].isin(yes_weeks) & df1['Sub_Dpt_Description'] == x, 'Discount_SubDpt'] = 'Yes'
Okay, this might not scale well, but should be easy to read.
df1 = pd.DataFrame(data= [[ 'A', 1, 'Toys', 'Yes', ],
[ 'A', 2, 'Toys', 'No', ],
[ 'A', 3, 'Toys', 'No', ],
[ 'A', 4, 'Toys', 'Yes', ],
[ 'B', 1, 'Toys', 'No', ],
[ 'B', 2, 'Toys', 'Yes', ],
[ 'B', 3, 'Toys', 'No', ],
[ 'B', 4, 'Toys', 'Yes', ],
[ 'C', 1, 'Candy', 'No', ],
[ 'C', 2, 'Candy', 'No', ],
[ 'C', 3, 'Candy', 'Yes', ],
[ 'C', 4, 'Candy', 'Yes', ],
[ 'D', 1, 'Candy', 'No', ],
[ 'D', 2, 'Candy', 'No', ],
[ 'D', 3, 'Candy', 'No', ],
[ 'D', 4, 'Candy', 'No', ],], columns=['Product', 'Week', 'Sub_Dpt', 'Discount'])
df2 = df1.set_index(['Product', 'Week', 'Sub_Dpt'])
products = df1.Product.unique()
df1['Discount_SubDpt'] = df1.apply(lambda x: 'Yes' if 'Yes' in df2.loc[(list(products[products != x['Product']]), x['Week'], x['Sub_Dpt']), 'Discount'].tolist() else 'No', axis=1)
The first step creates a MultiIndex DataFrame (df2), indexed by Product, Week and Sub_Dpt.
Next, we get the list of all products.
Then, for each row, we select the rows with the same Week and Sub_Dpt, excluding the row's own product.
If any of the selected rows has a discount, we assign 'Yes', otherwise 'No'.
Edit 1:
If you don't want to create another DataFrame (this saves memory, but will be a bit slower):
df1['Discount_SubDpt'] = df1.apply(lambda x: 'Yes' if 'Yes' in df1.loc[(df1['Product'] != x['Product']) & (df1['Week'] == x['Week']) & (df1['Sub_Dpt'] == x['Sub_Dpt']), 'Discount'].tolist() else 'No', axis=1)
Ok, the following is a bit crazy, but it works pretty nicely, so listen up.
First, we are going to build a NetworkX graph as follows.
import networkx as nx
import numpy as np
import pandas as pd
G = nx.Graph()
Prods = df.Product.unique()
G.add_nodes_from(Prods)
We now add edges between our nodes (which are all of the products) whenever two products belong to the same Sub_Dpt. In this case, A and B share a sub-department, as do C and D, so we add the edges AB and CD. If A, B and C were all in the same sub-department, we would add AB, AC and BC. Confusing, I know, but just trust me on this one.
G.add_edges_from([('A','B'),('C','D')])
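The edge list above is typed in by hand. As a rough sketch (not part of the original answer), the same pairs could be derived from the data with itertools.combinations; the small frame and the name edges below are assumptions made for illustration, and the resulting list could then be passed to G.add_edges_from.

```python
from itertools import combinations

import pandas as pd

# Hypothetical stand-in for the Product / Sub_Dpt columns of the sample data.
df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Sub_Dpt': ['Toys', 'Toys', 'Candy', 'Candy'],
})

# For each sub-department, pair up every two distinct products in it.
edges = []
for _, prods in df.groupby('Sub_Dpt', sort=False)['Product']:
    edges.extend(combinations(sorted(prods.unique()), 2))

print(edges)
```

A sub-department with a single product contributes no pairs, so that product simply ends up with no neighbours in the graph.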
Now comes the fun part. We need to convert your Discount column from Yes/No to 1/0.
df['Disc2']=np.nan
df.loc[df['Discount']=='Yes','Disc2']=1
df.loc[df['Discount']=='No','Disc2']=0
Now we pivot the data
tab = df.pivot(index = 'Week',columns='Product',values = 'Disc2')
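And now we multiply the pivoted table by the graph's adjacency matrix, so that each cell counts the discounted products that share a sub-department with that product in that week: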
tab = pd.DataFrame(np.dot(tab,nx.adjacency_matrix(G,Prods).todense()), columns=Prods,index=df.Week.unique())
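# tab now holds, per week and product, the number of other discounted products in the same sub-department
df = df.merge(tab.unstack().reset_index(),left_on=['Product','Week'],right_on=['level_0','level_1'])
# a count above zero means some other product in the sub-department is discounted
df['Discount_Sub_Dpt']=np.where(df[0] > 0, 'Yes', 'No')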
print(df[['Product','Week','Sub_Dpt','Discount','Discount_Sub_Dpt']])
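To see why the matrix product works, here is a small NumPy-only sketch of the same computation, with the pivoted discounts and the adjacency matrix written out by hand for the sample data. The array names disc, adj, counts and flags are assumptions for illustration, not names from the answer.

```python
import numpy as np

# Rows are weeks 1-4, columns are products A, B, C, D (1 = discounted).
disc = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
])

# Adjacency matrix for the edges A-B and C-D; the diagonal is zero,
# so a product never counts its own discount.
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

# Each entry counts the discounted neighbours of that product in that week.
counts = disc @ adj
flags = np.where(counts > 0, 'Yes', 'No')
print(flags)
```

Because the diagonal of the adjacency matrix is zero, excluding the row's own product comes for free.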
You may ask, why go through this trouble? Well, two reasons. First, it's far more stable. The other answers can't handle all possible cases of your problem. Second, it's much faster than the other solutions. I hope this helped!
You can perform a GroupBy to map ('Week', 'Sub_Dpt') to lists of 'Product' only when Discount is "Yes".
Then use a list comprehension to check if any are on Discount apart from the product in question. Finally, map a Boolean series result to "Yes" / "No".
Data from @SahilPuri.
# GroupBy only when Discount == Yes
g = df1[df1['Discount'] == 'Yes'].groupby(['Week', 'Sub_Dpt'])['Product'].unique()
# calculate index by row
idx = df1.set_index(['Week', 'Sub_Dpt']).index
# construct list of Booleans according to criteria
L = [any(x for x in g.get(i, []) if x!=j) for i, j in zip(idx, df1['Product'])]
# map Boolean to strings
df1['Discount_SubDpt'] = pd.Series(L).map({True: 'Yes', False: 'No'})
print(df1)
   Product  Week Sub_Dpt Discount Discount_SubDpt
0        A     1    Toys      Yes              No
1        A     2    Toys       No             Yes
2        A     3    Toys       No              No
3        A     4    Toys      Yes             Yes
4        B     1    Toys       No             Yes
5        B     2    Toys      Yes              No
6        B     3    Toys       No              No
7        B     4    Toys      Yes             Yes
8        C     1   Candy       No              No
9        C     2   Candy       No              No
10       C     3   Candy      Yes              No
11       C     4   Candy      Yes              No
12       D     1   Candy       No              No
13       D     2   Candy       No              No
14       D     3   Candy       No             Yes
15       D     4   Candy       No             Yes
It's late, but here's a go. I used the sample df in the comments above.
df1['dis'] = df1['Discount'].apply(lambda x: 1 if x =="Yes" else 0)
# sum only the numeric flag so the text columns are not aggregated
df2 = df1.groupby(['Sub_Dpt','Week'], as_index=False)['dis'].sum()
df3 = pd.merge(df1,df2, left_on=['Sub_Dpt','Week'], right_on =['Sub_Dpt','Week'])
df3['Discount_Sb_Dpt'] = np.where(df3['dis_x'] < df3['dis_y'], 'Yes', 'No')
df3.sort_values(by=['Product'], inplace = True)
df3
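The same counting idea can probably be written without the merge by using groupby(...).transform('sum'). The following is only a sketch on a made-up six-row frame, and it assumes one row per product per week; the names own and total are introduced here for illustration.

```python
import pandas as pd

# Small hypothetical frame with the same columns as the sample data.
df1 = pd.DataFrame(
    [['A', 1, 'Toys', 'Yes'], ['B', 1, 'Toys', 'No'],
     ['A', 2, 'Toys', 'No'], ['B', 2, 'Toys', 'No'],
     ['C', 1, 'Candy', 'Yes'], ['D', 1, 'Candy', 'Yes']],
    columns=['Product', 'Week', 'Sub_Dpt', 'Discount'])

# 1 if this row's own product is discounted, else 0.
own = df1['Discount'].eq('Yes').astype(int)

# Number of discounted products in the same sub-department and week,
# broadcast back to every row of the group.
total = own.groupby([df1['Sub_Dpt'], df1['Week']]).transform('sum')

# Subtracting the row's own flag leaves only the other products.
df1['Discount_Sub_Dpt'] = (total - own).gt(0).map({True: 'Yes', False: 'No'})
print(df1)
```

Since transform returns a result aligned with the original rows, no merge or re-sorting is needed afterwards.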