Pandas Copy column element and apply to another column based on related list
This is a tricky problem and I have been banging my head against it for a long time. I have the following data frame.
import pandas as pd
dct = {'Store': ('A','A','A','A','A','A','B','B','B','C','C','C'),
'code_num':('INC101','INC102','INC103','INC104','INC105','INC106','INC201','INC202','INC203','INC301','INC302','INC303'),
'days':('4','18','9','15','3','6','10','5','3','1','8','5'),
'products': ('remote','antenna','remote, antenna','TV','display','TV','display, touchpad','speaker','Cell','display','speaker','antenna')
}
df = pd.DataFrame(dct)
pts = {'Primary': ('TV','TV','TV','Cell','Cell'),
'Related' :('remote','antenna','speaker','display','touchpad')
}
parts = pd.DataFrame(pts)
print(df)
    Store  code_num  days           products
0       A    INC101     4             remote
1       A    INC102    18            antenna
2       A    INC103     9    remote, antenna
3       A    INC104    15                 TV
4       A    INC105     3            display
5       A    INC106     6                 TV
6       B    INC201    10  display, touchpad
7       B    INC202     5            speaker
8       B    INC203     3               Cell
9       C    INC301     1            display
10      C    INC302     8            speaker
11      C    INC303     5            antenna
The parts data frame is for reference only. I have another piece of code that provides the primary part and the list of related parts for each store.
# For Store A -> TV : ['remote','antenna','speaker'] ; Store B -> Cell : ['display','touchpad']
and my expected dataframe is:
    Store  code_num  days           products   refer
0       A    INC101     4             remote  INC106
1       A    INC102    18            antenna          -> omitted in 1st pass; because >10 days
2       A    INC103     9    remote, antenna  INC106
3       A    INC104    15                 TV          -> omitted in 1st pass; because >10 days
4       A    INC105     3            display
5       A    INC106     6                 TV  INC106
6       B    INC201    10  display, touchpad  INC203
7       B    INC202     5            speaker
8       B    INC203     3               Cell  INC203
9       C    INC301     1            display          -> blank because no primary present
10      C    INC302     8            speaker          -> blank because no primary present
11      C    INC303     5            antenna          -> blank because no primary present
I have code that works when it is run on the whole df at once. But due to other business rules it will be run on a slice of the data, meaning 2 & 3 will be omitted, so the .iloc positions may differ for some records. So if you subset df on <=10 days and it works for you, then it will work for me.
If any more information is required, please let me know. I know it is very complicated and is actually a brain teaser.
Replicated the scenario:
Your inputs:
import numpy as np
import pandas as pd
dct = {'Store': ('A','A','A','A','A','A','B','B','B','C','C','C'),
'code_num':('INC101','INC102','INC103','INC104','INC105','INC106','INC201','INC202','INC203','INC301','INC302','INC303'),
'days':('4','18','9','15','3','6','10','5','3','1','8','5'),
'products': ('remote','antenna','remote,antenna','TV','display','TV','display,touchpad','speaker','Cell','display','speaker','antenna')
}
df = pd.DataFrame(dct)
pts = {'Primary': ('TV','TV','TV','Cell','Cell'),
'Related' :('remote','antenna','speaker','display','touchpad')
}
parts = pd.DataFrame(pts)
store = {'A':'TV','B':'Cell'}
Solution:
Converting the parts df to a dictionary:
parts_df_dict = dict(zip(parts['Related'],parts['Primary']))
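To make the shape of that lookup concrete, here is a small sketch built from the same parts frame. The names `related_to_primary` and `primary_to_related` are only illustrative and are not used elsewhere in this answer.

```python
import pandas as pd

parts = pd.DataFrame({
    'Primary': ('TV', 'TV', 'TV', 'Cell', 'Cell'),
    'Related': ('remote', 'antenna', 'speaker', 'display', 'touchpad'),
})

# Related part -> primary part, exactly as built in the answer
parts_df_dict = dict(zip(parts['Related'], parts['Primary']))
# {'remote': 'TV', 'antenna': 'TV', 'speaker': 'TV', 'display': 'Cell', 'touchpad': 'Cell'}

# The same mapping written with pandas only (illustrative name)
related_to_primary = parts.set_index('Related')['Primary'].to_dict()

# The reverse view: primary part -> list of its related parts (illustrative name)
primary_to_related = parts.groupby('Primary')['Related'].agg(list).to_dict()
```

The related-to-primary direction is the one that matters here, because each split product is later looked up with `.map(parts_df_dict)`.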
Splitting the comma-separated sub-products and putting each one on a separate row:
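# One row per (code_num, single product): split on commas, then stack the pieces
new_df = pd.DataFrame(df.products.str.split(',').tolist(), index=df.code_num).stack()
new_df = new_df.reset_index(level='code_num')
new_df.columns = ['code_num', 'Prod_seperated']
# Strip stray spaces so that 'remote, antenna' also matches the parts list
new_df['Prod_seperated'] = new_df['Prod_seperated'].str.strip()
new_df = new_df.merge(df, on='code_num', how='left')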
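If your pandas version has `explode` (0.25 or later, as far as I know), the same splitting step can probably be written more directly. This is only a sketch on a three-row sample; the column name `item` is made up for the example.

```python
import pandas as pd

sample = pd.DataFrame({
    'code_num': ['INC101', 'INC103', 'INC201'],
    'products': ['remote', 'remote, antenna', 'display,touchpad'],
})

# Split each products string into a list, then give every list element its own row
exploded = (
    sample.assign(item=sample['products'].str.split(','))
          .explode('item')
          .reset_index(drop=True)
)

# Remove the space left behind by 'remote, antenna'
exploded['item'] = exploded['item'].str.strip()
print(exploded)
```

The other columns are repeated for every split product, so no merge back to the original frame is needed in this variant.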
The logic to create the refer column:
# Tag each store with its primary product, e.g. 'A' -> 'A_TV'
store_prod = {k: k + '_' + v for k, v in store.items()}
new_df['prod_store'] = new_df['Store'].map(store_prod)
new_df['p_store'] = new_df['Store'].map(store)
# main_ind holds the code_num of a primary row that is within 10 days, else a blank
new_df['main_ind'] = ' '
recent = new_df['days'].astype('int') <= 10
is_primary = new_df['prod_store'] == new_df['Store'] + '_' + new_df['Prod_seperated']
new_df.loc[is_primary & recent, 'main_ind'] = new_df['code_num']
# Per store, the code_num to refer to (blank when the store has no recent primary)
refer_dic = new_df.groupby('Store')['main_ind'].max().to_dict()
# Rows whose product is related to the store's primary get that code_num
new_df['prod_subproducts'] = new_df['Prod_seperated'].map(parts_df_dict)
is_related = new_df['p_store'] == new_df['prod_subproducts']
new_df['refer'] = np.where(is_related & recent, new_df['Store'].map(refer_dic), np.nan)
# Primary rows refer to themselves; everything else stays blank
new_df['refer'] = new_df['refer'].fillna(new_df['main_ind'])
new_df = new_df.drop(['Prod_seperated','prod_store','p_store','main_ind','prod_subproducts'], axis=1)
new_df = new_df.drop_duplicates()
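Since you mentioned the real run will be on a slice where .iloc positions shift, here is a sketch of the same idea wrapped in one function that filters on days first and writes results back by index label. `add_refer`, `store_primary` and `max_days` are names I made up, and picking the primary with `.max()` just mirrors the groupby above; treat it as an assumption to adapt, not a drop-in implementation.

```python
import pandas as pd

def add_refer(df, parts, store_primary, max_days=10):
    """Return a copy of df with a 'refer' column (hypothetical helper)."""
    related_to_primary = dict(zip(parts['Related'], parts['Primary']))
    out = df.copy()
    out['refer'] = ''

    # Only rows within the day limit take part; index labels are preserved
    recent = out[out['days'].astype(int) <= max_days]

    # One row per single product, still carrying the original index label
    items = recent.assign(item=recent['products'].str.split(',')).explode('item')
    items['item'] = items['item'].str.strip()
    items['primary'] = items['Store'].map(store_primary)  # NaN if store has no primary

    # code_num of the primary row per store (largest one if there are several)
    is_primary = items['item'] == items['primary']
    primary_code = items[is_primary].groupby('Store')['code_num'].max()

    # Rows that are the primary itself or a part related to the store's primary
    is_related = items['item'].map(related_to_primary) == items['primary']
    hits = items[is_primary | is_related]
    refer = hits['Store'].map(primary_code).dropna()
    refer = refer[~refer.index.duplicated()]

    # Assign by label, so the positions in the slice do not matter
    out.loc[refer.index, 'refer'] = refer
    return out

df = pd.DataFrame({
    'Store': list('AAAAAABBBCCC'),
    'code_num': ['INC101', 'INC102', 'INC103', 'INC104', 'INC105', 'INC106',
                 'INC201', 'INC202', 'INC203', 'INC301', 'INC302', 'INC303'],
    'days': ['4', '18', '9', '15', '3', '6', '10', '5', '3', '1', '8', '5'],
    'products': ['remote', 'antenna', 'remote, antenna', 'TV', 'display', 'TV',
                 'display, touchpad', 'speaker', 'Cell', 'display', 'speaker', 'antenna'],
})
parts = pd.DataFrame({
    'Primary': ('TV', 'TV', 'TV', 'Cell', 'Cell'),
    'Related': ('remote', 'antenna', 'speaker', 'display', 'touchpad'),
})

result = add_refer(df, parts, {'A': 'TV', 'B': 'Cell'})
print(result)
```

Rows over the day limit and stores without a primary are simply left with an empty refer value, which matches the blanks in your expected frame.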
new_df now holds the required output.
Please let me know if you have any doubts.