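How to select rows by applying a condition on two columns in a dataframe
I'm trying to select only those products that were bought by the same person in both years (2019 and 2020).
Below is the input dataframe: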
import pandas as pd

data = [['P1', 2019, 'XYA'], ['P1', 2020, 'XYA'], ['P3', 2020, 'UYH'], ['P2', 2019, 'MSN'], ['P1', 2020, 'UJK'], ['P2', 2020, 'MSN']]
df = pd.DataFrame(data, columns=['Product', 'year', 'Name'])
df
Product year Name
0 P1 2019 XYA
1 P1 2020 XYA
2 P3 2020 UYH
3 P2 2019 MSN
4 P1 2020 UJK
5 P2 2020 MSN
Desired output:
Product year Name
0 P1 2019 XYA
1 P1 2020 XYA
2 P2 2019 MSN
3 P2 2020 MSN
I tried using a for loop, but it takes a very long time to run. Please help me solve this.
unique_product_list = df['Product'].unique()
unique_customers_list = df['Name'].unique()
matches = []
for i in unique_product_list:
    product_sub = df[df['Product'] == i]
    for cus in unique_customers_list:
        customer_sub = product_sub[product_sub['Name'] == cus]
        if not customer_sub.empty:
            # keep this customer's rows only if they bought the product in both 2019 and 2020
            if (customer_sub['year'] == 2019).any() and (customer_sub['year'] == 2020).any():
                matches.append(customer_sub)
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
subset_dataframe = pd.concat(matches, ignore_index=True) if matches else df.iloc[0:0]
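The nested loop re-filters the whole frame once per product and again once per customer, so the work grows with (products × customers × rows). A vectorized sketch of the same condition lets groupby split the frame in a single pass. It assumes the years of interest are exactly 2019 and 2020; REQUIRED_YEARS is a hypothetical name introduced here.

```python
import pandas as pd

data = [['P1', 2019, 'XYA'], ['P1', 2020, 'XYA'], ['P3', 2020, 'UYH'],
        ['P2', 2019, 'MSN'], ['P1', 2020, 'UJK'], ['P2', 2020, 'MSN']]
df = pd.DataFrame(data, columns=['Product', 'year', 'Name'])

# Hypothetical name: the years a (Product, Name) pair must cover to be kept
REQUIRED_YEARS = {2019, 2020}

# GroupBy.filter keeps every row of each group for which the function returns True;
# a group is kept when its 'year' values include all of the required years
subset_dataframe = (
    df.groupby(['Product', 'Name'])
      .filter(lambda g: REQUIRED_YEARS.issubset(g['year']))
      .reset_index(drop=True)  # renumber rows 0..n-1
)
print(subset_dataframe)
```

Unlike a "more than one distinct year" check, this spells out which years must be present, so it still behaves as intended if the data later contains other years.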
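Answer: group by Product and Name, and keep only the rows whose group contains more than one distinct year:

print(df[df.groupby(['Product', 'Name'])['year'].transform('nunique') > 1])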
Prints:
Product year Name
0 P1 2019 XYA
1 P1 2020 XYA
3 P2 2019 MSN
5 P2 2020 MSN
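The answer's result keeps the original index labels (0, 1, 3, 5), while the desired output is numbered 0 to 3. Also, nunique() > 1 means "at least two distinct years", which matches "both 2019 and 2020" here only because the data contains just those two years. The sketch below splits the one-liner into named steps so each can be inspected, and renumbers the rows with reset_index(drop=True). The intermediate names years_per_pair, mask and result are illustrative.

```python
import pandas as pd

data = [['P1', 2019, 'XYA'], ['P1', 2020, 'XYA'], ['P3', 2020, 'UYH'],
        ['P2', 2019, 'MSN'], ['P1', 2020, 'UJK'], ['P2', 2020, 'MSN']]
df = pd.DataFrame(data, columns=['Product', 'year', 'Name'])

# transform('nunique') gives, for every row, the number of distinct years
# in that row's (Product, Name) group, aligned to df's original index
years_per_pair = df.groupby(['Product', 'Name'])['year'].transform('nunique')

# Boolean mask: True where the pair was bought in more than one distinct year
mask = years_per_pair > 1

# Select the matching rows and renumber them 0..n-1 to match the desired output
result = df[mask].reset_index(drop=True)
print(result)
```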