
How to select rows by applying a condition on two columns in a dataframe

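I am trying to select only those products that were bought by the same person in both years (2019 and 2020).

Below is the input dataframe.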

import pandas as pd

data = [['P1', 2019, 'XYA'], ['P1', 2020, 'XYA'], ['P3', 2020, 'UYH'], ['P2', 2019, 'MSN'], ['P1', 2020, 'UJK'], ['P2', 2020, 'MSN']]

df = pd.DataFrame(data, columns=['Product', 'year', 'Name'])
print(df)


  Product  year Name
0      P1  2019  XYA
1      P1  2020  XYA
2      P3  2020  UYH
3      P2  2019  MSN
4      P1  2020  UJK
5      P2  2020  MSN

Desired output:

  Product  year Name
0      P1  2019  XYA
1      P1  2020  XYA
2      P2  2019  MSN
3      P2  2020  MSN
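To see why exactly these four rows are wanted, one can count how many distinct years each (Product, Name) pair appears in. This is a small sketch that rebuilds the sample frame; the name years_per_pair is only an illustrative choice.

```python
import pandas as pd

data = [['P1', 2019, 'XYA'], ['P1', 2020, 'XYA'], ['P3', 2020, 'UYH'],
        ['P2', 2019, 'MSN'], ['P1', 2020, 'UJK'], ['P2', 2020, 'MSN']]
df = pd.DataFrame(data, columns=['Product', 'year', 'Name'])

# Number of distinct years in which each (Product, Name) pair was bought.
# The result is a Series indexed by the (Product, Name) pairs.
years_per_pair = df.groupby(['Product', 'Name'])['year'].nunique()
print(years_per_pair)
```

Only (P1, XYA) and (P2, MSN) reach a count of 2, and those are exactly the pairs in the desired output.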

I tried using a for loop, but it takes a very long time to run. Please help me solve this problem.

unique_product_list = df['Product'].unique()
unique_customers_list = df['Name'].unique()

matches = []  # matching sub-frames; DataFrame.append was removed in pandas 2.0
for i in unique_product_list:
    product_sub = df[df['Product'] == i]
    for cus in unique_customers_list:
        customer_sub = product_sub[product_sub['Name'] == cus]
        if not customer_sub.empty:
            # keep this product/customer pair only if it was bought in both 2019 and 2020
            if (customer_sub['year'] == 2019).any() and (customer_sub['year'] == 2020).any():
                matches.append(customer_sub)
subset_dataframe = pd.concat(matches, ignore_index=True)
            
print( df[ df.groupby(['Product', 'Name'])['year'].transform('nunique') > 1 ] )

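Prints (the original row labels 0, 1, 3, 5 are kept; chain .reset_index(drop=True) to renumber them 0 to 3 as in the desired output):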

  Product  year Name
0      P1  2019  XYA
1      P1  2020  XYA
3      P2  2019  MSN
5      P2  2020  MSN
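Note that nunique > 1 keeps any pair bought in at least two different years, while the loop in the question specifically requires both 2019 and 2020. On this sample the two coincide because only those years occur. If the data could contain other years, a sketch that mirrors the loop's exact condition might look like this; required_years is a hypothetical name introduced here, not part of the original code.

```python
import pandas as pd

data = [['P1', 2019, 'XYA'], ['P1', 2020, 'XYA'], ['P3', 2020, 'UYH'],
        ['P2', 2019, 'MSN'], ['P1', 2020, 'UJK'], ['P2', 2020, 'MSN']]
df = pd.DataFrame(data, columns=['Product', 'year', 'Name'])

# Hypothetical parameter: every year a (Product, Name) pair must appear in,
# matching the 2019/2020 check in the question's loop.
required_years = {2019, 2020}

# For each group, test whether all required years are present; transform
# broadcasts that single True/False back onto every row of the group.
has_all_years = df.groupby(['Product', 'Name'])['year'].transform(
    lambda years: required_years.issubset(years.tolist())
)

# Keep the matching rows and renumber them 0..n-1 like the desired output.
result = df[has_all_years].reset_index(drop=True)
print(result)
```

The lambda runs once per group in Python, so on very large frames the built-in 'nunique' transform from the answer is likely to be faster. A groupby(...).filter(lambda g: g['year'].nunique() > 1) call is another way to express the same selection, with the same per-group cost.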


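Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.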

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM