Create column with boolean values based on condition

I have a DataFrame in which each row is an order from a customer. I want to add a boolean column that is True when the customer has already ordered twice before. So the third time they make an order, the 'recurring_customer' column gets a True value.

The DataFrame looks like this:

df = pd.DataFrame({
          'customer_id': ['5257', '8034', '21474', '21474', '21474', '6157']
})

The desired output should look like this:

df = pd.DataFrame({
          'customer_id': ['5257', '8034', '21474', '21474', '21474', '6157'],
          'recurring_customer': [False, False, False, False, True, False]
})

I guess I have to use the np.where function but I don't know how to use it with unique and non-unique values. Could you help me with the last bit?

df['recurring_customer'] = np.where(df['customer_id'] 

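Use groupby with cumcount, which numbers each customer's rows 0, 1, 2, ... in order of appearance:

# >= 2 flags the third and every later order; use == 2 to flag only the third
df['recurring_customer'] = df.groupby('customer_id').cumcount() >= 2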
print(df)

# Output:
  customer_id  recurring_customer
0        5257               False
1        8034               False
2       21474               False
3       21474               False
4       21474                True
5        6157               False
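
To make the difference between >= 2 and == 2 concrete, here is a minimal sketch. It assumes a fourth order for customer 21474, which is not in the question's data, and the column names third_or_later and exactly_third are made up for illustration:

```python
import pandas as pd

# Sample data from the question, plus one extra (assumed) order for 21474
df = pd.DataFrame({
    'customer_id': ['5257', '8034', '21474', '21474', '21474', '21474', '6157']
})

# cumcount() numbers each customer's rows 0, 1, 2, ... in order of appearance
order_index = df.groupby('customer_id').cumcount()

# Hypothetical column names, for illustration only
df['third_or_later'] = order_index >= 2   # True from the third order onward
df['exactly_third'] = order_index == 2    # True on the third order only

print(df)
```

Since the comparison already returns a boolean Series, np.where is not needed here. It would only help if you wanted to map the condition to other values, such as labels.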
