I have a DataFrame df with the columns customer_id, year, order, and a few other, unimportant ones. Every time I get a new order, my code creates a new row, so there can be more than one row per customer_id. I want to create a new column 'actually' that contains 'True' if a customer_id bought in 2020 or 2021. My code is:
# Run through all customers and check if they bought in 2020 or 2021
investors = df["customer_id"].unique()
df["actually"] = np.nan
for i in investors:
    selected_df = df.loc[df["customer_id"] == i]
    for year in selected_df['year'].unique():
        if "2021" in str(year) or "2020" in str(year):
            df.loc[df["customer_id"] == i, "actually"] = "True"
            break
# Want just latest orders / customers
df = df.loc[df["actually"] == "True"]
This works fine, but it is quite slow. I would like to use the pandas groupby function, but I haven't found a working way so far. I would also like to avoid loops. Does anyone have an idea?
You can create the 'actually' column like this (assuming year holds integers):
list1 = df['customer_id'][df.year == 2020].unique()
list2 = df['customer_id'][df.year == 2021].unique()
df['actually'] = df['customer_id'].apply(lambda x: x in list1 or x in list2)
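The same idea can probably be written without apply by using Series.isin, which avoids calling a Python function per row. This is only a minimal sketch: it assumes year is an integer column and the ID column is named customer_id as in the question, and the sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({
    "customer_id": ["c1", "c2", "c1", "c4", "c3", "c3"],
    "year": [2019, 2018, 2021, 2012, 2020, 2021],
})

# Customers with at least one order in 2020 or 2021
recent_customers = df.loc[df["year"].isin([2020, 2021]), "customer_id"].unique()

# Flag every row that belongs to one of those customers
df["actually"] = df["customer_id"].isin(recent_customers)

# Keep only the rows of recent customers
recent_df = df[df["actually"]]
print(recent_df)
```

If year is stored as strings in your data, the list passed to isin would need to hold strings as well.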
Based on my understanding of your scenario, here is some simple code:
import pandas as pd
# Sample data to recreate the scenario
data = {'customer_id': ['c1','c2','c1','c4','c3','c3'], 'year': [2019, 2018,2021,2012,2020,2021], 'order': ['A1','A2','A3','A4','A5','A6']}
df = pd.DataFrame.from_dict(data)
# Creating the new column (initially all false)
df['actually'] = False
# Filling only required rows with True
df.loc[(df['year']==2020) | (df['year']==2021), 'actually'] = True
print(df)
This produces:
  customer_id  year order  actually
0          c1  2019    A1     False
1          c2  2018    A2     False
2          c1  2021    A3      True
3          c4  2012    A4     False
4          c3  2020    A5      True
5          c3  2021    A6      True
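Note that the output above flags individual rows: the 2019 order of c1 stays False even though c1 also ordered in 2021. If every row of such a customer should be flagged, as the question seems to ask, one option is groupby with transform. A minimal sketch, reusing the same sample data and assuming year is an integer column:

```python
import pandas as pd

# Same sample data as in the answer above
data = {
    "customer_id": ["c1", "c2", "c1", "c4", "c3", "c3"],
    "year": [2019, 2018, 2021, 2012, 2020, 2021],
    "order": ["A1", "A2", "A3", "A4", "A5", "A6"],
}
df = pd.DataFrame(data)

# Row-level flag: was this particular order placed in 2020 or 2021?
is_recent = df["year"].isin([2020, 2021])

# Customer-level flag: True on every row of a customer
# who has at least one recent order
df["actually"] = is_recent.groupby(df["customer_id"]).transform("any")
print(df)
```

With this, both rows of c1 and both rows of c3 are marked True, while c2 and c4 stay False.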
You can use the apply method instead of writing explicit loops. Converting year with astype(str) makes this work whether the column holds integers or strings:
df['actually'] = df['customer_id'].apply(lambda x: df.loc[df.customer_id == x, 'year'].astype(str).str.contains('2020|2021').any())