I have created a data frame df1 as follows:
import pandas as pd

data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'date_1': ['2021-03-01','2021-03-02','2021-04-03','2021-03-04','2021-03-05','2021-03-06','2021-03-07','2021-03-08','2021-03-09','2021-03-10'],
'date_2': ['2021-03-06','2021-03-07','2021-03-08','2021-03-09','2021-03-10','2021-03-11','2021-03-12','2021-03-13','2021-03-14','2021-03-15']
}
df1 = pd.DataFrame(data, columns=['ID', 'date_1', 'date_2'])
df1
I am trying to create a new data frame df2 with a single column, 'date_3', built from df1. Ideally, 'date_3' should contain only the dates from the rows of df1 for which the condition below is True:
df1['date_1'] <= df1['date_2']
Below is my approach, but I only get the result of the condition (True/False) and not the actual date values:
data = [df1['date_1'] <= df1['date_2']]
headers = ['date_3']
df2 = pd.concat(data, axis=1, keys=headers)
df2
Use:
In [489]: df2 = df1[df1['date_1'] <= df1['date_2']]['date_1'].to_frame('date_3')
In [490]: df2
Out[490]:
date_3
0 2021-03-01
1 2021-03-02
3 2021-03-04
4 2021-03-05
5 2021-03-06
6 2021-03-07
7 2021-03-08
8 2021-03-09
9 2021-03-10
As advised by @ScottBoston, you can avoid chained indexing by using .loc:
df2 = df1.loc[df1['date_1'] <= df1['date_2'], 'date_1'].to_frame('date_3')
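To see why the original attempt returned True/False, here is a minimal sketch that separates the two steps: the comparison only produces a boolean mask, and it is .loc that uses the mask to pick rows. The name mask and the pd.to_datetime conversion are my additions, not part of the answer above. The plain string comparison also works here because ISO 'YYYY-MM-DD' strings sort in date order, but converting first is safer if the format ever varies.

```python
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'date_1': ['2021-03-01', '2021-03-02', '2021-04-03', '2021-03-04', '2021-03-05',
               '2021-03-06', '2021-03-07', '2021-03-08', '2021-03-09', '2021-03-10'],
    'date_2': ['2021-03-06', '2021-03-07', '2021-03-08', '2021-03-09', '2021-03-10',
               '2021-03-11', '2021-03-12', '2021-03-13', '2021-03-14', '2021-03-15'],
})

# Step 1: the comparison alone yields a boolean Series (one True/False per row).
# Parsing to datetimes is an assumption on my part about what is wanted.
mask = pd.to_datetime(df1['date_1']) <= pd.to_datetime(df1['date_2'])

# Step 2: .loc uses the mask to keep only the rows where it is True,
# and selects the 'date_1' column in the same call.
df2 = df1.loc[mask, 'date_1'].to_frame('date_3')

print(df2)
```

The row with index 2 ('2021-04-03' vs '2021-03-08') fails the condition, so it is the only one dropped.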
This:
df2 = df1.loc[df1["date_1"] <= df1["date_2"], ["ID", "date_1"]].copy()
df2 = df2.rename(columns={"date_1": "date_3"})
will first subset the rows based on your condition, keeping only the ID and date_1 columns, and then rename date_1 to date_3. Note that rename returns a new data frame, so its result has to be assigned.
The .copy() also makes it explicit that you are working on a copy, which prevents a SettingWithCopyWarning if you modify df2 later.
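As a quick check of the points above, this sketch rebuilds the question's data and shows that rename leaves the frame it is called on unchanged, and that editing the copy does not touch df1. The names subset and renamed, and the replacement value '2021-01-01', are only for illustration.

```python
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'date_1': ['2021-03-01', '2021-03-02', '2021-04-03', '2021-03-04', '2021-03-05',
               '2021-03-06', '2021-03-07', '2021-03-08', '2021-03-09', '2021-03-10'],
    'date_2': ['2021-03-06', '2021-03-07', '2021-03-08', '2021-03-09', '2021-03-10',
               '2021-03-11', '2021-03-12', '2021-03-13', '2021-03-14', '2021-03-15'],
})

# Filter the rows and keep two columns; .copy() makes this an independent frame.
subset = df1.loc[df1['date_1'] <= df1['date_2'], ['ID', 'date_1']].copy()

# rename returns a new DataFrame; subset itself keeps the old column name.
renamed = subset.rename(columns={'date_1': 'date_3'})

# Modifying the result does not write back into df1.
renamed.loc[renamed['ID'] == 1, 'date_3'] = '2021-01-01'

print(subset.columns.tolist())
print(renamed.columns.tolist())
print(df1.loc[0, 'date_1'])
```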