I'm new to Python.
How can I get the result below? If a cod and date pair from df_1 also exists in df_2, the matching row from df_2 should be added, as I tried to do in my code below.
import pandas as pd
data1 = {'date': ['2021-06', '2021-06', '2021-07', '2021-07', '2021-07', '2021-07'], 'cod': ['12', '12', '14', '15', '15', '18'], 'Zone': ['LA', 'NY', 'LA', 'NY', 'PARIS', 'PARIS'], 'Revenue_Radio': [10, 20, 30, 50, 40, 10]}
df_1 = pd.DataFrame(data1)
data2 = {'date': ['2021-06', '2021-06', '2021-07', '2021-07', '2021-08'], 'cod': ['12', '14', '15', '15', '18'], 'Zone': ['PARIS', 'NY', 'LA', 'NY', 'NY'], 'Revenue_Str': [10, 20, 30, 50, 5]}
df_2 = pd.DataFrame(data2)
My code is:
dfx = df_2[df_2['cod'].isin(df_1['cod']) &
           df_2['date'].isin(df_1['date'])]
df = (df_1.merge(dfx, on=['date', 'cod', 'Zone'], how='outer')
          .fillna(0)
          .sort_values(['date', 'cod'], ignore_index=True))
Expected output
data_result = {'date': ['2021-06', '2021-06', '2021-06', '2021-07', '2021-07', '2021-07', '2021-07', '2021-07','2021-07'], 'cod': ['12', '12', '12', '14', '14', '15', '15', '15', '18'], 'Zone': ['LA', 'NY', 'PARIS','LA', 'NY', 'NY', 'PARIS', 'LA', 'PARIS'], 'Revenue_Radio': [10, 20, 0, 30, 0, 50, 40, 0, 10], 'Revenue_Str': [0, 0, 10,0, 20, 50, 0, 30, 0]}
df_result = pd.DataFrame(data_result)
You can use merge and fillna:
(df_1.merge(df_2, on=['date', 'cod', 'Zone'], how='outer')
.fillna(0, downcast='infer')
)
NB: it is not clear to me why the last row (18/NY) is missing from your expected output.
Output:
      date cod   Zone  Revenue_Radio  Revenue_Str
0  2021-06  12     LA             10            0
1  2021-06  12     NY             20            0
2  2021-07  14     LA             30            0
3  2021-07  15     NY             50           50
4  2021-07  15  PARIS             40            0
5  2021-07  18  PARIS             10            0
6  2021-06  12  PARIS              0           10
7  2021-06  14     NY              0           20
8  2021-07  15     LA              0           30
9  2021-08  18     NY              0            5
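If the intent is to add only those df_2 rows whose (date, cod) pair also occurs in df_1, one possible sketch is to filter df_2 with an inner merge on the unique pairs before the outer merge. This is an assumption about what the question wants, not a confirmed requirement. Note that two separate isin checks test each column independently, so they do not guarantee a pair match. The names pairs, df_2_matched and revenue_cols are made up for illustration. As far as I know, the downcast argument of fillna is deprecated in recent pandas releases, so this sketch casts the revenue columns back to integers explicitly.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'date': ['2021-06', '2021-06', '2021-07', '2021-07', '2021-07', '2021-07'],
    'cod': ['12', '12', '14', '15', '15', '18'],
    'Zone': ['LA', 'NY', 'LA', 'NY', 'PARIS', 'PARIS'],
    'Revenue_Radio': [10, 20, 30, 50, 40, 10],
})
df_2 = pd.DataFrame({
    'date': ['2021-06', '2021-06', '2021-07', '2021-07', '2021-08'],
    'cod': ['12', '14', '15', '15', '18'],
    'Zone': ['PARIS', 'NY', 'LA', 'NY', 'NY'],
    'Revenue_Str': [10, 20, 30, 50, 5],
})

# Unique (date, cod) pairs present in df_1 (hypothetical helper frame).
pairs = df_1[['date', 'cod']].drop_duplicates()

# Inner merge keeps only the df_2 rows whose (date, cod) pair occurs in df_1.
df_2_matched = df_2.merge(pairs, on=['date', 'cod'], how='inner')

revenue_cols = ['Revenue_Radio', 'Revenue_Str']
result = (
    df_1.merge(df_2_matched, on=['date', 'cod', 'Zone'], how='outer')
        .fillna({col: 0 for col in revenue_cols})          # missing revenue -> 0
        .astype({col: 'int64' for col in revenue_cols})    # undo the float upcast
        .sort_values(['date', 'cod', 'Zone'], ignore_index=True)
)
print(result)
```

With the sample data this drops 18/NY (2021-08) and also 14/NY (2021-06), because df_1 contains cod 14 only for 2021-07. If 14/NY should be kept, the matching rule would need to be something other than an exact (date, cod) pair.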