I have two data frames: df1 (cw below) has my original data, and df2 (reff_table below) has the keys that need to be updated in df1.
cw=
id adgroup cost
1001 GoogleMaps 101,1
1002 Google 101,1
1003 AppStore 101,1
1004 GoogleDocs 101,1
reff_table=
adgroup new_adgroup
GoogleMaps G_maps
Google GG
AppStore APG
GoogleDocs DOC
This is how I am doing it:
m1 = cw.loc[cw['adgroup'] == 'GoogleMaps'].replace({'GoogleMaps': 'G_maps'})
m2 = cw.loc[cw['adgroup'] == 'Google'].replace({'Google': 'GG'})
final_cw = pd.concat([m1, m2])
Doing this manually is a long process; I need to find a more efficient way to get it done.
You can just use a merge/join
Your original dataframe:
print(df1)
id adgroup cost
0 1001 GoogleMaps 101
1 1002 GoogleMaps 101
2 1003 Google 101
3 1004 AppStore 101
4 1005 AppStore 101
5 1006 GoogleDocs 101
Your dataframe containing your references:
print(df2)
adgroup new_adgroup
0 GoogleMaps G_Maps
1 Google GG
2 AppStore APG
3 GoogleDocs DOC
Merging them on adgroup will align the reference values to the correct rows in your original data (then you can drop/rename/reorder columns as you wish):
df1.merge(df2, on='adgroup').drop(columns=['adgroup']).rename(columns={'new_adgroup':'adgroup'})
id cost adgroup
0 1001 101 G_Maps
1 1002 101 G_Maps
2 1003 101 GG
3 1004 101 APG
4 1005 101 APG
5 1006 101 DOC
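As a self-contained sketch of the merge above, the snippet below rebuilds both frames from the printed values and runs the same chain. The `validate="many_to_one"` argument is an addition not in the original answer; it is an optional safety check that makes pandas raise an error if the reference frame ever contains a duplicated adgroup, which would otherwise silently duplicate rows.

```python
import pandas as pd

# Frames rebuilt from the printed output above
df1 = pd.DataFrame({
    "id": [1001, 1002, 1003, 1004, 1005, 1006],
    "adgroup": ["GoogleMaps", "GoogleMaps", "Google",
                "AppStore", "AppStore", "GoogleDocs"],
    "cost": [101] * 6,
})
df2 = pd.DataFrame({
    "adgroup": ["GoogleMaps", "Google", "AppStore", "GoogleDocs"],
    "new_adgroup": ["G_Maps", "GG", "APG", "DOC"],
})

result = (
    # validate is an optional extra: it raises MergeError if df2
    # has more than one row for the same adgroup key
    df1.merge(df2, on="adgroup", validate="many_to_one")
       .drop(columns=["adgroup"])                    # drop the old names
       .rename(columns={"new_adgroup": "adgroup"})   # keep the new names
)
print(result)
```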
Let's say your original and reference dataframes are not a perfect match. How do you handle the extra/missing rows?
There are a number of join methods available to you: left, right, outer, inner.
The Pandas documentation has a brief explanation of these, but let's say your reference dataframe was missing the adgroup code for AppStore (the same idea applies if your original dataframe is also missing something) and looks like this:
adgroup new_adgroup
0 GoogleMaps G_Maps
1 Google GG
2 GoogleDocs DOC
What happens to the AppStore rows in your original data? Well, you can control that...
If you want to prioritize your original data and make sure you keep those rows, you can use a left join, and you will simply have NaN values for the missing codes:
df1.merge(df2, on='adgroup', how='left').drop(columns=['adgroup']).rename(columns={'new_adgroup':'adgroup'})
id cost adgroup
0 1001 101 G_Maps
1 1002 101 G_Maps
2 1003 101 GG
3 1004 101 NaN
4 1005 101 NaN
5 1006 101 DOC
If instead you want to prioritize your reference dataframe, such that only the codes found in the reference are in your output, you can use a right join. Notice that because AppStore is not in your reference dataframe, the AppStore rows from your original data are removed:
df1.merge(df2, on='adgroup', how='right').drop(columns=['adgroup']).rename(columns={'new_adgroup':'adgroup'})
id cost adgroup
0 1001 101 G_Maps
1 1002 101 G_Maps
2 1003 101 GG
3 1006 101 DOC
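To see how the four join types differ on this data, a small sketch like the following can help. It uses the reduced reference frame above (no AppStore entry) and counts the rows each join returns. The `indicator=True` argument is an addition not in the original answer; it adds a `_merge` column that records where each row came from, which makes it easy to list the adgroups that have no code.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1001, 1002, 1003, 1004, 1005, 1006],
    "adgroup": ["GoogleMaps", "GoogleMaps", "Google",
                "AppStore", "AppStore", "GoogleDocs"],
    "cost": [101] * 6,
})
# Reference frame with the AppStore code missing
df2 = pd.DataFrame({
    "adgroup": ["GoogleMaps", "Google", "GoogleDocs"],
    "new_adgroup": ["G_Maps", "GG", "DOC"],
})

# Number of rows returned by each join type
row_counts = {
    how: len(df1.merge(df2, on="adgroup", how=how))
    for how in ("left", "right", "inner", "outer")
}
print(row_counts)

# indicator=True adds a "_merge" column: left_only / right_only / both
left = df1.merge(df2, on="adgroup", how="left", indicator=True)
unmatched = sorted(left.loc[left["_merge"] == "left_only", "adgroup"].unique())
print(unmatched)  # adgroups in df1 that have no entry in df2
```

Here the right and inner joins return the same rows only because every key in the reference frame also occurs in the original data; a reference key with no matching rows would survive a right join (with NaN in the df1 columns) but not an inner join.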
Given the following input:
df_data = pd.DataFrame([['GoogleMaps', 100, 1], ['Google', 200, 2], ['PlayStore', 300, 3]], columns=['ad_group', 'cost', 'id'])
df_new_index = pd.DataFrame([['GoogleMaps', 'GMaps'], ['Google', 'GG'], ['PlayStore', 'PS']], columns=['ad_group', 'new_ad_group'])
Try this one-liner:
df_data.ad_group = df_data.ad_group.map(df_new_index.set_index('ad_group')['new_ad_group'])
which gives:
ad_group cost id
0 GMaps 100 1
1 GG 200 2
2 PS 300 3
If you set the index of the other dataframe to its 'ad_group' column, you can then remap the 'ad_group' column of your original dataframe with pandas.Series.map.
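One thing to be aware of with map: values that have no entry in the lookup come back as NaN. The sketch below shows this with an extra, hypothetical AppStore row that is not in the original input, and then falls back to the old name with fillna. The fallback is a suggestion on top of the answer, not part of it. It also assumes the lookup index is unique, since map generally cannot look up values in a Series with duplicated index labels.

```python
import pandas as pd

# Same input as above, plus a hypothetical "AppStore" row with no mapping
df_data = pd.DataFrame(
    [["GoogleMaps", 100, 1], ["Google", 200, 2],
     ["PlayStore", 300, 3], ["AppStore", 400, 4]],
    columns=["ad_group", "cost", "id"],
)
df_new_index = pd.DataFrame(
    [["GoogleMaps", "GMaps"], ["Google", "GG"], ["PlayStore", "PS"]],
    columns=["ad_group", "new_ad_group"],
)

# Series indexed by the old name, holding the new name
lookup = df_new_index.set_index("ad_group")["new_ad_group"]

# Unmapped values ("AppStore") become NaN here
mapped = df_data["ad_group"].map(lookup)

# Optional fallback: keep the original name where no mapping exists
df_data["ad_group"] = mapped.fillna(df_data["ad_group"])
print(df_data)
```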
Use Series.replace:
cw['adgroup'] = cw['adgroup'].replace(reff_table.set_index('adgroup')['new_adgroup'])
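A minimal runnable sketch of the replace approach follows, using the adgroup names from the question. Two things here are assumptions for illustration: the lookup is converted to a plain dict before being passed to replace, and a hypothetical "Bing" row is added to show that, unlike map, replace leaves values without a mapping unchanged. The cost column is omitted because it plays no part in the renaming.

```python
import pandas as pd

cw = pd.DataFrame({
    "id": [1001, 1002, 1003, 1004, 1005],
    # "Bing" is a hypothetical extra value with no entry in reff_table
    "adgroup": ["GoogleMaps", "Google", "AppStore", "GoogleDocs", "Bing"],
})
reff_table = pd.DataFrame({
    "adgroup": ["GoogleMaps", "Google", "AppStore", "GoogleDocs"],
    "new_adgroup": ["G_maps", "GG", "APG", "DOC"],
})

# Plain dict of old name -> new name
mapping = dict(zip(reff_table["adgroup"], reff_table["new_adgroup"]))

# Without regex=True, replace matches whole cell values only,
# so "Google" does not touch "GoogleMaps" or "GoogleDocs"
cw["adgroup"] = cw["adgroup"].replace(mapping)
print(cw)
```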