Python: how to apply .replace in a data frame for a large number of values to be changed

Question

How can I execute the task below in a more efficient way?

I have two data frames: df1 (cw below) holds my original data, and df2 (reff_table below) maps each adgroup name in df1 to the new name it should be updated to.

The problem is that there are about 2000 names that need to be changed.

cw=

    id      adgroup      cost  
    1001    GoogleMaps   101,1
    1002    Google       101,1
    1003    AppStore     101,1
    1004    GoogleDocs   101,1


reff_table=

    adgroup       new_adgroup       
    GoogleMaps    G_maps
    Google        GG
    AppStore      APG
    GoogleDocs    DOC

This is how I am doing it:

m1 = cw.loc[cw['adgroup'] == 'GoogleMaps'].replace({'GoogleMaps': 'G_maps'})
m2 = cw.loc[cw['adgroup'] == 'Google'].replace({'Google': 'GG'})

final_cw = pd.concat([m1, m2])

Doing this by hand, one line per name, is a long process; I need to find a more efficient way to get it done.
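To try the answers below, the two tables from the question can be built as pandas DataFrames. This is a sketch that assumes "101,1" in the cost column uses a decimal comma, so it is stored here as the float 101.1:

```python
import pandas as pd

# Original data from the question.
# Assumption: "101,1" is a decimal-comma number, stored here as 101.1.
cw = pd.DataFrame({
    'id': [1001, 1002, 1003, 1004],
    'adgroup': ['GoogleMaps', 'Google', 'AppStore', 'GoogleDocs'],
    'cost': [101.1, 101.1, 101.1, 101.1],
})

# Lookup table: one row per old adgroup name and its replacement.
reff_table = pd.DataFrame({
    'adgroup': ['GoogleMaps', 'Google', 'AppStore', 'GoogleDocs'],
    'new_adgroup': ['G_maps', 'GG', 'APG', 'DOC'],
})

print(cw)
print(reff_table)
```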

Answer 1

You can just use a merge/join.

Your original dataframe:

print(df1)

     id     adgroup  cost
0  1001  GoogleMaps   101
1  1002  GoogleMaps   101
2  1003      Google   101
3  1004    AppStore   101
4  1005    AppStore   101
5  1006  GoogleDocs   101

Your dataframe containing your references:

print(df2)

      adgroup new_adgroup
0  GoogleMaps      G_Maps
1      Google          GG
2    AppStore         APG
3  GoogleDocs         DOC

Merging them on adgroup aligns the reference values to the correct rows in your original data (then you can drop/rename/reorder columns as you wish):

df1.merge(df2, on='adgroup').drop(columns=['adgroup']).rename(columns={'new_adgroup':'adgroup'})

     id  cost adgroup
0  1001   101  G_Maps
1  1002   101  G_Maps
2  1003   101      GG
3  1004   101     APG
4  1005   101     APG
5  1006   101     DOC
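With about 2000 names, a name that appears twice in the reference table would make the merge silently duplicate the matching rows. A minimal sketch, assuming the reference table should contain each adgroup exactly once, passes merge's validate argument so pandas raises pandas.errors.MergeError in that case; the data mirrors the df1/df2 printed above:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1001, 1002, 1003, 1004, 1005, 1006],
    'adgroup': ['GoogleMaps', 'GoogleMaps', 'Google',
                'AppStore', 'AppStore', 'GoogleDocs'],
    'cost': [101] * 6,
})
df2 = pd.DataFrame({
    'adgroup': ['GoogleMaps', 'Google', 'AppStore', 'GoogleDocs'],
    'new_adgroup': ['G_Maps', 'GG', 'APG', 'DOC'],
})

# validate='many_to_one' checks that each adgroup is unique in df2 and
# raises MergeError otherwise, instead of duplicating rows of df1.
result = (
    df1.merge(df2, on='adgroup', validate='many_to_one')
       .drop(columns=['adgroup'])
       .rename(columns={'new_adgroup': 'adgroup'})
)
print(result)
```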

Join methods

Let's say your original and reference dataframes are not a perfect match: how do you handle the extra/missing rows?

There are a number of join methods available to you through the how argument of merge: left, right, outer, inner (inner is the default).

The pandas documentation has a brief explanation of these, but let's say your reference dataframe was missing the adgroup code for AppStore (the same idea applies if your original dataframe is also missing something) and looks like this:

      adgroup new_adgroup
0  GoogleMaps      G_Maps
1      Google          GG
2  GoogleDocs         DOC

What happens to the AppStore rows in your original data? That depends on the join you choose.

If you want to prioritize your original data and make sure you keep all of its rows, use a left join; you will simply get NaN values for the missing codes:

df1.merge(df2, on='adgroup', how='left').drop(columns=['adgroup']).rename(columns={'new_adgroup':'adgroup'})

     id  cost adgroup
0  1001   101  G_Maps
1  1002   101  G_Maps
2  1003   101      GG
3  1004   101     NaN
4  1005   101     NaN
5  1006   101     DOC
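If you would rather keep the original name than a NaN when a code is missing, one option (a sketch, not part of the answer above) is to left-join and then fill the gaps from the old column with fillna. The reference table here is the one without AppStore:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1001, 1002, 1003, 1004, 1005, 1006],
    'adgroup': ['GoogleMaps', 'GoogleMaps', 'Google',
                'AppStore', 'AppStore', 'GoogleDocs'],
    'cost': [101] * 6,
})
# Reference table with no entry for AppStore.
df2 = pd.DataFrame({
    'adgroup': ['GoogleMaps', 'Google', 'GoogleDocs'],
    'new_adgroup': ['G_Maps', 'GG', 'DOC'],
})

# Left join keeps every row of df1; unmatched rows get NaN in new_adgroup.
merged = df1.merge(df2, on='adgroup', how='left')
# Use the new code where there is one, otherwise keep the original name.
merged['adgroup'] = merged['new_adgroup'].fillna(merged['adgroup'])
result = merged.drop(columns=['new_adgroup'])
print(result)
```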

If instead you want to prioritize your reference dataframe, so that only codes found in the reference appear in your output, use a right join. Because AppStore is not in your reference dataframe, the AppStore rows from your original data are removed (a right join would also keep any reference code that never occurs in the original data, with NaN for id and cost):

df1.merge(df2, on='adgroup', how='right').drop(columns=['adgroup']).rename(columns={'new_adgroup':'adgroup'})

     id  cost adgroup
0  1001   101  G_Maps
1  1002   101  G_Maps
2  1003   101      GG
3  1006   101     DOC
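With around 2000 names it can help to see which names fall through before choosing a join. A sketch using an outer join with indicator=True, which adds a _merge column saying whether each row matched ('both') or came from only one side; the 'YouTube' row is a hypothetical extra code, included only to show the right_only case:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1001, 1002, 1003, 1004, 1005, 1006],
    'adgroup': ['GoogleMaps', 'GoogleMaps', 'Google',
                'AppStore', 'AppStore', 'GoogleDocs'],
    'cost': [101] * 6,
})
# Reference table lacking AppStore, plus a hypothetical unused code.
df2 = pd.DataFrame({
    'adgroup': ['GoogleMaps', 'Google', 'GoogleDocs', 'YouTube'],
    'new_adgroup': ['G_Maps', 'GG', 'DOC', 'YT'],
})

audit = df1.merge(df2, on='adgroup', how='outer', indicator=True)
# 'left_only': name in the original data with no new code.
# 'right_only': code in the reference that no original row uses.
missing_codes = sorted(audit.loc[audit['_merge'] == 'left_only', 'adgroup'].unique())
unused_codes = sorted(audit.loc[audit['_merge'] == 'right_only', 'adgroup'].unique())
print(missing_codes)
print(unused_codes)
```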

Answer 2

Given the following input:

df_data = pd.DataFrame([['GoogleMaps', 100, 1], ['Google', 200, 2], ['PlayStore', 300, 3]], columns=['ad_group', 'cost', 'id'])

df_new_index = pd.DataFrame([['GoogleMaps', 'GMaps'], ['Google', 'GG'], ['PlayStore', 'PS']], columns=['ad_group', 'new_ad_group'])

Try this one-line code:

df_data.ad_group = df_data.ad_group.map(df_new_index.set_index('ad_group')['new_ad_group'])

which gives:

  ad_group  cost  id
0    GMaps   100   1
1       GG   200   2
2       PS   300   3

Setting the index of the other dataframe to its 'ad_group' column gives a Series that maps each old name to its new name; pandas.Series.map then looks up every value of the 'ad_group' column in your original dataframe. Values with no entry in the lookup become NaN.
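A sketch of two practical details for a large lookup table. Series.map with a Series expects each old name to appear only once in the lookup index (duplicates raise an error in the pandas versions I know of), so the lookup is de-duplicated first, keeping the first entry per name; which duplicate should win is an assumption. Unmatched names are listed and then kept as-is. The duplicated GoogleMaps row and the missing AppStore entry are hypothetical:

```python
import pandas as pd

df_data = pd.DataFrame([['GoogleMaps', 100, 1], ['Google', 200, 2], ['AppStore', 300, 3]],
                       columns=['ad_group', 'cost', 'id'])
# Hypothetical lookup: GoogleMaps listed twice, no entry for AppStore.
df_new_index = pd.DataFrame([['GoogleMaps', 'GMaps'], ['GoogleMaps', 'G_Maps'], ['Google', 'GG']],
                            columns=['ad_group', 'new_ad_group'])

# Keep the first entry per old name so the lookup index is unique.
lookup = (df_new_index.drop_duplicates(subset='ad_group', keep='first')
                      .set_index('ad_group')['new_ad_group'])

mapped = df_data['ad_group'].map(lookup)
# Names without an entry map to NaN: record them, then keep the original name.
unmatched = df_data.loc[mapped.isna(), 'ad_group'].tolist()
df_data['ad_group'] = mapped.fillna(df_data['ad_group'])
print(unmatched)
print(df_data)
```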

Answer 3

Use Series.replace, which leaves any value that has no entry in the lookup unchanged:

cw['adgroup']=cw['adgroup'].replace(reff_table.set_index('adgroup')['new_adgroup'])
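A self-contained sketch of the same idea, using a plain {old: new} dict built with zip instead of passing the Series directly. Unlike Series.map in Answer 2, replace only touches exact matches and leaves other values alone; the missing GoogleDocs entry below is hypothetical, there to show that:

```python
import pandas as pd

cw = pd.DataFrame({'id': [1001, 1002, 1003, 1004],
                   'adgroup': ['GoogleMaps', 'Google', 'AppStore', 'GoogleDocs']})
# Hypothetical partial lookup table: no entry for GoogleDocs.
reff_table = pd.DataFrame({'adgroup': ['GoogleMaps', 'Google', 'AppStore'],
                           'new_adgroup': ['G_maps', 'GG', 'APG']})

# Build {old_name: new_name}; replace matches whole values only,
# so 'Google' does not alter 'GoogleMaps' or 'GoogleDocs'.
mapping = dict(zip(reff_table['adgroup'], reff_table['new_adgroup']))
cw['adgroup'] = cw['adgroup'].replace(mapping)
print(cw)
```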

3 answers

Answer 1: score 2, accepted, posted 2019-10-25 22:54:48
Answer 2: score 1, posted 2019-10-25 23:00:24
Answer 3: score 0, posted 2019-10-25 22:54:38
