
Pythonic way to replace values in one column from a two-column table

I have a DataFrame with an origin column and a destination column. I want to convert the strings to a numerical index, and I need a representation that lets me convert the indices back for model interpretation.

df1 = pd.DataFrame({"Origin": ["London", "Liverpool", "Paris", "..."], "Destination": ["Liverpool", "Paris", "Liverpool", "..."]})

I separately created a new index on the sorted values.

df2 = pd.DataFrame({"Location": ["Liverpool", "London", "Paris", "..."], "Idx": ["1", "2", "3", "..."]})

What I want to get is this:

df3 = pd.DataFrame({"Origin": ["2", "1", "3", "..."], "Destination": ["1", "3", "1", "..."]})

I am sure there is a simpler way of doing this, but I can think of only two methods. The first is to left join df2 onto df1 twice, matching Origin to Location and then Destination to Location, and remove the extraneous columns. The second is to loop over every item in df1 and df2 and replace matching values. I've written the looped version and it works, but, as expected, it is not very fast.

I am sure there must be an easier way to replace these values but I am drawing a complete blank.

You can use .map():

mapping = dict(zip(df2.Location, df2.Idx))

df1.Origin = df1.Origin.map(mapping)
df1.Destination = df1.Destination.map(mapping)
print(df1)

Prints:

  Origin Destination
0      2           1
1      1           3
2      3           1
3    ...         ...
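
The question also asks for a way to convert the indices back. A minimal sketch, assuming the Idx values are unique and stored as integers (the question stores them as strings), builds the inverse dictionary from the same mapping. The names inverse, encoded and decoded are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Origin": ["London", "Liverpool", "Paris"],
    "Destination": ["Liverpool", "Paris", "Liverpool"],
})
df2 = pd.DataFrame({
    "Location": ["Liverpool", "London", "Paris"],
    "Idx": [1, 2, 3],  # assumption: integer indices rather than strings
})

# Forward lookup: location name -> index.
mapping = dict(zip(df2["Location"], df2["Idx"]))
# Reverse lookup: index -> location name (only valid if Idx values are unique).
inverse = {idx: name for name, idx in mapping.items()}

# Apply the lookup to every column, then undo it again.
encoded = df1.apply(lambda col: col.map(mapping))
decoded = encoded.apply(lambda col: col.map(inverse))

print(encoded)
print(decoded)
```

Keeping the original df1 untouched and writing to a new frame makes it easy to check the round trip before overwriting anything.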

Or use a "bulk" .replace():

df1 = df1.replace(mapping)
print(df1)
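
The two calls differ for values that are missing from the mapping, which matters if df1 can contain a location that df2 lacks. A small sketch, using a hypothetical unmapped value "Berlin" and integer indices as an assumption:

```python
import pandas as pd

mapping = {"Liverpool": 1, "London": 2, "Paris": 3}

# "Berlin" is a hypothetical location that has no entry in the mapping.
s = pd.Series(["London", "Berlin"], dtype=object)

mapped = s.map(mapping)        # values without a match become NaN
replaced = s.replace(mapping)  # values without a match are left unchanged

print(mapped)
print(replaced)
```

With .map(), a follow-up check such as mapped.isna().any() reveals unmapped locations, whereas .replace() silently leaves the original string in place.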
