Pythonic way to replace values in one column using a two-column lookup table

Question

I have a df with an origin and a destination column, and I want to convert the location strings to a numerical index. I also need to keep a representation that lets me convert the indices back for model interpretation.

import pandas as pd

df1 = pd.DataFrame({"Origin": ["London", "Liverpool", "Paris", "..."], "Destination": ["Liverpool", "Paris", "Liverpool", "..."]})

I separately created a new index on the sorted values.

df2 = pd.DataFrame({"Location": ["Liverpool", "London", "Paris", "..."], "Idx": ["1", "2", "3", "..."]})

What I want to get is this:

df3 = pd.DataFrame({"Origin": ["2", "1", "3", "..."], "Destination": ["1", "3", "1", "..."]})

I am sure there is a simpler way of doing this, but the only two methods I can think of are: (1) left join df2 onto df1 twice, matching Origin to Location and then Destination to Location, and drop the extraneous columns; or (2) loop over every item in df1 and df2 and replace matching values. I've done the looped version and it works, but it's slow, which is to be expected.

I am sure there must be an easier way to replace these values but I am drawing a complete blank.

Answer 1

You can build a dictionary from df2 and apply it to each column with .map():

mapping = dict(zip(df2.Location, df2.Idx))

df1.Origin = df1.Origin.map(mapping)
df1.Destination = df1.Destination.map(mapping)
print(df1)

Prints:

  Origin Destination
0      2           1
1      1           3
2      3           1
3    ...         ...
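
The question also asks for a way to convert the indices back. A minimal sketch of one way to do that, assuming every Idx value is unique so the dictionary can be inverted; reverse_mapping, encoded, and decoded are illustrative names that do not appear in the original answer, and the "..." placeholder rows are left out:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Origin": ["London", "Liverpool", "Paris"],
    "Destination": ["Liverpool", "Paris", "Liverpool"],
})
df2 = pd.DataFrame({
    "Location": ["Liverpool", "London", "Paris"],
    "Idx": ["1", "2", "3"],
})

# Forward lookup: location name -> index (same construction as above).
mapping = dict(zip(df2.Location, df2.Idx))

# Reverse lookup: index -> location name. Assumes Idx values are unique;
# duplicate indices would silently overwrite each other here.
reverse_mapping = {idx: loc for loc, idx in mapping.items()}

# Encode every column, then decode again to recover the original strings.
encoded = df1.apply(lambda col: col.map(mapping))
decoded = encoded.apply(lambda col: col.map(reverse_mapping))
```

Keeping mapping (or df2 itself) alongside the model is what makes the back-conversion possible later.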

Or do a "bulk" .replace(), which applies the mapping to every column at once:

df1 = df1.replace(mapping)
print(df1)
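
One difference between the two approaches is how they treat values that are missing from the mapping. A small sketch, where "Berlin" is a hypothetical location added only to illustrate the point:

```python
import pandas as pd

mapping = {"Liverpool": "1", "London": "2", "Paris": "3"}

# "Berlin" is a hypothetical location that is not present in the mapping.
s = pd.Series(["London", "Berlin", "Paris"])

# .map() returns NaN for any value that has no entry in the dictionary.
mapped = s.map(mapping)

# .replace() leaves values without an entry unchanged.
replaced = s.replace(mapping)

print(mapped.tolist())
print(replaced.tolist())
```

If every location is guaranteed to be in df2, both give the same result; otherwise .map() makes unmatched values easy to spot, while .replace() keeps them as strings.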

solution1
1 ACCEPTED 2021-04-13 22:14:05
