I am working with a DataFrame that looks like this:
List Numb Name
1 1 one
1 2 two
2 3 three
4 4 four
3 5 five
I am trying to compute the following output:
List Numb Name
one 1 one
one 2 two
two 3 three
four 4 four
three 5 five
In my current approach I'm trying to iterate through the columns, then replace values with the contents of a third column.
For example, if List[0][1] is equal to Numb[1][1], replace List[0][1] with 'one'.
How could I make an iteration like this work, or alternatively solve the problem without explicitly iterating at all?
Use map with a Series that is indexed by Numb and holds the Name values:
df['List'] = df['List'].map(df.set_index('Numb')['Name'])
List Numb Name
0 one 1 one
1 one 2 two
2 two 3 three
3 four 4 four
4 three 5 five
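For completeness, here is a self-contained sketch of the map approach on the sample data from the question. The extra value 9 is a hypothetical key that does not occur in Numb, included only to show that map yields NaN for values it cannot find. The sketch assumes the Numb values are unique.

```python
import pandas as pd

df = pd.DataFrame({
    'List': [1, 1, 2, 4, 3],
    'Numb': [1, 2, 3, 4, 5],
    'Name': ['one', 'two', 'three', 'four', 'five'],
})

# Lookup Series: the index holds Numb, the values hold Name.
lookup = df.set_index('Numb')['Name']

# map() looks up every value of List in the index of `lookup`.
df['List'] = df['List'].map(lookup)
print(df)

# Hypothetical unmatched key: 9 is not in Numb, so map() returns NaN for it.
unmatched = pd.Series([1, 9]).map(lookup)
print(unmatched)
```

If unmatched values should keep their original number instead of becoming NaN, one option is to fill the gaps from the original column afterwards with fillna.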
import pandas as pd
df = pd.DataFrame({
'List': [1,1,2,4,3],
'Numb': [1,2,3,4,5],
'Name':['one','two','three','four','five']
})
# Self-merge: match each row's List value against the Numb column.
dfnew = pd.merge(df, df, how='inner', left_on=['List'], right_on=['Numb'])
# Name_y is the looked-up name; Numb_x and Name_x are the original columns.
dfnew = dfnew.rename({'Name_y': 'List', 'Numb_x': 'Numb', 'Name_x': 'Name'}, axis='columns')
dfnew = dfnew[['List','Numb','Name']]
print(dfnew)
# List Numb Name
#0 one 1 one
#1 one 2 two
#2 two 3 three
#3 four 4 four
#4 three 5 five
How about creating a dict to help you?
import pandas as pd
df = pd.DataFrame({'List': [1, 1, 2, 4, 3], 'Numb': [1, 2, 3, 4, 5], 'Name': ['one', 'two', 'three', 'four', 'five']})
d = dict(zip(df['Numb'], df['Name']))
df = df.replace({'List': d})
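One difference between the dict-based replace above and map may be worth knowing: replace leaves values without a dict entry untouched, whereas map turns them into NaN. The sketch below illustrates this with a modified copy of the sample data, where the last List value (9) is a hypothetical key that has no match in Numb.

```python
import pandas as pd

# Same sample data, except the last List value (9) has no match in Numb.
df = pd.DataFrame({
    'List': [1, 1, 2, 4, 9],
    'Numb': [1, 2, 3, 4, 5],
    'Name': ['one', 'two', 'three', 'four', 'five'],
})
d = dict(zip(df['Numb'], df['Name']))

# replace(): values without a dict entry are kept as they are.
replaced = df.replace({'List': d})['List'].tolist()

# map(): values without a dict entry become NaN.
mapped = df['List'].map(d).tolist()

print(replaced)
print(mapped)
```

Which behaviour is preferable depends on whether an unmatched number should be flagged as missing or passed through unchanged.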
You can do this in one line. Looks like you want to join your dataframe onto itself:
df.rename(columns={"List": "List_numb"}).join(df.set_index("Numb")["Name"].to_frame("List"), on="List_numb")[["List", "Numb", "Name"]]
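The one-liner is dense, so here it is broken into steps as a sketch. The intermediate names left, names and result are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'List': [1, 1, 2, 4, 3],
    'Numb': [1, 2, 3, 4, 5],
    'Name': ['one', 'two', 'three', 'four', 'five'],
})

# Right side: a one-column frame called "List", indexed by Numb.
names = df.set_index("Numb")["Name"].to_frame("List")

# Left side: rename the key column so it does not clash with the new "List".
left = df.rename(columns={"List": "List_numb"})

# join() matches the left column "List_numb" against the index of `names`.
result = left.join(names, on="List_numb")[["List", "Numb", "Name"]]
print(result)
```

Note that this builds a new frame rather than modifying df in place, so the result has to be assigned if it is needed later.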
Similar to Vaishali's answer, but building a Series explicitly seems to be a bit faster.
df['List'] = df['List'].map(pd.Series(df['Name'].values, df['Numb']))
Timings (the Numb and Name columns have unique-value dummy data and I only included the three fastest solutions so far):
>>> df
List Numb Name
0 1 1 one_0
1 1 2 two_1
2 2 3 three_2
3 4 4 four_3
4 3 5 five_4
... ... ... ...
4995 1 4996 one_4995
4996 1 4997 two_4996
4997 2 4998 three_4997
4998 4 4999 four_4998
4999 3 5000 five_4999
[5000 rows x 3 columns]
# Timings (i5-6200U CPU @ 2.30GHz, but only relative times are interesting)
>>> %timeit df.set_index('Numb')['Name'].reindex(df['List']).values # jpp
1.14 ms ± 3.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['List'].map(df.set_index('Numb')['Name']) # Vaishali
1.04 ms ± 7.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['List'].map(pd.Series(df['Name'].values, df['Numb'])) # timgeb
437 µs ± 3.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
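To rerun the comparison outside IPython, something like the following sketch should work. The way the 5000-row frame is constructed is my reconstruction from the printout above, and the function names are made up for this example. Absolute numbers will differ by machine and pandas version, so no expected timings are given.

```python
import timeit

import pandas as pd

# Rebuild a 5000-row frame shaped like the one printed above (assumed layout).
n_repeats = 1000
base_names = ['one', 'two', 'three', 'four', 'five']
df = pd.DataFrame({
    'List': [1, 1, 2, 4, 3] * n_repeats,
    'Numb': range(1, 5 * n_repeats + 1),
})
df['Name'] = [f"{base_names[i % 5]}_{i}" for i in range(len(df))]


def via_reindex():
    return df.set_index('Numb')['Name'].reindex(df['List']).values


def via_map_set_index():
    return df['List'].map(df.set_index('Numb')['Name'])


def via_map_series():
    return df['List'].map(pd.Series(df['Name'].values, df['Numb']))


# Check that all three approaches agree before timing them.
expected = via_map_set_index().tolist()
assert list(via_reindex()) == expected
assert via_map_series().tolist() == expected

n_calls = 100
for func in (via_reindex, via_map_set_index, via_map_series):
    seconds = timeit.timeit(func, number=n_calls)
    print(f"{func.__name__}: {seconds / n_calls * 1e6:.0f} µs per call")
```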