
How do I look up values in a dataframe and return matching values using python/pandas?

I have 2 large dataframes, df1 and df2. I am missing a column (colB) in df2 and I would like to add that column based on the values in the shared column (colA). If I were using Excel I would do this via a standard VLOOKUP formula, but I'm struggling to get the desired results using the pandas merge function.

colA and colB both contain multiple entries of the same value so I'm using this line of code to create a new dataframe with only the unique pairings.

df_keyvalues = df1[["colA", "colB"]].drop_duplicates()

I'm then using merge to add colB into df2

df2 = df2.merge(df_keyvalues, how = "left", on = "colA")

After running the above, I do get colB in df2, but I also get more rows in my dataframe than I started with.

What am I doing wrong?

I would like to be able to look up each value of df2["colA"] in df1["colA"] and return the corresponding value from df1["colB"]. If a value in df2["colA"] has no exact match in df1["colA"], then leave df2["colB"] empty for that row and move on to the next one.

Thanks in advance.

If you are getting more rows after the merge, colA is not a unique key of df_keyvalues. That in turn means the mapping colA -> colB is not unique in df1, i.e. for at least one value of colA there are multiple values of colB.
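To check whether this is what is happening, something like the following sketch may help. The data here is made up for illustration (the real df1 and df2 are not shown in the question): the key "x" is paired with two different colB values, so drop_duplicates() keeps both pairs and the left merge returns one row per pair.

```python
import pandas as pd

# Hypothetical toy data: "x" maps to two different colB values in df1.
df1 = pd.DataFrame({"colA": ["x", "x", "y", "y"],
                    "colB": [1, 2, 3, 3]})
df2 = pd.DataFrame({"colA": ["x", "y", "z"]})

df_keyvalues = df1[["colA", "colB"]].drop_duplicates()

# Keys that still occur more than once after drop_duplicates()
ambiguous = df_keyvalues[df_keyvalues["colA"].duplicated(keep=False)]
print(ambiguous)

# The left merge emits one output row per matching pair, so "x" is doubled.
merged = df2.merge(df_keyvalues, how="left", on="colA")
print(len(df2), len(merged))

# validate="many_to_one" asks merge to raise instead of silently adding rows.
try:
    df2.merge(df_keyvalues, how="left", on="colA", validate="many_to_one")
    merge_refused = False
except pd.errors.MergeError as err:
    merge_refused = True
    print("merge refused:", err)
```

If `ambiguous` is empty on your real data, the extra rows must have a different cause, for example duplicated keys introduced by an earlier step.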

You need to create a unique mapping colA -> colB from df1 first. One way to do this would be:

# take the smallest value if A->B mapping is not unique
df_AtoB = df1.groupby("colA", as_index=False).agg(colB_=("colB", "min"))

Which de-duplication rule is "right" depends on your use case.
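As an illustration of how the choice changes the result, here are two other rules one might pick instead of the minimum. Both are only sketches on made-up data, and neither is inherently more correct than the other.

```python
import pandas as pd

# Hypothetical toy data: "x" is seen with colB = 1 once and colB = 2 twice.
df1 = pd.DataFrame({"colA": ["x", "x", "x", "y"],
                    "colB": [1, 2, 2, 3]})

# Rule 1: keep the first colB encountered for each colA (row order decides).
first_seen = df1.drop_duplicates(subset="colA", keep="first")[["colA", "colB"]]

# Rule 2: keep the most frequent colB for each colA
# (ties are broken by taking the smallest of the tied values).
most_common = (
    df1.groupby("colA")["colB"]
       .agg(lambda s: s.mode().iloc[0])
       .reset_index()
)

print(first_seen)
print(most_common)
```

Either result has exactly one row per colA, so merging it into df2 with how="left" cannot add rows.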

Afterwards you can fill in colB in df2 as follows (this assumes df2 already has a colB column containing missing values):

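# left merge keeps every row of df2; colB_ is NaN where colA has no match
df = df2.merge(df_AtoB, on="colA", how="left")
# fill the gaps in colB from the looked-up colB_, then drop the helper column
df["colB"] = df["colB"].fillna(df["colB_"])
df = df.drop(columns="colB_")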
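If df2 has no colB column at all, as the question describes, a VLOOKUP-style alternative is to build a Series indexed by colA and use Series.map. The sketch below uses made-up data and again takes the smallest colB per key, which is an assumption you may want to change.

```python
import pandas as pd

# Hypothetical toy data; df2 has no colB column yet.
df1 = pd.DataFrame({"colA": ["x", "x", "y"],
                    "colB": [1, 2, 3]})
df2 = pd.DataFrame({"colA": ["x", "y", "z", "x"]})

# One colB per colA: a Series whose index is colA (smallest value wins).
lookup = df1.groupby("colA")["colB"].min()

# map() looks each key up in the index of `lookup`;
# keys without an exact match ("z" here) become NaN.
df2["colB"] = df2["colA"].map(lookup)
print(df2)
```

Because map works row by row on the existing column, df2 keeps exactly the rows it started with.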
