
Lookup value in the same dataframe based on label and add to a new column (Vlookup)

I have a table of laboratory results that includes 'blind duplicate samples'. These are samples taken twice, where the second sample was given a nondescript label. The corresponding original sample is indicated in a separate column:

import pandas as pd

Labels = ['A1-1', 'A1-2', 'A1-3', 'A1-4', 'B1-2', 'B1-3', 'B1-4', 'B1-5', 'Blank1', 'Blank2', 'Blank3']
Values = [8356532, 7616084, 5272477, 5076012, 411851, 415258, 8285777, 9700884, 9192185, 4466890, 830516]
Duplicate_of = ['', '', '', '', '', '', '', '', 'A1-1', 'A1-4', 'B1-3']
d = {'Labels': Labels, 'Values': Values, 'Duplicate_of': Duplicate_of}
df = pd.DataFrame(data=d)
df = df[['Labels', 'Values', 'Duplicate_of']]  # fix the column order

I would like to add a column to the dataframe which holds the 'value' from the original sample for the duplicates. So a new column ('Original_value'), where for 'Blank1' the value of 'A1-1' is entered, for 'Blank2' the value of 'A1-4' is entered, etc. For rows where the 'Duplicate_of' field is empty, this new column is also empty.

In Excel this is very easy with VLOOKUP, but I haven't seen an easy way in pandas (other than perhaps joining the entire table with itself?).
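For reference, the self-join mentioned above could be sketched roughly as follows. This is only an illustration on the sample data; the names originals and merged are made up for the example, and it assumes every label occurs only once.

```python
import pandas as pd

df = pd.DataFrame({
    "Labels": ["A1-1", "A1-2", "A1-3", "A1-4", "B1-2", "B1-3",
               "B1-4", "B1-5", "Blank1", "Blank2", "Blank3"],
    "Values": [8356532, 7616084, 5272477, 5076012, 411851, 415258,
               8285777, 9700884, 9192185, 4466890, 830516],
    "Duplicate_of": ["", "", "", "", "", "", "", "", "A1-1", "A1-4", "B1-3"],
})

# Right-hand side of the join: each label with its value, renamed so the
# label lines up with the Duplicate_of column of the left-hand side.
originals = df[["Labels", "Values"]].rename(
    columns={"Labels": "Duplicate_of", "Values": "Original_value"}
)

# A left join keeps every row of df; rows with an empty Duplicate_of find
# no match and get NaN in Original_value.
merged = df.merge(originals, on="Duplicate_of", how="left")
print(merged)
```

Because unmatched rows receive NaN, Original_value comes back as a float column here.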

Not a memory-efficient answer, but this works:

import numpy as np
dictionary = dict(zip(Labels, Values))
df["Original_value"] = df["Duplicate_of"].map(lambda x: np.nan if x not in dictionary else dictionary[x])

Rows whose Duplicate_of is empty get NaN in Original_value. You can decide what you want in place of that.

Because of the NaN values, the new column will be float rather than integer; that can also be changed if needed.
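If an integer column is wanted, one possible approach is pandas' nullable integer dtype, sketched below. This assumes a pandas version that supports "Int64" (capital I); the name lookup is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Labels": ["A1-1", "A1-2", "A1-3", "A1-4", "B1-2", "B1-3",
               "B1-4", "B1-5", "Blank1", "Blank2", "Blank3"],
    "Values": [8356532, 7616084, 5272477, 5076012, 411851, 415258,
               8285777, 9700884, 9192185, 4466890, 830516],
    "Duplicate_of": ["", "", "", "", "", "", "", "", "A1-1", "A1-4", "B1-3"],
})

lookup = dict(zip(df["Labels"], df["Values"]))

# map() yields a float column with NaN for the empty rows; casting to the
# nullable "Int64" dtype keeps the gaps as <NA> and the rest as integers.
df["Original_value"] = df["Duplicate_of"].map(lookup).astype("Int64")
print(df.dtypes)
```

The missing entries then show up as <NA> instead of NaN.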

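Following @jezrael's comment, the same thing can be done more simply, because map accepts a dictionary directly and returns NaN for keys it does not find: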

dictionary = dict(zip(Labels, Values))
# map looks each Duplicate_of entry up in the dictionary; missing keys become NaN
df["Original_value"] = df["Duplicate_of"].map(dictionary)
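If the original Labels and Values lists are no longer around, the lookup can probably be built from the dataframe itself, since map also accepts a Series and looks values up by its index. A minimal sketch, assuming the labels are unique (a duplicated label would make the lookup ambiguous and raise an error); the name lookup is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Labels": ["A1-1", "A1-2", "A1-3", "A1-4", "B1-2", "B1-3",
               "B1-4", "B1-5", "Blank1", "Blank2", "Blank3"],
    "Values": [8356532, 7616084, 5272477, 5076012, 411851, 415258,
               8285777, 9700884, 9192185, 4466890, 830516],
    "Duplicate_of": ["", "", "", "", "", "", "", "", "A1-1", "A1-4", "B1-3"],
})

# Series of Values indexed by Labels: the lookup table.
lookup = df.set_index("Labels")["Values"]

# Each Duplicate_of entry is looked up in the index of lookup; entries that
# are not found (the empty strings here) become NaN.
df["Original_value"] = df["Duplicate_of"].map(lookup)
print(df)
```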

Here is the easiest way to do this, in one line:

df["Original_value"] = df["Duplicate_of"].apply(lambda x: "" if x == "" else df.loc[df["Labels"] == x, "Values"].values[0])

Explanation:

This simply applies a lambda function to each element of the column "Duplicate_of".

First we check if the item is an empty string and we return an empty string if so:

"" if x == ""

is roughly equivalent to:

if x == "": return ""

If it is not an empty string the following command is executed:

df.loc[df["Labels"] == x, "Values"].values[0]

This simply returns the value in the column "Values" for the row where the condition df["Labels"] == x is true. If you are wondering about the .values[0] part, it is there because .loc returns a Series; in this case the Series holds just a single value, so we extract it with .values[0].
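To make the .loc step concrete, here is a small sketch on a two-row frame. The label "Z9-9" is made up to show what happens when nothing matches.

```python
import pandas as pd

df = pd.DataFrame({"Labels": ["A1-1", "A1-4"], "Values": [8356532, 5076012]})

# Boolean-mask selection returns a Series, even when only one row matches.
match = df.loc[df["Labels"] == "A1-4", "Values"]
first = match.values[0]  # pull the single value out of the Series

# A label that does not exist gives an empty Series, so indexing it with
# .values[0] would raise an IndexError; check .empty first if that can happen.
missing = df.loc[df["Labels"] == "Z9-9", "Values"]
print(type(match).__name__, first, missing.empty)
```

Note also that the one-liner rescans the whole Labels column for every row, so on large tables the map-based approach is likely to be faster.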
