Lookup value in the same dataframe based on label and add to a new column (Vlookup)

I have a table which contains laboratory results, including 'blind duplicate samples'. These are basically samples taken twice, where the second sample was given a nondescript label. The corresponding original sample is indicated in a separate column:

import pandas as pd

Labels = ['A1-1', 'A1-2', 'A1-3', 'A1-4', 'B1-2', 'B1-3', 'B1-4', 'B1-5', 'Blank1', 'Blank2', 'Blank3']
Values = [8356532, 7616084, 5272477, 5076012, 411851, 415258, 8285777, 9700884, 9192185, 4466890, 830516]
Duplicate_of = ['', '', '', '', '', '', '', '', 'A1-1', 'A1-4', 'B1-3']
d = {'Labels': Labels, 'Values': Values, 'Duplicate_of': Duplicate_of}
df = pd.DataFrame(data=d)
df = df[['Labels', 'Values', 'Duplicate_of']]

I would like to add a column to the dataframe which holds, for each duplicate, the value of its original sample. So a new column ('Original_value'), where for 'Blank1' the value of 'A1-1' is entered, for 'Blank2' the value of 'A1-4' is entered, etc. For rows where the 'Duplicate_of' field is empty, this new column is also empty.

In Excel, this is very easy with VLOOKUP, but I haven't seen an easy way in Pandas (other than, maybe, joining the entire table with itself?).
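For reference, the self-join idea mentioned above could be sketched roughly as follows. This is only an illustration on a shortened version of the data; the names `lookup` and `merged` are made up for the example.

```python
import pandas as pd

# Shortened version of the table from the question.
df = pd.DataFrame({
    "Labels": ["A1-1", "A1-4", "B1-3", "Blank1", "Blank2"],
    "Values": [8356532, 5076012, 415258, 9192185, 4466890],
    "Duplicate_of": ["", "", "", "A1-1", "A1-4"],
})

# Right-hand side of the join: each label with its value, renamed so the
# key lines up with Duplicate_of and the value column gets its new name.
lookup = df[["Labels", "Values"]].rename(
    columns={"Labels": "Duplicate_of", "Values": "Original_value"}
)

# A left join keeps every row of df; rows whose Duplicate_of is empty
# find no match and get NaN in Original_value.
merged = df.merge(lookup, on="Duplicate_of", how="left")
print(merged)
```

The merge works, but it builds a second table just to look up one column, which is why a lighter approach would be welcome.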

Not a memory-efficient answer, but this works:

import numpy as np
dictionary = dict(zip(Labels, Values))
df["Original_value"] = df["Duplicate_of"].map(lambda x: np.nan if x not in dictionary else dictionary[x])

For the rest of the rows, Original_value is NaN. You can decide what you want in place of that.

The new column will not have an integer dtype (the NaN entries make it float); that can also be changed if needed.
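If integers are wanted, one possible way is pandas' nullable integer dtype. The sketch below assumes a shortened version of the data and uses the made-up names `lookup` and `as_nullable_int`; it is one option among several, not the only way to handle the missing rows.

```python
import pandas as pd

labels = ["A1-1", "A1-4", "Blank1", "Blank2"]
values = [8356532, 5076012, 9192185, 4466890]
duplicate_of = ["", "", "A1-1", "A1-4"]
df = pd.DataFrame({"Labels": labels, "Values": values, "Duplicate_of": duplicate_of})

lookup = dict(zip(labels, values))
# Rows without a match become NaN, so the column is stored as float64.
df["Original_value"] = df["Duplicate_of"].map(lookup)

# "Int64" (capital I) is the nullable integer dtype: matched rows hold
# integers and unmatched rows hold <NA> instead of NaN.
as_nullable_int = df["Original_value"].astype("Int64")
print(as_nullable_int)
```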

Following @jezrael's comment, the same thing can be done as:

dictionary = dict(zip(Labels, Values))
# map() looks each Duplicate_of entry up in the dictionary; keys that are missing (such as '') become NaN
df["Original_value"] = df["Duplicate_of"].map(dictionary)
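When only the dataframe is at hand and not the original Python lists, the lookup can probably be built from the dataframe itself, since `Series.map` also accepts a Series. A minimal sketch, assuming the labels are unique (a duplicated label in the index would make the lookup fail) and using a shortened version of the data:

```python
import pandas as pd

df = pd.DataFrame({
    "Labels": ["A1-1", "A1-4", "B1-3", "Blank1", "Blank2", "Blank3"],
    "Values": [8356532, 5076012, 415258, 9192185, 4466890, 830516],
    "Duplicate_of": ["", "", "", "A1-1", "A1-4", "B1-3"],
})

# Series indexed by label; it plays the same role as the dictionary above.
lookup = df.set_index("Labels")["Values"]

# Entries of Duplicate_of that are not in the index (here: '') map to NaN.
df["Original_value"] = df["Duplicate_of"].map(lookup)
print(df)
```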

Here is the easiest way to do this, in one line:

df["Original_value"] = df["Duplicate_of"].apply(lambda x: "" if x == "" else df.loc[df["Labels"] == x, "Values"].values[0])

Explanation:

This simply applies a lambda function to each element of the column "Duplicate_of".

First we check whether the item is an empty string, and return an empty string if so:

"" if x == ""

is equivalent to:

if x == "": return ""

If it is not an empty string, the following expression is evaluated:

df.loc[df["Labels"] == x, "Values"].values[0]

This simply returns the value in the "Values" column for the row where the condition df["Labels"] == x is true. If you are wondering about the .values[0] part, it is there because .loc returns a Series; in this case the Series holds just a single value, so we simply extract it with .values[0].
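One caveat: if "Duplicate_of" names a label that does not occur in "Labels", the selection is empty and .values[0] raises an IndexError. A slightly more defensive variant could look like the sketch below; the helper name `original_value` and the unmatched label "Z9-9" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Labels": ["A1-1", "A1-4", "Blank1", "Blank2"],
    "Values": [8356532, 5076012, 9192185, 4466890],
    "Duplicate_of": ["", "", "A1-1", "Z9-9"],  # "Z9-9" matches no label
})

def original_value(label):
    """Return the value recorded for `label`, or "" when there is none."""
    if label == "":
        return ""
    # .loc with a boolean mask returns a Series; it is empty when nothing matched.
    matches = df.loc[df["Labels"] == label, "Values"]
    return matches.iloc[0] if not matches.empty else ""

df["Original_value"] = df["Duplicate_of"].apply(original_value)
print(df)
```

Because the column mixes strings and integers it ends up with object dtype, just as in the one-liner.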
