Use a value in one dataframe to look up a value in another, return the adjacent cell value, and update the first dataframe

Question

I have two datasets (dataframes), one called source and the other called crossmap. I am trying to find rows where a specific column value starts with "999". When one is found, I need to look up the complete value of that column (e.g. "99912345") in the crossmap dataframe and return the value from another column on the matching crossmap row.

# Source Dataframe 

        0       1       2       3       4
    ------  --------    --  ---------   -----
0   303290  544981      2   408300622   85882
1   321833  99910722    1   408300902   85897
2   323241  99902978    3   408056001   95564

# Cross Map Dataframe

ID      NDC ID  DIN(NDC)    GTIN            NAME                    PRDID
------- ------  --------    --------------  ----------------------  -----
44563   321833  99910722    99910722000000  SALBUTAMOL SULFATE (A)  90367
69281   321833  99910722    99910722000000  SALBUTAMOL SULFATE (A)  90367
6002800 323241  99902978    75402850039706  EPINEPHRINE (A)         95564
8001116 323241  99902978    99902978000000  EPINEPHRINE (A)         95564

The 'straw dog' logic I am working with is this:

search the source file and find '999' entries in column 1

df_source[df_source['Column1'].str.startswith('999')]

iterate through the rows returned, search for the column 1 value in the crossmap dataframe column DIN(NDC), and return the corresponding PRDID
update the source dataframe with the PRDID, and write the updated file

It is these last two steps that I am struggling with. I would appreciate any direction or guidance anyone can provide.

Is there perhaps a better or easier way of doing this in Python without pandas/dataframes?
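For reference, the pandas-free variant asked about here could be sketched with a plain dict. This is only an illustration under stated assumptions: the rows are hard-coded from the samples above (in practice they might come from `csv.reader`), the names `source_rows`, `crossmap_rows`, and `din_to_prdid` are made up, and the PRDID is assumed to belong in the last source column.

```python
# Sample rows copied from the question, kept as strings as they would be
# after reading a text file.
source_rows = [
    ["303290", "544981", "2", "408300622", "85882"],
    ["321833", "99910722", "1", "408300902", "85897"],
    ["323241", "99902978", "3", "408056001", "95564"],
]
crossmap_rows = [
    {"DIN(NDC)": "99910722", "PRDID": "90367"},
    {"DIN(NDC)": "99910722", "PRDID": "90367"},
    {"DIN(NDC)": "99902978", "PRDID": "95564"},
    {"DIN(NDC)": "99902978", "PRDID": "95564"},
]

# Build a DIN(NDC) -> PRDID lookup. Later duplicates overwrite earlier ones,
# which is harmless here because the duplicated rows carry the same PRDID.
din_to_prdid = {row["DIN(NDC)"]: row["PRDID"] for row in crossmap_rows}

for row in source_rows:
    din = row[1]
    # Only touch rows whose column 1 starts with "999" and has a match.
    if din.startswith("999") and din in din_to_prdid:
        row[4] = din_to_prdid[din]  # assumption: PRDID goes in column 4

for row in source_rows:
    print(row)
```

A dict lookup keeps each match at constant time, so the cross map is scanned only once rather than once per source row.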

Answer 1

If I understood you correctly: we look for values in column 1 of the 'Source Dataframe' whose first digits are 999. Next, we find those values in the 'Cross Map' column 'DIN(NDC)' and take the 'PRDID' values from the matching rows. If that is all correct, then I don't understand what you want to do next.

import pandas as pd
import more_itertools as mit

Cross_Map = pd.DataFrame({'DIN(NDC)': [99910722, 99910722, 99902978, 99902978],
                          'PRDID': [90367, 90367, 95564, 95564]})

df = pd.DataFrame({0: [303290, 321833, 323241], 1: [544981, 99910722, 99902978], 2: [2, 1, 3],
                   3: [408300622, 408300902, 408056001], 4: [85882, 85897, 95564]})


# find the values in column 1 that start with '999'
m = [i for i in df[1] if str(i)[:3] == '999']
# get the row positions in Cross_Map whose DIN(NDC) value is one of those matches
index = list(mit.locate(list(Cross_Map['DIN(NDC)']), lambda x: x in m))
print(Cross_Map['PRDID'][index])
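The remaining step from the question, writing the PRDID back into the source dataframe, might be handled with `Series.map` instead of explicit iteration. This is a minimal sketch, not a definitive answer: it assumes the PRDID belongs in source column 4 (the question does not say which column to update) and that duplicate DIN(NDC) rows in the cross map agree on PRDID. The names `df_source`, `crossmap`, `lookup`, and `hits` are illustrative.

```python
import pandas as pd

# Sample data copied from the question; source columns are labelled 0..4.
df_source = pd.DataFrame({
    0: [303290, 321833, 323241],
    1: [544981, 99910722, 99902978],
    2: [2, 1, 3],
    3: [408300622, 408300902, 408056001],
    4: [85882, 85897, 95564],
})
crossmap = pd.DataFrame({
    "DIN(NDC)": [99910722, 99910722, 99902978, 99902978],
    "PRDID": [90367, 90367, 95564, 95564],
})

# One PRDID per DIN(NDC); assumes duplicated cross-map rows agree.
lookup = crossmap.drop_duplicates("DIN(NDC)").set_index("DIN(NDC)")["PRDID"]

# Rows whose column 1 value starts with "999" (compared as strings).
mask = df_source[1].astype(str).str.startswith("999")

# Look up the PRDID for those rows; values with no match become NaN.
found = df_source.loc[mask, 1].map(lookup)

# Assumption: PRDID goes into column 4. Overwrite only where a match exists.
hits = found.dropna().astype(df_source[4].dtype)
df_source.loc[hits.index, 4] = hits

print(df_source)
```

Writing the updated file is then a single `df_source.to_csv(...)` call with whatever output path and header options suit the original file.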

Solution 1
0 2022-04-08 19:24:06
