
Use a value from one dataframe to lookup the value in another and return an adjacent cell value and update the first dataframe value

I have two datasets (dataframes), one called source and the other crossmap. I am trying to find rows where a specific column value starts with "999". When one is found, I need to look up the complete value of that column (e.g. "99912345") in the crossmap dataframe and return the value from another column on the matching crossmap row.

# Source Dataframe

         0         1  2          3      4
    ------  --------  -  ---------  -----
0   303290    544981  2  408300622  85882
1   321833  99910722  1  408300902  85897
2   323241  99902978  3  408056001  95564

# Cross Map Dataframe

ID       NDC ID  DIN(NDC)  GTIN            NAME                    PRDID
-------  ------  --------  --------------  ----------------------  -----
44563    321833  99910722  99910722000000  SALBUTAMOL SULFATE (A)  90367
69281    321833  99910722  99910722000000  SALBUTAMOL SULFATE (A)  90367
6002800  323241  99902978  75402850039706  EPINEPHRINE (A)         95564
8001116  323241  99902978  99902978000000  EPINEPHRINE (A)         95564

The 'straw dog' logic I am working with is this:

  • search the source file and find the '999' entries in column 1:
    df_source[df_source['Column1'].str.startswith('999')]
  • iterate through the rows returned, search for each column 1 value in the crossmap column DIN(NDC), and return the corresponding PRDID
  • update the source dataframe with the PRDID, and write the updated file

It is these last two steps that I am struggling with. I would appreciate any direction or guidance anyone can provide.

Is there perhaps a better or easier way to do this in Python without pandas/dataframes?
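On the plain-Python question, a dictionary lookup may be enough. The following is only a minimal sketch under assumptions not stated in the post: the rows are held as strings (as csv.reader and csv.DictReader would return them), the names source_rows, crossmap_rows and prdid_by_din are made up for illustration, and the matched value in column 1 is the cell that gets overwritten with the PRDID.

```python
# Sample rows as plain strings, the way csv.reader / csv.DictReader would yield them.
source_rows = [
    ['303290', '544981', '2', '408300622', '85882'],
    ['321833', '99910722', '1', '408300902', '85897'],
    ['323241', '99902978', '3', '408056001', '95564'],
]
crossmap_rows = [
    {'DIN(NDC)': '99910722', 'PRDID': '90367'},
    {'DIN(NDC)': '99910722', 'PRDID': '90367'},
    {'DIN(NDC)': '99902978', 'PRDID': '95564'},
    {'DIN(NDC)': '99902978', 'PRDID': '95564'},
]

# Build DIN(NDC) -> PRDID once; setdefault keeps the first PRDID seen
# for a duplicated DIN(NDC) (assumption: duplicates agree with each other).
prdid_by_din = {}
for row in crossmap_rows:
    prdid_by_din.setdefault(row['DIN(NDC)'], row['PRDID'])

# Assumption: the '999' value in column 1 is replaced by the PRDID.
# Values with no crossmap match are left unchanged.
for row in source_rows:
    if row[1].startswith('999'):
        row[1] = prdid_by_din.get(row[1], row[1])
```

The updated source_rows could then be written back out with csv.writer. If a different cell should receive the PRDID, only the assignment inside the last loop needs to change.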

As far as I understand it: we look in the source dataframe for values in column 1 whose first digits are 999. Next, we find those values in the crossmap column 'DIN(NDC)' and take the 'PRDID' values from the matching rows. If that is correct, I do not understand what you want to do after that.

import pandas as pd
import more_itertools as mit

# Cut-down crossmap with only the two columns needed for the lookup
Cross_Map = pd.DataFrame({'DIN(NDC)': [99910722, 99910722, 99902978, 99902978],
                          'PRDID': [90367, 90367, 95564, 95564]})

df = pd.DataFrame({0: [303290, 321833, 323241], 1: [544981, 99910722, 99902978], 2: [2, 1, 3],
                   3: [408300622, 408300902, 408056001], 4: [85882, 85897, 95564]})

# Values in column 1 of the source that start with '999'
m = [i for i in df[1] if str(i)[:3] == '999']
# Positions of the crossmap rows whose DIN(NDC) is one of those values
index = list(mit.locate(list(Cross_Map['DIN(NDC)']), lambda x: x in m))
# PRDID values on the matched crossmap rows
print(Cross_Map['PRDID'][index])
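The remaining step, writing the PRDID back into the source dataframe, could be sketched with a lookup Series and Series.map instead of an explicit loop. This is a sketch under assumptions, not a definitive solution: the question does not say which cell should be updated, so overwriting the matched value in column 1 is a guess, and the names source, cross_map, lookup and is_999 are illustrative only.

```python
import pandas as pd

cross_map = pd.DataFrame({'DIN(NDC)': [99910722, 99910722, 99902978, 99902978],
                          'PRDID': [90367, 90367, 95564, 95564]})
source = pd.DataFrame({0: [303290, 321833, 323241],
                       1: [544981, 99910722, 99902978],
                       2: [2, 1, 3],
                       3: [408300622, 408300902, 408056001],
                       4: [85882, 85897, 95564]})

# One PRDID per DIN(NDC). Assumption: duplicated DIN(NDC) rows carry the
# same PRDID, so keeping the first occurrence loses nothing.
lookup = (cross_map.drop_duplicates(subset='DIN(NDC)')
                   .set_index('DIN(NDC)')['PRDID'])

# Boolean mask of the rows whose column 1 starts with '999'
is_999 = source[1].astype(str).str.startswith('999')

# Look up the PRDID for those rows; values missing from the crossmap become NaN
prdid = source.loc[is_999, 1].map(lookup)

# Assumption: overwrite column 1 with the PRDID, keeping the original value
# wherever the crossmap had no match.
source.loc[is_999, 1] = prdid.fillna(source.loc[is_999, 1]).astype(source[1].dtype)

print(source)
```

The updated frame can then be written out with source.to_csv(...), using whatever file name and options suit the original file. If the PRDID belongs in another column, change the column label on the left-hand side of the final assignment.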


 