Using VLOOKUP in Python

Question

I have four columns in my Excel data file:

CUI      ICD9/10    Out      Lookup
C0161894    39      4000001 C0000005
C0029730    398     4000002 C0000039
C0176693    398     4000003 C0000052
C0029730    3989    4000004 C0000074

I want to match the 4th column against the 1st column and get the 3rd column as the output, using Python. As the data is large, I effectively want to use VLOOKUP, but here I don't have any specific value to look up. I need to search the whole column.

Answer 1

If I understand you correctly, you want to compare the values in column 4 and column 1, and if they are equal, output a new column with the value from column 3.

To do this, simply use np.where as follows:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'CUI':['C0161894','C0029730','C0176693','C0029730','C0000074'],
                   'ICD9/10':[39,398,398,3989,3989],
                   'Out':[4000001,4000002,4000003,4000004,4000005],
                   'Lookup':['C0000005','C0000039','C0000052','C0000074','C0000074']})                       


df1['Match'] = np.where(df1.Lookup == df1.CUI,  df1.Out, 'No Match')

Output:

        CUI  ICD9/10    Lookup      Out     Match
0  C0161894       39  C0000005  4000001  No Match
1  C0029730      398  C0000039  4000002  No Match
2  C0176693      398  C0000052  4000003  No Match
3  C0029730     3989  C0000074  4000004  No Match
4  C0000074     3989  C0000074  4000005   4000005
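
The comparison above is row by row. If the goal is instead to search the whole CUI column for each Lookup value, as VLOOKUP does, one possible sketch uses Series.map. It assumes that the first matching row should win when a CUI is repeated (VLOOKUP's behaviour) and that unmatched values may be left as missing; the name lookup_table is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'CUI': ['C0161894', 'C0029730', 'C0176693', 'C0029730', 'C0000074'],
    'ICD9/10': [39, 398, 398, 3989, 3989],
    'Out': [4000001, 4000002, 4000003, 4000004, 4000005],
    'Lookup': ['C0000005', 'C0000039', 'C0000052', 'C0000074', 'C0000074'],
})

# Build a CUI -> Out table, keeping the first row for each repeated CUI
# (assumption: first match wins, as in VLOOKUP)
lookup_table = df1.drop_duplicates('CUI').set_index('CUI')['Out']

# For every Lookup value, fetch the Out value of the row whose CUI matches it.
# Values with no matching CUI become missing; 'Int64' keeps matches as integers.
df1['Match'] = df1['Lookup'].map(lookup_table).astype('Int64')

print(df1)
```

With this sample data, row 3 also receives a match, because its Lookup value C0000074 appears in the CUI column on row 4 even though it differs from the CUI on its own row.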

Edit:

In response to your comment, you can use the chunksize parameter of pandas.read_csv to read in only part of your data at a time.

For data in a csv file as follows:

     CUI  ICD9/10    Lookup      Out
C0161894       39  C0000005  4000001
C0029730      398  C0000039  4000002
C0176693      398  C0000052  4000003
C0029730     3989  C0000074  4000004
C0000074     3989  C0000074  4000005

See https://stackoverflow.com/a/25962187/2254228. You can do:

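chunksize = 1000
for chunk in pd.read_csv(data, chunksize=chunksize):  # data: path to your csv file
    # Process each chunk using the solution above
    chunk['Match'] = np.where(chunk.Lookup == chunk.CUI, chunk.Out, 'No Match')
    # Then write the chunk to a new csv using `chunk.to_csv(...)`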
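
To see the chunked loop run end to end without a file on disk, here is a minimal sketch. The in-memory csv_text is a hypothetical stand-in for the real file, the chunk size of 2 is chosen only to force several chunks, and non-matching rows are left as NaN instead of 'No Match' so that the column stays numeric.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a comma-separated data file
csv_text = """CUI,ICD9/10,Lookup,Out
C0161894,39,C0000005,4000001
C0029730,398,C0000039,4000002
C0176693,398,C0000052,4000003
C0029730,3989,C0000074,4000004
C0000074,3989,C0000074,4000005
"""

processed = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Same row-wise comparison as above: keep Out where Lookup equals CUI
    chunk['Match'] = chunk['Out'].where(chunk['Lookup'] == chunk['CUI'])
    processed.append(chunk)

# Collecting chunks in a list is fine for a demo; for a file that does not
# fit in memory, write each chunk out instead of keeping it.
result = pd.concat(processed)
print(result)
```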

Edit2: Here I have compiled full sample code for you. Replace the file names data and new_data with whatever your data files are called, and replace the file paths with your own. Reading the file in chunks avoids memory errors from your data file being too big.

For a sample data.csv:

     CUI  ICD9/10    Lookup      Out
C0161894       39  C0000005  4000001
C0029730      398  C0000039  4000002
C0176693      398  C0000052  4000003
C0029730     3989  C0000074  4000004
C0000074     3989  C0000074  4000005

Create a target csv file new_data as an empty csv file to store your new data frame:

CUI  ICD9/10    Lookup      Out

Then import the old data, splitting it into chunks, where chunksize is the number of lines of the file to read in at a time:

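import pandas as pd
import numpy as np

# Read in line by line = set chunksize = 1 (larger values are faster)
chunksize = 1

# Open the new file; "w" creates it, or empties it if it already exists
with open("Pathtonewfile/new_data.csv", "w", newline="") as f:

    # Iterate over the old data.csv file, reading in one chunk at a time
    for i, chunk in enumerate(pd.read_csv('Pathtooldfile/data.csv', index_col=False, chunksize=chunksize)):

        # Carry out Lookup Calculation as above
        chunk['Match'] = np.where(chunk.Lookup == chunk.CUI, chunk.Out, 'No Match')

        # Write the chunk to "new_data.csv", with the header on the first chunk only.
        # `columns` (not `cols`) selects the output columns and must include Match.
        chunk.to_csv(f, header=(i == 0), index=False,
                     columns=['CUI', 'ICD9/10', 'Lookup', 'Out', 'Match'])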

This gives you an output in new_data.csv as follows (shown here as a data frame):

        CUI  ICD9/10    Lookup      Out     Match
0  C0161894       39  C0000005  4000001  No Match
1  C0029730      398  C0000039  4000002  No Match
2  C0176693      398  C0000052  4000003  No Match
3  C0029730     3989  C0000074  4000004  No Match
4  C0000074     3989  C0000074  4000005   4000005
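
If a whole-column lookup is needed on a file that is too large to load at once, a two-pass variant of the chunked approach is one option. This is only a sketch under stated assumptions: the CUI to Out dictionary must fit in memory, the first Out seen for a repeated CUI wins, unmatched rows are left empty, and the temporary paths and the name cui_to_out are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file written to a temporary directory
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, 'data.csv')
dst_path = os.path.join(tmp_dir, 'new_data.csv')
with open(src_path, 'w') as f:
    f.write(
        'CUI,ICD9/10,Lookup,Out\n'
        'C0161894,39,C0000005,4000001\n'
        'C0029730,398,C0000039,4000002\n'
        'C0176693,398,C0000052,4000003\n'
        'C0029730,3989,C0000074,4000004\n'
        'C0000074,3989,C0000074,4000005\n'
    )

# Pass 1: build a CUI -> Out dictionary, reading only the two columns needed
cui_to_out = {}
for chunk in pd.read_csv(src_path, usecols=['CUI', 'Out'], chunksize=2):
    for cui, out in zip(chunk['CUI'], chunk['Out']):
        cui_to_out.setdefault(cui, out)  # keep the first Out seen for each CUI

# Pass 2: look up every Lookup value and write each chunk to the new file
for i, chunk in enumerate(pd.read_csv(src_path, chunksize=2)):
    chunk['Match'] = chunk['Lookup'].map(cui_to_out)
    chunk.to_csv(dst_path, mode='w' if i == 0 else 'a',
                 header=(i == 0), index=False)

result = pd.read_csv(dst_path)
print(result)
```

The first pass touches only two columns, so it stays cheap even when the file has many more; the second pass never holds more than one chunk of the full data at a time.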

Solution 1
0 Accepted 2017-04-15 10:13:50