简体   繁体   中英

How to perform an Excel INDEX MATCH equivalent in Python

I have a question regarding how to perform what would be the equivalent of returning a value using the INDEX MATCH functions in Excel and applying it in Python.

As an Excel user performing data analytics and manipulation on large data-sets I have moved to Python for efficiency. What I am attempting to do is to populate the column cells within a pandas dataframe based on the value returned from the value stored within a dictionary.

In an attempt to do this I have used the following code:

# imported csv DataFrames

crew_data = pd.read_csv(r'C:\file_path\crew_data.csv') 

export_template = pd.read_csv(r'C:\file_path\export_template.csv')

#contract number dictionary

contract = {'Northern':'046-2019',
            'Southern':'048-2015D',}

#function that attempts to perform a INDEX MATCH equivalent 
def contract_num():

    for x, y in enumerate(crew_data.loc[:, 'Region']):

            if y in contract.keys():

                num = contract[y]

            else:

                print('ERROR')

    return(num)

#for loop which prepares then exports the load data

for i, r in enumerate(export_template):

    export_template.loc[:, 'Contract'] = contract_num()

export_template.to_csv(r'C:\file_path\export_files\UPLOADER.csv')

print(export_template)

To summarise what the code is intended to do is as follows:

  1. The for loop contained in the contract_num function begins by iterating over the Region column in the crew_data DataFrame
  2. if the value y from the DataFrame matches the key in the contract dictionary (Note: the Region column only contains 2 values, 'Southern' and 'Northern') it will return the corresponding value from the value in the contract dictionary
  3. The for loop which prepares then exports the load data calls on the contract_num() function to populate the Contract column in the export_template DataFrame

Please note that there are 116 additional columns which are populated in this loop which have been excluded from the code above to save space.

When the code is executed it produces the result as intended, however, the issue is that when the function is called in the second for loop it only returns a single value of 048-2015D instead of the value which corresponds to the correct Region.

As mentioned previously this would have typically been carried out in Excel using INDEX MATCH, however doing so is not as efficient as using a script such as that above.

Being a beginner, I suspect the example code may appear con-deluded and unnecessary and could be performed using a more concise method.

If anyone could provide a solution or guidance that would be greatly appreciated.

df = pd.DataFrame({'Region': ['Northern', 'Northern', 'Northern',
                              'Northern', 'Southern', 'Southern',
                              'Northern', 'Eastern']})

contract = {'Northern':'046-2019',
            'Southern':'048-2015D'}

# similar to INDEX MATCH 
df['Contract'] = df.Region.map(contract)

out:

     Region   Contract
0  Northern   046-2019
1  Northern   046-2019
2  Northern   046-2019
3  Northern   046-2019
4  Southern  048-2015D
5  Southern  048-2015D
6  Northern   046-2019
7   Eastern        NaN

you can add print if Contract has not matched:

if df.Contract.isna().any():
    print("ERROR")

or make an assertion:

assert not df.Contract.isna().any(), "found empty contract field"

and out in this case:

AssertionError: found empty contract field

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM