I have a question regarding how to perform what would be the equivalent of returning a value using the INDEX MATCH functions in Excel and applying it in Python.
As an Excel user performing data analytics and manipulation on large data-sets I have moved to Python for efficiency. What I am attempting to do is to populate the column cells within a pandas dataframe based on the value returned from the value stored within a dictionary.
In an attempt to do this I have used the following code:
# imported csv DataFrames
crew_data = pd.read_csv(r'C:\file_path\crew_data.csv')
export_template = pd.read_csv(r'C:\file_path\export_template.csv')
#contract number dictionary
contract = {'Northern':'046-2019',
'Southern':'048-2015D',}
#function that attempts to perform a INDEX MATCH equivalent
def contract_num():
for x, y in enumerate(crew_data.loc[:, 'Region']):
if y in contract.keys():
num = contract[y]
else:
print('ERROR')
return(num)
#for loop which prepares then exports the load data
for i, r in enumerate(export_template):
export_template.loc[:, 'Contract'] = contract_num()
export_template.to_csv(r'C:\file_path\export_files\UPLOADER.csv')
print(export_template)
To summarise what the code is intended to do is as follows:
Please note that there are 116 additional columns which are populated in this loop which have been excluded from the code above to save space.
When the code is executed it produces the result as intended, however, the issue is that when the function is called in the second for loop it only returns a single value of 048-2015D instead of the value which corresponds to the correct Region.
As mentioned previously this would have typically been carried out in Excel using INDEX MATCH, however doing so is not as efficient as using a script such as that above.
Being a beginner, I suspect the example code may appear con-deluded and unnecessary and could be performed using a more concise method.
If anyone could provide a solution or guidance that would be greatly appreciated.
df = pd.DataFrame({'Region': ['Northern', 'Northern', 'Northern',
'Northern', 'Southern', 'Southern',
'Northern', 'Eastern']})
contract = {'Northern':'046-2019',
'Southern':'048-2015D'}
# similar to INDEX MATCH
df['Contract'] = df.Region.map(contract)
out:
Region Contract
0 Northern 046-2019
1 Northern 046-2019
2 Northern 046-2019
3 Northern 046-2019
4 Southern 048-2015D
5 Southern 048-2015D
6 Northern 046-2019
7 Eastern NaN
you can add print if Contract has not matched:
if df.Contract.isna().any():
print("ERROR")
or make an assertion:
assert not df.Contract.isna().any(), "found empty contract field"
and out in this case:
AssertionError: found empty contract field
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.