At first glance I thought this would be a simple issue, yet I cannot find an answer that accurately fits...
I have a dictionary of state names and abbreviations like so:
{(' ak', ',ak', ', ak', 'juneau', ',alaska', ', alaska'): 'alaska',
(' al', ',al', ', al', 'montgomery', ',alabama', ', alabama'): 'alabama',
(' ar', ',ar', ', ar', 'little rock', ',arkansas', ', arkansas'): 'arkansas',
(' az', ',az', ', az', 'phoenix', ',arizona', ', arizona'): 'arizona',
I am attempting to map this dictionary over self-reported Twitter locations that I have in a pandas dataframe, looking for partial matches. For example, if one case read 'anchorage,ak', it would change the value to Alaska. I could see this being quite simple if it were a list, yet there must be another way to do this without looping. Any help is greatly appreciated!
I think timgeb has the right idea above. I would add two things:
1) You can also remove all whitespace from the given case before processing. Then there is no need to include ' ak', ',ak', and ', ak' all as keys; a simple 'ak' key would suffice.
2) Instead of repeating the state values in the dictionary, I would create an extra hash from integers to states, i.e. {0: 'alaska', 1: 'alabama', ...}, and store the corresponding integer key in your original dictionary.
Thus your resulting dictionary should look something like this:
A = {'ak': 0, 'juneau': 0, 'alaska': 0, 'al': 1, 'montgomery': 1, 'alabama': 1, ...}
And to access state names from integer values, you should have another dictionary like this for all 50 states:
B = {0: 'alaska', 1: 'alabama', ...}
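If you already have the tuple-keyed dictionary from the question, the two tables could be derived from it instead of being typed by hand. This is only a sketch: `build_lookups` is a hypothetical helper name, and it assumes that stripping spaces and commas from each alias is acceptable for your data.

```python
def build_lookups(tuple_keyed):
    """Flatten {(alias, ...): state} into A (alias -> id) and B (id -> state)."""
    A, B = {}, {}
    for state_id, (aliases, state) in enumerate(tuple_keyed.items()):
        B[state_id] = state
        for alias in aliases:
            # ' ak', ',ak' and ', ak' all collapse to the single key 'ak'
            key = alias.replace(' ', '').replace(',', '')
            A[key] = state_id
    return A, B


# Only two of the entries quoted in the question; the full dict is not shown there.
original = {
    (' ak', ',ak', ', ak', 'juneau', ',alaska', ', alaska'): 'alaska',
    (' al', ',al', ', al', 'montgomery', ',alabama', ', alabama'): 'alabama',
}
A, B = build_lookups(original)
```

Note that a multi-word alias such as 'little rock' ends up as the key 'littlerock', which matches what you get after removing whitespace from the case itself.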
so given a case...
case = 'anchorage,ak'
case_list = case.replace(' ', '').split(',')  # remove all whitespace and split case by comma
for elem in case_list:
    if elem in A:
        case = B[A[elem]]  # replace case with the matching state name
        break
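To run this over a whole column, the loop above could be wrapped in a function. The sketch below is one possible way to do that, not a definitive implementation: `normalize_location` and its `default` parameter are hypothetical names, the tables only hold the two states shown above, and lowercasing the input is an added assumption about how the raw locations look.

```python
# Shortened lookup tables; the real ones would cover all 50 states.
A = {'ak': 0, 'juneau': 0, 'alaska': 0, 'al': 1, 'montgomery': 1, 'alabama': 1}
B = {0: 'alaska', 1: 'alabama'}


def normalize_location(case, default=None):
    """Return the state name for the first comma-separated part found in A."""
    for elem in case.lower().replace(' ', '').split(','):
        if elem in A:
            return B[A[elem]]
    # No part matched: keep the original value unless a default is given.
    return case if default is None else default


print(normalize_location('anchorage,ak'))    # alaska
print(normalize_location('Juneau, Alaska'))  # alaska
print(normalize_location('paris, france'))   # paris, france
```

With pandas, something like df['location'] = df['location'].map(normalize_location) should apply it to every row, where 'location' is an assumed column name and the column is assumed to hold only strings. pandas still iterates internally, but there is no explicit loop over rows in your own code.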