
Map US state names to two-letter abbreviations given in a separate dictionary

Suppose I have a dataframe with two columns: State and City.

I also have a separate dict with the two-letter abbreviation for each state. I want to add a third column that maps each state name to its two-letter abbreviation. How can I do this in Python/pandas? A sample setup follows:

import pandas as pd
a = pd.Series({'State': 'Ohio', 'City':'Cleveland'})
b = pd.Series({'State':'Illinois', 'City':'Chicago'})
c = pd.Series({'State':'Illinois', 'City':'Naperville'})
d = pd.Series({'State': 'Ohio', 'City':'Columbus'})
e = pd.Series({'State': 'Texas', 'City': 'Houston'})
f = pd.Series({'State': 'California', 'City': 'Los Angeles'})
g = pd.Series({'State': 'California', 'City': 'San Diego'})
state_city = pd.DataFrame([a,b,c,d,e,f,g])
state_2 = {'OH': 'Ohio','IL': 'Illinois','CA': 'California','TX': 'Texas'}

Now I have to map the State column in the dataframe state_city using the dictionary state_2. The resulting state_city should contain three columns: State, City, and state_2letter.

My original dataset has multiple columns and covers nearly all major US cities, so doing this manually would be inefficient. Is there an easy way to do it?

For one, it's probably easier to store the key-value pairs as state name: abbreviation in your dictionary, like this:

state_2 = {'Ohio': 'OH', 'Illinois': 'IL', 'California': 'CA', 'Texas': 'TX'}

You can get there from your existing dictionary by inverting it with a dict comprehension:

state_2 = {state: abbrev for abbrev, state in state_2.items()}
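As a self-contained sketch of that inversion, the snippet below starts from the abbreviation-keyed dictionary in the question and builds the name-keyed one. The name `name_to_abbrev` is only illustrative and does not appear in the original post. The length check guards against the case where two abbreviations share the same state name, which would silently drop an entry during inversion.

```python
# Abbreviation -> state name, as given in the question.
state_2 = {'OH': 'Ohio', 'IL': 'Illinois', 'CA': 'California', 'TX': 'Texas'}

# Swap keys and values: state name -> abbreviation.
# `name_to_abbrev` is a hypothetical name used for illustration.
name_to_abbrev = {state: abbrev for abbrev, state in state_2.items()}

# If two abbreviations pointed at the same state name, the inverted dict
# would be shorter than the original; fail loudly in that case.
if len(name_to_abbrev) != len(state_2):
    raise ValueError("duplicate state names; inversion would lose entries")

print(name_to_abbrev)
```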

Then use pandas.Series.map (state_city['State'] is a Series):

>>> state_city['abbrev'] = state_city['State'].map(state_2)
>>> state_city
          City       State abbrev
0    Cleveland        Ohio     OH
1      Chicago    Illinois     IL
2   Naperville    Illinois     IL
3     Columbus        Ohio     OH
4      Houston       Texas     TX
5  Los Angeles  California     CA
6    San Diego  California     CA
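One thing to watch with map on a larger dataset: any state name that is not a key in the dictionary becomes NaN in the new column. A minimal sketch for spotting such rows, using a small made-up frame in which 'Nevada' is deliberately left out of the dictionary (the frame, the `unmapped` name, and the missing entry are assumptions for illustration, not part of the question's data):

```python
import pandas as pd

# Small illustrative frame; 'Nevada' is intentionally absent from the dict.
state_city = pd.DataFrame({
    'State': ['Ohio', 'Illinois', 'Nevada'],
    'City': ['Cleveland', 'Chicago', 'Reno'],
})
state_2 = {'Ohio': 'OH', 'Illinois': 'IL'}

# Series.map looks each value up in the dict; missing keys give NaN.
state_city['abbrev'] = state_city['State'].map(state_2)

# List the state names that had no entry in the dictionary.
unmapped = state_city.loc[state_city['abbrev'].isna(), 'State'].unique().tolist()
print(unmapped)
```

Checking `unmapped` after the mapping is a cheap way to catch misspelled or missing state names before they propagate as NaN.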

I do agree with @blacksite that the state_2 dictionary should map state names to abbreviations, like this:

state_2 = {'Ohio': 'OH','Illinois': 'IL','California': 'CA','Texas': 'TX'}

Then use pandas.Series.replace (state_city.State is a Series):

state_city['state_2letter'] = state_city.State.replace(state_2)
state_city

|-|State      |City         |state_2letter|
|-|-----      |------       |----------|
|0| Ohio      | Cleveland   |   OH|
|1| Illinois  | Chicago     |   IL|
|2| Illinois  | Naperville  |   IL|
|3| Ohio      | Columbus    |   OH|
|4| Texas     | Houston     |   TX|
|5| California| Los Angeles |   CA|
|6| California| San Diego   |   CA|
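Note that replace and map behave differently for values that are not in the dictionary: replace leaves them unchanged, while map turns them into NaN. A short sketch of the difference, using a made-up Series in which 'Nevada' has no dictionary entry (the data and variable names here are assumptions for illustration):

```python
import pandas as pd

states = pd.Series(['Ohio', 'Nevada'])
state_2 = {'Ohio': 'OH'}  # 'Nevada' intentionally missing

# replace: values without a dictionary entry are kept as-is.
replaced = states.replace(state_2)

# map: values without a dictionary entry become NaN.
mapped = states.map(state_2)

print(replaced.tolist())
print(mapped.isna().tolist())
```

Which behavior is preferable depends on the data: NaN makes missing entries easy to detect, whereas keeping the original name avoids gaps in the column.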
