How to assign element from a list to a dataframe column after checking if a column value contains a string that is an element in the list? (Python)

I have a pandas DataFrame with a 'state' column that contains a string indicating a US state. Some records have the state name next to the abbreviation and others have only the abbreviation (e.g. some have 'Florida - FL' and others just 'FL'). I want to check whether the string in the 'state' column contains an element from the following list of state abbreviations:

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", 
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", 
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", 
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

and then assign the matching element to a new column (called 'state_std' for the purposes of this question). I do not want to do this by looping through rows. How can I accomplish this?

This question is the same as "Check if column contains value from a list and assign that value to new column", except that that question asks how to do this in R, not Python.

Let's assume that the abbreviated state name is always at the end of the string. How about this?

import pandas as pd

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
                 "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
                 "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
                 "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
                 "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

def state_parser(state):
    # Return the first abbreviation that the string ends with;
    # if none matches, return the original value unchanged.
    return next((abbr for abbr in state_abbrevs if state.endswith(abbr)), state)

data = ["Florida - FL", "NY", "California - CA"]

df = pd.DataFrame(data, columns=['state'])
# Apply the parser to every value in the 'state' column.
df['state_std'] = df['state'].apply(state_parser)
print(df)

Output:

             state state_std
0     Florida - FL        FL
1               NY        NY
2  California - CA        CA
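
Note that `apply` still calls `state_parser` once per row, so a vectorized string method may be closer to what the question asks for. The following is a minimal sketch, not a definitive implementation, assuming the abbreviation is the last token in the string with no trailing whitespace. It builds a regular expression from the list (shortened here for brevity, and with an extra 'Unknown' row added for illustration) and passes it to `Series.str.extract`; the `fillna` fallback mirrors what `state_parser` does when nothing matches.

```python
import re
import pandas as pd

# Shortened list for illustration; use the full state_abbrevs list in practice.
state_abbrevs = ["CA", "FL", "NY", "TX"]

# One capture group: an abbreviation as a whole word at the end of the string.
pattern = r"\b(" + "|".join(map(re.escape, state_abbrevs)) + r")$"

df = pd.DataFrame({"state": ["Florida - FL", "NY", "California - CA", "Unknown"]})

# expand=False returns a Series; rows without a match become NaN,
# so fall back to the original value, as state_parser does.
df["state_std"] = df["state"].str.extract(pattern, expand=False).fillna(df["state"])
print(df)
```

If values may carry trailing spaces, calling `df["state"].str.strip()` first would be one way to handle that.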

If the abbreviation is not always at the end of the string, change the generator expression inside state_parser to a substring test:

state_std = next((abbr for abbr in state_abbrevs if abbr in state),None)
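
One caveat with the plain `in` test is that it matches substrings anywhere and returns the first hit in list order, so an all-caps value such as 'ALASKA - AK' would yield 'AL'. A minimal sketch of a stricter variant, assuming abbreviations appear as standalone uppercase tokens, uses regular-expression word boundaries. The function name `find_abbrev` and the shortened list are illustrative only.

```python
import re

# Shortened list for illustration; the real list has all abbreviations.
state_abbrevs = ["AL", "AK", "FL", "NY"]

# \b ... \b matches an abbreviation only as a standalone token.
ABBREV_RE = re.compile(r"\b(" + "|".join(map(re.escape, state_abbrevs)) + r")\b")

def find_abbrev(state):
    """Return the first standalone abbreviation in `state`, else `state` itself."""
    match = ABBREV_RE.search(state)
    return match.group(1) if match else state

# Plain substring test: "AL" is found inside "ALASKA" before "AK" is tried.
naive = next((abbr for abbr in state_abbrevs if abbr in "ALASKA - AK"), None)

print(naive)                        # AL
print(find_abbrev("ALASKA - AK"))   # AK
print(find_abbrev("FL (Florida)"))  # FL
```

The function can be used with `df['state'].apply(find_abbrev)` in place of `state_parser`.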
