How to assign element from a list to a dataframe column after checking if a column value contains a string that is an element in the list? (Python)

Question

I have a pandas dataframe with a 'state' column that contains a string indicating a US state. Some records have the state name next to the abbreviation and others have just the abbreviation (e.g. some have 'Florida - FL' and others just 'FL'). I want to check whether the string in the 'state' column contains an element from the following list of state abbreviations:

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", 
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", 
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", 
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

and then assign the matching element to a new column (called 'state_std' for the purposes of this question). I do not want to do this by looping through rows. How would I accomplish this?

This question is identical to the question here: Check if column contains value from a list and assign that value to new column

except that the above question is about how to do this in R, not Python.

Answer 1

Let's assume that the abbreviated state name is always at the end of the string. How about this?

import pandas as pd

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", 
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", 
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"] 
                 
def state_parser(state):
    state_std = next((abbr for abbr in state_abbrevs if state.endswith(abbr)),None)
    if state_std:
        return state_std
    else:
        return state

data = ["Florida - FL", "NY", "California - CA"]

df = pd.DataFrame(data, columns=['state'])
df['state_std'] = df['state'].apply(state_parser)
print(df)

Output:

             state state_std
0     Florida - FL        FL
1               NY        NY
2  California - CA        CA
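
Since the question asks to avoid looping over rows, note that `apply` still calls the function once per row. A vectorized sketch under the same assumption (abbreviation at the end of the string) could use `Series.str.extract` with a regular expression built from the list. The shortened list, the pattern, and the extra "Unknown" row are illustrative assumptions, not part of the original answer.

```python
import re
import pandas as pd

# Shortened list for illustration; use the full state_abbrevs list in practice.
state_abbrevs = ["CA", "FL", "NY", "TX"]

# Alternation of all abbreviations, anchored to the end of the string.
# \b keeps the match from being the tail of a longer token (e.g. "XNY").
pattern = r"\b(" + "|".join(map(re.escape, state_abbrevs)) + r")\s*$"

df = pd.DataFrame({"state": ["Florida - FL", "NY", "California - CA", "Unknown"]})

# expand=False returns a Series; rows without a match come back as NaN,
# so fall back to the original value, as state_parser does.
extracted = df["state"].str.extract(pattern, expand=False)
df["state_std"] = extracted.fillna(df["state"])
print(df)
```

Rows that do not end in a known abbreviation keep their original value, which mirrors the `else` branch of `state_parser`.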

If the abbreviation is not always at the end of the string, replace the first line of state_parser with a substring check:

state_std = next((abbr for abbr in state_abbrevs if abbr in state),None)
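
One caveat with the plain `abbr in state` test: it is a case-sensitive substring check, so it returns the first abbreviation in the list that occurs anywhere in the string, including inside a longer word. A possible refinement, sketched here with a hypothetical helper `state_parser_anywhere` and a shortened list, is to require a whole-word match via a regular expression.

```python
import re

# Shortened list for illustration; use the full state_abbrevs list in practice.
state_abbrevs = ["AL", "CA", "FL", "IN", "NY"]

# Match any abbreviation only when it stands alone as a word.
abbr_re = re.compile(r"\b(" + "|".join(map(re.escape, state_abbrevs)) + r")\b")

def state_parser_anywhere(state):
    """Return the first whole-word abbreviation found, else the input unchanged."""
    match = abbr_re.search(state)
    return match.group(1) if match else state

# The plain substring test finds "AL" inside "CALIFORNIA".
naive = next((abbr for abbr in state_abbrevs if abbr in "CALIFORNIA - CA"), None)
print(naive)                                     # AL
print(state_parser_anywhere("CALIFORNIA - CA"))  # CA
print(state_parser_anywhere("FL (Florida)"))     # FL
```

The function can be passed to `df['state'].apply(...)` in the same way as `state_parser`.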

1 answer

solution1
0 ACCEPTED 2020-10-15 18:58:59
