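How do I assign an element from a list to a dataframe column after checking whether the column value contains a string that is an element of that list? (Python)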

Question

I have a pandas dataframe with a 'state' column that contains a string indicating a US state. Some records have the state name next to the abbreviation and others have just the abbreviation (e.g. some have 'Florida - FL' and others just 'FL'). I want to check whether the string in the 'state' column contains an element from the following list of state abbreviations:

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", 
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", 
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", 
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

and then assign the matching element to a new column (for the purposes of this question, the new column is called 'state_std'). I do not want to do this by looping through rows. How would I accomplish this?

This question is identical to the question "Check if column contains value from a list and assign that value to new column", except that that question is about how to do this in R, not Python.

Answer 1

Let's assume that the abbreviated state name is always at the end of the string. How about this?

import pandas as pd

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
          "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
          "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", 
          "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", 
          "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"] 
                 
def state_parser(state):
    # Return the first abbreviation the string ends with,
    # or the original value unchanged if none matches.
    state_std = next((abbr for abbr in state_abbrevs if state.endswith(abbr)), None)
    return state_std if state_std else state

data = ["Florida - FL", "NY", "California - CA"]

df = pd.DataFrame(data, columns=['state'])
df['state_std'] = df['state'].apply(state_parser)
print(df)

Output:

             state state_std
0     Florida - FL        FL
1               NY        NY
2  California - CA        CA
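
Since the question asks to avoid going through rows one by one, it is worth noting that `apply` still calls `state_parser` once per row. One possible alternative, offered only as a sketch, is pandas' `Series.str.extract` with a regular expression anchored at the end of the string. The name `pattern`, the extra 'Unknown' row, and the shortened abbreviation list are assumptions made to keep the example small.

```python
import re
import pandas as pd

# Shortened list for illustration; the full state_abbrevs list works the same way.
state_abbrevs = ["CA", "FL", "NY", "TX"]

# Alternation of all abbreviations, anchored to the end of the string.
pattern = r"(" + "|".join(map(re.escape, state_abbrevs)) + r")$"

df = pd.DataFrame({"state": ["Florida - FL", "NY", "California - CA", "Unknown"]})

# expand=False returns a Series; rows without a match become NaN.
extracted = df["state"].str.extract(pattern, expand=False)

# Fall back to the original value, mirroring what state_parser does.
df["state_std"] = extracted.fillna(df["state"])
print(df)
```

Like the `endswith` version, this assumes the abbreviation is the very last thing in the string, so trailing whitespace would prevent a match.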

If the abbreviation is not always at the end, you can change the generator expression inside state_parser to:

state_std = next((abbr for abbr in state_abbrevs if abbr in state),None)
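
One caveat with a plain substring test: `abbr in state` matches anywhere in the string, so an upper-case value such as 'ALASKA - AK' would yield 'AL', because 'AL' comes first in the list and appears inside 'ALASKA'. A hedged sketch of one way around this is to require word boundaries (`\b`) around the abbreviation. The names `word_pattern` and `naive` and the sample rows are illustrative assumptions, and the list is shortened.

```python
import re
import pandas as pd

# Shortened list for illustration; the full state_abbrevs list works the same way.
state_abbrevs = ["AL", "AK", "FL", "NY"]

# The plain substring test picks up "AL" inside "ALASKA".
naive = next((abbr for abbr in state_abbrevs if abbr in "ALASKA - AK"), None)

# \b on both sides means the abbreviation must stand alone as a word.
word_pattern = r"\b(" + "|".join(map(re.escape, state_abbrevs)) + r")\b"

df = pd.DataFrame({"state": ["FL - Florida", "ALASKA - AK", "New York (NY)", "n/a"]})

# Extract the first standalone abbreviation; keep the original value if none is found.
df["state_std"] = (
    df["state"].str.extract(word_pattern, expand=False).fillna(df["state"])
)
print(df)
```

The match is case-sensitive, so lower-case text such as 'Indiana' does not trigger 'IN'; whether that is desirable depends on how the data is written.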

Solution 1 (0 accepted), 2020-10-15 18:58:59