How to assign an element from a list to a dataframe column after checking whether a column value contains a string that is an element of the list? (Python)
I have a pandas dataframe with a 'state' column that contains a string indicating a US state. However, some of the records have the state name next to the abbreviation and others have just the abbreviation (e.g. some have 'Florida - FL' and others just 'FL'). I want to check whether the string in the 'state' column contains an element from the following list of state abbreviations:
state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
                 "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
                 "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
                 "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
                 "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]
and afterwards assign the matching element to a new column (for the purposes of this question, the new column is called 'state_std'). I do not want to do this by looping through rows. How would I accomplish this?
This question is identical to the question "Check if column contains value from a list and assign that value to new column", except that that question is about how to do this in R, not Python.
Let's assume that the abbreviated state name is always at the end of the string. How about this?
import pandas as pd

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
                 "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
                 "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
                 "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
                 "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

def state_parser(state):
    # First abbreviation that the string ends with, or None if there is none.
    state_std = next((abbr for abbr in state_abbrevs if state.endswith(abbr)), None)
    if state_std:
        return state_std
    else:
        # No abbreviation found: keep the original value.
        return state

data = ["Florida - FL", "NY", "California - CA"]
df = pd.DataFrame(data, columns=['state'])
df['state_std'] = df['state'].apply(state_parser)
print(df)
Output:
             state state_std
0     Florida - FL        FL
1               NY        NY
2  California - CA        CA
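Since the question asks to avoid looping through rows, and `apply` still calls a Python function once per row, a vectorized variant may be worth considering. The following is only a sketch, not part of the original answer: it assumes the abbreviation is the last word of the string, builds a regular expression from the list, and uses `Series.str.extract`. The `"Unknown"` record and the variable names `pattern` and `extracted` are made up for illustration.

```python
import re
import pandas as pd

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
                 "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
                 "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
                 "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
                 "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

# Pattern of the form \b(AL|AK|...)\s*$ : a whole-word abbreviation at the
# end of the string, optionally followed by trailing whitespace.
pattern = r"\b(" + "|".join(map(re.escape, state_abbrevs)) + r")\s*$"

# "Unknown" is a hypothetical record with no abbreviation.
df = pd.DataFrame({"state": ["Florida - FL", "NY", "California - CA", "Unknown"]})

# With a single capture group and expand=False, str.extract returns a Series
# holding the captured text, or NaN where the pattern did not match.
extracted = df["state"].str.extract(pattern, expand=False)

# Mirror the fallback of state_parser: keep the original value when no match.
df["state_std"] = extracted.fillna(df["state"])
print(df)
```

If unmatched rows should stay empty instead of keeping the original string, the `fillna` step can simply be dropped.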
If the abbreviation is not always at the end of the string, you can change the generator expression in state_parser to a substring test:
    state_std = next((abbr for abbr in state_abbrevs if abbr in state), None)
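One caveat with the substring test `abbr in state`: it also matches two-letter sequences inside longer words, and it returns whichever abbreviation comes first in the list. The sketch below is an illustration rather than part of the original answer. It compares the substring test with a word-boundary regular expression on a hypothetical upper-case record, `"MAINE - ME"`; the helper names `substring_parser` and `boundary_parser` are made up for this example.

```python
import re

state_abbrevs = ["AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA",
                 "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
                 "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
                 "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
                 "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

# \b marks a word boundary, so "IN" inside "MAINE" does not count as a match.
ABBREV_RE = re.compile(r"\b(" + "|".join(state_abbrevs) + r")\b")

def substring_parser(state):
    # Plain substring test: the first list element found anywhere in the string.
    return next((abbr for abbr in state_abbrevs if abbr in state), None) or state

def boundary_parser(state):
    # First abbreviation that appears as a standalone word, else the input.
    match = ABBREV_RE.search(state)
    return match.group(1) if match else state

sample = "MAINE - ME"  # hypothetical record, not from the question
print(substring_parser(sample))  # IN  ("IN" occurs inside "MAINE")
print(boundary_parser(sample))   # ME
```

The same pattern string could be passed to `Series.str.extract` to apply the word-boundary match to a whole column at once.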