Pandas: Search if substring contains key in dictionary, and return value
I have a dictionary (key, value) and a DataFrame in pandas.
mydict = {'KULAR LUMPUR' : 'MY',
'SINGAPORE' : 'SG',
'HONG KONG' : 'HK',
'VIETNAM': 'VN'}
and a DataFrame with a column ['Address']:
Address
0 234 JALAN ST KULAR LUMPUR MALAYSIA
1 123 BUILDING STREET SINGAPORE
2 67 CANNING VALE, HONG KONG
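How can I search the DataFrame and, when one of the dictionary's keys is found as a substring of the address, get the corresponding value from the dictionary?
For example: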
Address Code
0 234 JALAN ST KULAR LUMPUR MALAYSIA MY
1 123 BUILDING STREET SINGAPORE SG
2 67 CANNING VALE, HONG KONG HK
Use str.extract with a regex built from the dictionary's keys, then map to turn each extracted key into its value:
import pandas as pd

df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
                               '123 BUILDING STREET SINGAPORE',
                               '67 CANNING VALE, HONG KONG']})
print (df)
Address
0 234 JALAN ST KULAR LUMPUR MALAYSIA
1 123 BUILDING STREET SINGAPORE
2 67 CANNING VALE, HONG KONG
mydict = {'KULAR LUMPUR' : 'MY',
'SINGAPORE' : 'SG',
'HONG KONG' : 'HK',
'VIETNAM': 'VN'}
pat = '|'.join(r"\b{}\b".format(x) for x in mydict.keys())
df['Code'] = df['Address'].str.extract('('+ pat + ')', expand=False).map(mydict)
print (df)
Address Code
0 234 JALAN ST KULAR LUMPUR MALAYSIA MY
1 123 BUILDING STREET SINGAPORE SG
2 67 CANNING VALE, HONG KONG HK
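The pattern above inserts the keys into the regex unescaped, which works for these keys but could misbehave if a key contained a regex metacharacter such as . or (. The following is a minimal sketch of one way to guard against that, not part of the original answer: the helper name build_pattern and the extra unmatched address are hypothetical, added only for illustration. It escapes each key with re.escape and tries longer keys first.

```python
import re
import pandas as pd

def build_pattern(keys):
    # Hypothetical helper: try longer keys first so a longer key
    # is preferred over a shorter key that it contains.
    ordered = sorted(keys, key=len, reverse=True)
    # re.escape neutralises regex metacharacters inside each key.
    alternatives = "|".join(r"\b{}\b".format(re.escape(k)) for k in ordered)
    # One capture group, as str.extract requires.
    return "(" + alternatives + ")"

mydict = {'KULAR LUMPUR': 'MY',
          'SINGAPORE': 'SG',
          'HONG KONG': 'HK',
          'VIETNAM': 'VN'}

df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
                               '123 BUILDING STREET SINGAPORE',
                               '67 CANNING VALE, HONG KONG',
                               '1 UNKNOWN ROAD, NOWHERE']})  # illustrative row with no key

pat = build_pattern(mydict)
# Rows without any key yield NaN from str.extract, and map keeps them as NaN.
df['Code'] = df['Address'].str.extract(pat, expand=False).map(mydict)
codes = df['Code'].tolist()
```

Rows that contain none of the keys end up with a missing value in Code, so they can be filled or filtered afterwards, for example with fillna.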
Explanation:
print (pat)
\bKULAR LUMPUR\b|\bSINGAPORE\b|\bHONG KONG\b|\bVIETNAM\b
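\b is a word boundary, so each key is matched only as a whole word.
| is the regex OR operator, so the pattern matches any one of the keys.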
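To see the effect of the word boundary and the alternation in isolation, here is a small sketch using Python's re module directly. The shortened two-key pattern and the test string 'A SINGAPOREAN COMPANY' are assumptions made for illustration and do not appear in the original data.

```python
import re

# Shortened version of the pattern, with two of the keys.
pat = r"\bSINGAPORE\b|\bHONG KONG\b"

# Whole-word match: SINGAPORE stands alone between word boundaries.
m1 = re.search(pat, "123 BUILDING STREET SINGAPORE")

# No match: in SINGAPOREAN a word character follows SINGAPORE,
# so the closing \b cannot match there.
m2 = re.search(pat, "A SINGAPOREAN COMPANY")

# The | alternation lets the second key match when the first does not.
m3 = re.search(pat, "67 CANNING VALE, HONG KONG")

print(m1.group(0), m2, m3.group(0))
```

Without the \b markers, the second string would match as well, and the row would be given the code for SINGAPORE.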