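Python Pandas: partial match of a list of strings in a DataFrame column and return all matching partial strings
Hi everyone, I am trying to match partial strings in a DataFrame column and return the matching strings (uppercase only). I don't have strong programming knowledge; I have only just started learning.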
import os
import pandas as pd
import numpy as np
import re
state_abbrv = [
"AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS","KY","LA",
"ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK",
"OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY"]
d = {"Index": [1, 2, 3, 4, 5 , 6, 7], "Description": ["BROOKLYN NY", "M1ANY",
"NYNY","DO","nyNY", "CWARD NY", "HOWARD BEACH NY"]}
df = pd.DataFrame(data=d)
statesjoin='|'.join(state_abbrv)
df=df.assign(State = df["Description"].apply(lambda x:
','.join(re.findall('..',x))).str.findall(statesjoin))
print(df)
Current result (wrong):
Index Description State
1 BROOKLYN NY []
2 M1ANY []
3 NYNY [NY, NY]
4 DO []
5 nyNY [NY]
6 CWARD NY [AR, NY]
7 HOWARD BEACH NY [WA]
Expected result:
Index Description State
1 BROOKLYN NY [NY]
2 M1ANY [NY]
3 NYNY [NY, NY]
4 DO []
5 nyNY [NY]
6 CWARD NY [WA,AR,NY]
7 HOWARD BEACH NY [WA,AR,NY]
Use a list comprehension that tests each value of the list with the in operator:
df=df.assign(State = df["Description"].apply(lambda x: [y for y in state_abbrv if y in x]))
print (df)
Index Description State
0 1 BROOKLYN NY [NY, OK]
1 2 M1ANY [NY]
2 3 NYNY [NY]
3 4 DO []
4 5 nyNY [NY]
5 6 CWARD NY [AR, NY, WA]
6 7 HOWARD BEACH NY [AR, NY, WA]
Your solution does not return overlapping matches, which is why AR is missing here:
statesjoin='|'.join(state_abbrv)
df=df.assign(State = df["Description"].str.findall(statesjoin))
print (df)
Index Description State
0 1 BROOKLYN NY [OK, NY]
1 2 M1ANY [NY]
2 3 NYNY [NY, NY]
3 4 DO []
4 5 nyNY [NY]
5 6 CWARD NY [WA, NY]
6 7 HOWARD BEACH NY [WA, NY]
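If both overlapping matches and repeats should be kept in order of position (as in the expected result for NYNY and CWARD NY), one possible approach is a regex lookahead. This is a minimal sketch, not part of the original answer: the name `pattern` is introduced here for illustration, and it assumes that `Series.str.findall` behaves like `re.findall`, returning the captured group when the pattern has exactly one group. Note that OK is still reported for BROOKLYN NY, because it really is a substring of BROOKLYN.

```python
import re
import pandas as pd

state_abbrv = [
    "AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS","KY","LA",
    "ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK",
    "OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY"]

d = {"Index": [1, 2, 3, 4, 5, 6, 7],
     "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY",
                     "CWARD NY", "HOWARD BEACH NY"]}
df = pd.DataFrame(data=d)

# A lookahead (?=...) consumes no characters, so the regex engine tries a
# match at every position; the inner group captures the abbreviation found.
pattern = "(?=(" + "|".join(map(re.escape, state_abbrv)) + "))"

df = df.assign(State=df["Description"].str.findall(pattern))
print(df)
```

Because the match is case-sensitive, the lowercase ny in nyNY is ignored, and CWARD NY yields WA, AR and NY in the order they occur in the string.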