Python Pandas: partial match of a list of strings in a DataFrame column, returning all matched substrings

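Hi all, I am trying to match partial strings in a DataFrame column and return every matched string (the uppercase state abbreviations). I don't have much programming experience and have only just started learning.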

    import re
    import pandas as pd

    state_abbrv = [
        "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL",
        "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT",
        "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
        "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
    ]

    d = {
        "Index": [1, 2, 3, 4, 5, 6, 7],
        "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY",
                        "CWARD NY", "HOWARD BEACH NY"],
    }
    df = pd.DataFrame(data=d)

    # Build one alternation pattern: "AL|AK|...|WY"
    statesjoin = '|'.join(state_abbrv)

    # Split each description into consecutive two-character chunks, join the
    # chunks with commas, then search that string for state abbreviations.
    df = df.assign(State=df["Description"].apply(
        lambda x: ','.join(re.findall('..', x))).str.findall(statesjoin))

    print(df)

Current result (incorrect)

   Index      Description     State
       1      BROOKLYN NY        []
       2            M1ANY        []
       3             NYNY  [NY, NY]
       4               DO        []
       5             nyNY      [NY]
       6         CWARD NY  [AR, NY]
       7  HOWARD BEACH NY      [WA]

Expected result

   Index      Description     State
       1      BROOKLYN NY      [NY]
       2            M1ANY      [NY]
       3             NYNY  [NY, NY]
       4               DO        []
       5             nyNY      [NY]
       6         CWARD NY  [WA,AR,NY]
       7  HOWARD BEACH NY  [WA,AR,NY]

Use a list comprehension that tests each value of the list against the description with the `in` operator:

df = df.assign(State=df["Description"].apply(lambda x: [y for y in state_abbrv if y in x]))
print(df)
   Index      Description         State
0      1      BROOKLYN NY      [NY, OK]
1      2            M1ANY          [NY]
2      3             NYNY          [NY]
3      4               DO            []
4      5             nyNY          [NY]
5      6         CWARD NY  [AR, NY, WA]
6      7  HOWARD BEACH NY  [AR, NY, WA]
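
The output above differs from the expected result in two rows. A minimal sketch on plain strings, assuming a shortened abbreviation list and a hypothetical helper name `states_in` (neither is from the original post), may make the reason easier to see: `in` is a plain substring test, so `OK` is found inside `BROOKLYN`, and each abbreviation is reported at most once, in the order of the list rather than the order of appearance.

```python
# Shortened list, for illustration only; the full state_abbrv behaves the same way.
state_abbrv = ["AR", "NY", "OK", "WA"]


def states_in(text, states=state_abbrv):
    """Return each abbreviation that occurs anywhere in text, at most once."""
    return [s for s in states if s in text]


print(states_in("BROOKLYN NY"))  # ['NY', 'OK']  ("OK" sits inside BRO-OK-LYN)
print(states_in("NYNY"))         # ['NY']        (duplicates are not counted)
print(states_in("nyNY"))         # ['NY']        (the test is case-sensitive)
```

If repeated matches such as `[NY, NY]` matter, the list comprehension alone is not enough, because it only asks whether each abbreviation is present.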

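A regex solution with `str.findall` does not return overlapping matches, which is why `AR` is missing here: in `WARD`, the match `WA` consumes the `A`, so `AR` can no longer match.

statesjoin = '|'.join(state_abbrv)
df = df.assign(State=df["Description"].str.findall(statesjoin))
print(df)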
   Index      Description     State
0      1      BROOKLYN NY  [OK, NY]
1      2            M1ANY      [NY]
2      3             NYNY  [NY, NY]
3      4               DO        []
4      5             nyNY      [NY]
5      6         CWARD NY  [WA, NY]
6      7  HOWARD BEACH NY  [WA, NY]
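
If overlapping matches and repeats are both wanted, one possible variant (not part of the original answer) is to wrap the alternation in a zero-width lookahead, so that no characters are consumed and every start position is tested. The sketch below assumes a shortened abbreviation list for brevity; `Series.str.findall` follows `re.findall` and returns the captured group when the pattern contains exactly one group.

```python
import re

import pandas as pd

# Shortened list, for illustration only; substitute the full state_abbrv list.
state_abbrv = ["AR", "NY", "OK", "WA"]

df = pd.DataFrame({
    "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY",
                    "CWARD NY", "HOWARD BEACH NY"],
})

# (?=(...)) is a lookahead: it matches at a position without consuming text,
# so "WA" and "AR" can both be found inside "WARD".
pattern = "(?=(" + "|".join(map(re.escape, state_abbrv)) + "))"

df["State"] = df["Description"].str.findall(pattern)
print(df)
```

With this pattern `NYNY` yields `[NY, NY]` and `CWARD NY` yields `[WA, AR, NY]`, in order of appearance. `BROOKLYN NY` still yields `[OK, NY]`, since `OK` really is a substring of `BROOKLYN`.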
