
Python pandas: partially match a list of strings in a DataFrame column and return all matched substrings

Hi everyone, I am trying to match partial strings within a column of a DataFrame and return the matched strings (case matters). I don't have a strong programming background and have only just started learning.

    import re
    import pandas as pd

    # Two-letter US state abbreviations to search for (case-sensitive).
    state_abbrv = [
        "AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS","KY","LA",
        "ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK",
        "OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY"]

    d = {"Index": [1, 2, 3, 4, 5, 6, 7],
         "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY",
                         "CWARD NY", "HOWARD BEACH NY"]}
    df = pd.DataFrame(data=d)

    # Attempt: split each description into two-character chunks, then
    # search the comma-joined chunks for any state abbreviation.
    statesjoin = '|'.join(state_abbrv)
    df = df.assign(State=df["Description"].apply(
        lambda x: ','.join(re.findall('..', x))).str.findall(statesjoin))

    print(df)

Current result (wrong):

   Index      Description     State
       1      BROOKLYN NY        []
       2            M1ANY        []
       3             NYNY  [NY, NY]
       4               DO        []
       5             nyNY      [NY]
       6         CWARD NY  [AR, NY]
       7  HOWARD BEACH NY      [WA]

Expected result:

   Index      Description         State
       1      BROOKLYN NY          [NY]
       2            M1ANY          [NY]
       3             NYNY      [NY, NY]
       4               DO            []
       5             nyNY          [NY]
       6         CWARD NY  [WA, AR, NY]
       7  HOWARD BEACH NY  [WA, AR, NY]

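Use a list comprehension that tests each abbreviation in the list against the description with the in operator. Each abbreviation is reported at most once, in the order it appears in state_abbrv: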

    df = df.assign(State=df["Description"].apply(lambda x: [y for y in state_abbrv if y in x]))
    print(df)

       Index      Description         State
    0      1      BROOKLYN NY      [NY, OK]
    1      2            M1ANY          [NY]
    2      3             NYNY          [NY]
    3      4               DO            []
    4      5             nyNY          [NY]
    5      6         CWARD NY  [AR, NY, WA]
    6      7  HOWARD BEACH NY  [AR, NY, WA]

A regex-based solution does not return overlapping matches (here AR is missed), because findall scans left to right and only returns non-overlapping matches:

    statesjoin = '|'.join(state_abbrv)
    df = df.assign(State=df["Description"].str.findall(statesjoin))
    print(df)

       Index      Description     State
    0      1      BROOKLYN NY  [OK, NY]
    1      2            M1ANY      [NY]
    2      3             NYNY  [NY, NY]
    3      4               DO        []
    4      5             nyNY      [NY]
    5      6         CWARD NY  [WA, NY]
    6      7  HOWARD BEACH NY  [WA, NY]
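
If both the overlapping matches and the repeated ones are wanted, in the order they occur in the text, one possible approach is to wrap the alternation in a capturing group inside a zero-width lookahead. This is only a sketch and not part of the original answer. The names overlap_pattern and find_states are made up for illustration, and the example uses plain re so that it runs without pandas.

```python
import re

state_abbrv = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL",
    "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT",
    "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
    "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]

# A lookahead consumes no characters, so the scan advances one position at a
# time and can report matches that overlap (such as "WA" and "AR" in "WARD").
# The capturing group makes findall return the abbreviation itself.
overlap_pattern = re.compile("(?=(" + "|".join(state_abbrv) + "))")


def find_states(text):
    """Return every abbreviation found in text, ordered by position (hypothetical helper)."""
    return overlap_pattern.findall(text)


descriptions = ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY",
                "CWARD NY", "HOWARD BEACH NY"]
for description in descriptions:
    print(description, find_states(description))
```

With the DataFrame from the question, the same helper could be applied as df["Description"].apply(find_states). Note that "BROOKLYN NY" still yields OK as well as NY, because OK really is a substring of BROOKLYN. Excluding it would need an extra rule, such as matching whole words only.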
