
Python Pandas: partial match of a list of strings in a DataFrame column, returning all matched substrings

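Hi everyone, I am trying to match partial strings in a column of a DataFrame and return the matched strings (in uppercase). I don't have a strong programming background and have only just started learning.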

    import re
    import pandas as pd

    state_abbrv = [
        "AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS","KY","LA",
        "ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK",
        "OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY",
    ]

    d = {"Index": [1, 2, 3, 4, 5, 6, 7],
         "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY",
                         "CWARD NY", "HOWARD BEACH NY"]}
    df = pd.DataFrame(data=d)

    # My attempt: split each description into 2-character chunks,
    # join the chunks with commas, then search the result for state codes.
    statesjoin = '|'.join(state_abbrv)
    df = df.assign(State=df["Description"]
                   .apply(lambda x: ','.join(re.findall('..', x)))
                   .str.findall(statesjoin))

    print(df)

Current result (wrong)

   Index      Description     State
       1      BROOKLYN NY        []
       2            M1ANY        []
       3             NYNY  [NY, NY]
       4               DO        []
       5             nyNY      [NY]
       6         CWARD NY  [AR, NY]
       7  HOWARD BEACH NY      [WA]

Expected result

   Index      Description         State
       1      BROOKLYN NY          [NY]
       2            M1ANY          [NY]
       3             NYNY      [NY, NY]
       4               DO            []
       5             nyNY          [NY]
       6         CWARD NY  [WA, AR, NY]
       7  HOWARD BEACH NY  [WA, AR, NY]

Use a list comprehension that tests each value of the list against the description with the `in` operator:

    df = df.assign(State=df["Description"].apply(lambda x: [y for y in state_abbrv if y in x]))
    print(df)

       Index      Description         State
    0      1      BROOKLYN NY      [NY, OK]
    1      2            M1ANY          [NY]
    2      3             NYNY          [NY]
    3      4               DO            []
    4      5             nyNY          [NY]
    5      6         CWARD NY  [AR, NY, WA]
    6      7  HOWARD BEACH NY  [AR, NY, WA]
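
To see why this approach lists each abbreviation at most once, and in the order of the state list rather than the order of appearance, here is a minimal sketch in plain Python. The shortened `states` list and the helper name `states_in` are assumptions made for illustration; they are not part of the original answer.

```python
# Hypothetical shortened list, for illustration only.
states = ["AR", "NY", "OK", "WA"]

def states_in(text):
    """Return every abbreviation that occurs anywhere in text (case-sensitive substring test)."""
    return [s for s in states if s in text]

print(states_in("NYNY"))      # ['NY']  (one entry, even though NY occurs twice)
print(states_in("CWARD NY"))  # ['AR', 'NY', 'WA']  (order of the list, not of the text)
print(states_in("nyNY"))      # ['NY']  (lowercase "ny" does not match)
```

Because `in` only answers whether a substring is present, repeated occurrences such as the second NY in NYNY are not counted.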

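A plain regex solution like the one below does not return overlapping matches, which is why AR is missing here (WA is matched first and consumes the A):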

    statesjoin = '|'.join(state_abbrv)
    df = df.assign(State=df["Description"].str.findall(statesjoin))
    print(df)

       Index      Description     State
    0      1      BROOKLYN NY  [OK, NY]
    1      2            M1ANY      [NY]
    2      3             NYNY  [NY, NY]
    3      4               DO        []
    4      5             nyNY      [NY]
    5      6         CWARD NY  [WA, NY]
    6      7  HOWARD BEACH NY  [WA, NY]
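
If overlapping and repeated matches are both wanted, in order of appearance, one possible variant is to wrap the alternation in a lookahead so that a match consumes no characters. This is a sketch that goes beyond the original answer; the variable name `pattern` is an assumption, and the data is the same as in the question.

```python
import pandas as pd

state_abbrv = [
    "AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN","IA","KS","KY","LA",
    "ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK",
    "OR","PA","RI","SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY",
]

df = pd.DataFrame({
    "Index": [1, 2, 3, 4, 5, 6, 7],
    "Description": ["BROOKLYN NY", "M1ANY", "NYNY", "DO", "nyNY",
                    "CWARD NY", "HOWARD BEACH NY"],
})

# (?=(...)) is a zero-width lookahead containing one capture group:
# the regex engine tries every position, and findall returns the captured
# state code at each position where one starts, so WA and AR can both match.
pattern = "(?=(" + "|".join(state_abbrv) + "))"
df = df.assign(State=df["Description"].str.findall(pattern))
print(df)
```

With this pattern, CWARD NY and HOWARD BEACH NY both give [WA, AR, NY] and NYNY gives [NY, NY]. BROOKLYN NY still gives [OK, NY], because OK occurs inside BROOKLYN; any purely substring-based approach will pick that up.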
