
Substring matching / list comparison in Python

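I have two different lists, shown below. For each pattern in the second list, I need to find the index of the element in the first list that resembles it most closely.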

list_1=['abdsc 23h', 'nis 4hd qad', '234 apple 54f','abdsc 2300h']
list_2=['abdsc 23', 'abdsc 230']

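list_2 should be compared against list_1: if an element of list_2 matches an element of list_1, the result should be the (1-based) index of that element in list_1. Note: for the second element of list_2, 'abdsc 230', it must return 4, because the 4th element of list_1 is the closest match.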

Here is the code I am trying to fix:

from bs4 import BeautifulSoup
import urllib
import pandas as pd
from difflib import SequenceMatcher as SM

def maxmatching_algo2(data, counter):
    data_word=[]
    data_word=str(data).split(" ")
    k=[]
    for i in processsorList_global:
        k+=str(i).split(",")
    temp=0
    rank_list=[]
    while temp<len(k):
        t=[]
        t+=str(k[temp]).split(" ")
        union_set=set(t)&set(data_word)
        rank_list+= [len(union_set)]
        temp+=1
    index= rank_list.index(max(rank_list))
    if index==0:
        df1.ix[counter, cl]="na"
    else:
        df1.ix[counter, cl]=index


def processor_list_online():
    processsorList = []
    url = "http://www.notebookcheck.net/Smartphone-Processors-Benchmark-List.149513.0.html"
    htmlfile = urllib.urlopen(url)
    soup = BeautifulSoup(htmlfile, 'html.parser')
    count = 1
    temp_count=0
    x=str()
    while True:

        if x=="Qualcomm Snapdragon S1 MSM7227":
            break
        else:
            for i in soup.find_all('tr'):
                count+=1
                temp=0
                for j in i.find_all('td', attrs={'class': 'specs'}):
                    if temp==1:
                        processsorList += [j.text]
                        x=j.text
                    temp+=1
                    temp_count+=1


    print temp_count
    return processsorList



###############################################################################################################################

###############################################################################################################################
df1 = pd.read_csv('proddata2.csv')
x = list(df1.columns.values)  #######################     name of column
cl = len(x)  #######################     column Length
rl = len(df1.index)  #######################     row length
df1["Processor Rank"] = ""
counter = 0
count = []

processsorList_global = processor_list_online()
for i in processsorList_global:
    print i

counter=0
while counter < cl:
    if x[counter] == "processor_type":
        count = counter
        break
    counter += 1

counter = 0
data = []
while counter < rl:
    data = df1.ix[counter, count]
    #print data
    if data=="na":
        df1.ix[counter, cl]="na"
    else:
       # maxmatching_algo(data, counter)
        maxmatching_algo2(data, counter)
    counter +=1

#print df1
#df1.to_csv('final_processor_rank.csv', sep=',')
print "process completed"

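You will have to do something like this:

def compare_substrings_in_list(first_list, compared_list):
    matches = []
    for element in first_list:
        last_match = 0  # 0 means no match was found for this element
        for idx, compared_list_element in enumerate(compared_list):
            if element in compared_list_element:
                last_match = idx + 1  # 1-based position of the latest match
        matches.append(last_match)
    # Return after the outer loop so every element of first_list is checked
    return matches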

This iterates over each element of the "search" list and uses the in operator to look for a match in every element of the second list.
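Because the question asks for the closest match rather than the last one, a minimal sketch of one possible tie-break follows. It assumes "closest" means the containing element with the fewest extra characters; the function name closest_containing_index is hypothetical, and the code is written for Python 3.

```python
def closest_containing_index(pattern, candidates):
    """Return the 1-based position of the shortest candidate that
    contains pattern, or None if no candidate contains it."""
    best_pos = None
    best_extra = None
    for pos, candidate in enumerate(candidates, start=1):
        if pattern in candidate:
            # Assumption: fewer extra characters means a closer match.
            extra = len(candidate) - len(pattern)
            if best_extra is None or extra < best_extra:
                best_pos, best_extra = pos, extra
    return best_pos


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']

result = [closest_containing_index(p, list_1) for p in list_2]
print(result)  # [1, 4]
```

With the sample data, 'abdsc 23' is contained in both the 1st and 4th elements and the shorter one wins, while 'abdsc 230' is contained only in the 4th.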

The following solution may work for you.

>>> for i,val in enumerate(sorted(list_2, key= len, reverse = True)):
...     for j,val2 in enumerate(list_1):
...         if val in val2:
...             print j+1
...             exit()
... 
4

Note that this solution is not sufficient if you have multiple matches. Whether that matters depends entirely on your use case.
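If several elements match, one option is to score every candidate and keep the highest-scoring one. The sketch below uses difflib.SequenceMatcher, which the question already imports, and assumes its ratio() is an acceptable similarity measure for this data. The function name most_similar_index is hypothetical, and the code is written for Python 3.

```python
from difflib import SequenceMatcher


def most_similar_index(pattern, candidates):
    """Return the 1-based position of the candidate whose similarity
    ratio to pattern is highest (first one wins on ties).
    Raises ValueError if candidates is empty."""
    ratios = [SequenceMatcher(None, pattern, c).ratio() for c in candidates]
    best = max(range(len(candidates)), key=ratios.__getitem__)
    return best + 1


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']

result = [most_similar_index(p, list_1) for p in list_2]
print(result)  # [1, 4]
```

Unlike the in operator, this always returns some index, even when no element actually contains the pattern, so a minimum-ratio cutoff may be worth adding for real data.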

This should work now.

This should solve your problem:

list_1=['abdsc 23h', 'nis 4hd qad', '234 apple 54f','abdsc 2300h']
list_2=['abdsc 23', 'abdsc 230']

for strings in list_2:
    print "-list1val--",strings
    for other in list_1:
        print '--list2val---',other
        # find() returns 0 when `other` starts with `strings`
        occurrence = other.find(strings)
        if occurrence==0:
            ind = list_1.index(other)
            print "the index of ",strings,"in list_1 is ",ind
            break

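One approach is to write a function that returns the position in list_1 of the element containing a given substring, and then use map() to call that function on each element of list_2.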

list_1=['abdsc 23h', 'nis 4hd qad', '234 apple 54f','abdsc 2300h']
list_2=['abdsc 23', 'abdsc 230']

def get_position_from_list(item, l):
    for i, val in enumerate(l):
        if item in val:
           return i + 1
    else:
        return None

map(lambda x: get_position_from_list(x, list_1), list_2)
# returns: [1, 4]
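On Python 3, map() returns a lazy iterator rather than a list, so the call above would need to be wrapped in list() to display [1, 4]. A list comprehension is a common equivalent. The sketch below restates the same function for Python 3; the variable name positions is an assumption added for illustration.

```python
def get_position_from_list(item, l):
    """Return the 1-based position of the first element of l that
    contains item, or None if there is no such element."""
    for i, val in enumerate(l):
        if item in val:
            return i + 1
    return None


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']

# Equivalent to list(map(lambda x: get_position_from_list(x, list_1), list_2))
positions = [get_position_from_list(x, list_1) for x in list_2]
print(positions)  # [1, 4]
```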
