Substring matching / list comparison in Python

Question

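I have two different lists, for example the ones below, and I need to find the index of the list element that has the most similar pattern.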

list_1=['abdsc 23h', 'nis 4hd qad', '234 apple 54f','abdsc 2300h']
list_2=['abdsc 23', 'abdsc 230']

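Compare the two lists: if an element of list_2 matches an element of list_1, return the index of the list_1 element in which it occurs. Note: for the second element of list_2, abdsc 230, it must return 4, because it matches the 4th element of list_1 most closely (so indices are 1-based).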
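One way to read "matches most closely" is as a similarity score. The sketch below uses difflib.SequenceMatcher (which the question's code imports as SM but never calls) to pick, for each element of list_2, the 1-based position of the most similar element of list_1. Scoring by SequenceMatcher.ratio() is an assumption about what "best" means, not something the question specifies, and best_match_positions is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def best_match_positions(queries, candidates):
    """Return, for each query, the 1-based position of the most similar candidate.

    Similarity is SequenceMatcher.ratio(), i.e. 2*M/T where M is the number of
    matching characters and T is the combined length of both strings.
    Ties go to the earliest candidate, because list.index finds the first maximum.
    """
    positions = []
    for query in queries:
        scores = [SequenceMatcher(None, query, cand).ratio() for cand in candidates]
        positions.append(scores.index(max(scores)) + 1)
    return positions


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']
print(best_match_positions(list_2, list_1))  # [1, 4]
```

The margin for 'abdsc 230' is small (about 0.889 against 'abdsc 23h' versus 0.9 against 'abdsc 2300h'), so a different similarity measure could easily pick another element.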

Here is the code I am trying to solve it with:

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd
from difflib import SequenceMatcher as SM


def maxmatching_algo2(data, counter):
    # Words of the processor description in the current CSV row
    data_word = str(data).split(" ")
    # Flatten the scraped processor names (some entries hold several, comma-separated)
    k = []
    for i in processsorList_global:
        k += str(i).split(",")
    # Score every scraped name by how many whole words it shares with data_word
    rank_list = []
    for name in k:
        shared_words = set(str(name).split(" ")) & set(data_word)
        rank_list.append(len(shared_words))
    index = rank_list.index(max(rank_list))
    # A best score of 0 means nothing matched; index 0 on its own is a valid match
    if max(rank_list) == 0:
        df1.iloc[counter, cl] = "na"
    else:
        df1.iloc[counter, cl] = index


def processor_list_online():
    processsorList = []
    url = "http://www.notebookcheck.net/Smartphone-Processors-Benchmark-List.149513.0.html"
    htmlfile = urllib.request.urlopen(url)
    soup = BeautifulSoup(htmlfile, 'html.parser')
    temp_count = 0
    # Scan the table once; the processor name is the second "specs" cell of a row
    for row in soup.find_all('tr'):
        cells = row.find_all('td', attrs={'class': 'specs'})
        temp_count += len(cells)
        if len(cells) > 1:
            processsorList.append(cells[1].text)
            # Stop at the last processor of interest
            if cells[1].text == "Qualcomm Snapdragon S1 MSM7227":
                break
    print(temp_count)
    return processsorList


df1 = pd.read_csv('proddata2.csv')
x = list(df1.columns.values)  # column names
cl = len(x)                   # number of original columns = position of the new rank column
rl = len(df1.index)           # number of rows
df1["Processor Rank"] = ""

processsorList_global = processor_list_online()
for i in processsorList_global:
    print(i)

# Position of the "processor_type" column
count = x.index("processor_type")

for counter in range(rl):
    data = df1.iloc[counter, count]
    if data == "na":
        df1.iloc[counter, cl] = "na"
    else:
        maxmatching_algo2(data, counter)

# df1.to_csv('final_processor_rank.csv', sep=',')
print("process completed")
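To see what the word-overlap scoring in maxmatching_algo2 does on the example lists, here is the same idea without the pandas and global-variable plumbing. word_overlap_position is a hypothetical name, and returning a 1-based position (or None when no word is shared) is a choice made for this sketch.

```python
def word_overlap_position(query, candidates):
    """1-based position of the candidate sharing the most whole words with query.

    Ties go to the earliest candidate (list.index returns the first maximum);
    None is returned when no candidate shares any word.
    """
    query_words = set(query.split())
    scores = [len(query_words & set(cand.split())) for cand in candidates]
    best = max(scores)
    if best == 0:
        return None
    return scores.index(best) + 1


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']
print([word_overlap_position(q, list_1) for q in list_2])  # [1, 1]
```

Both queries share only the word 'abdsc' with elements 1 and 4 ('23' and '230' are not whole words of '23h' or '2300h'), so the tie resolves to position 1 and the expected 4 for 'abdsc 230' never appears: whole-word overlap cannot see partial tokens.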

Answer 1

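You would have to do something like this:

def compare_substrings_in_list(first_list, compared_list):
    results = []
    for element in first_list:
        last_match = 0  # 0 means no element of compared_list contains `element`
        for idx, compared_list_element in enumerate(compared_list):
            if element in compared_list_element:
                last_match = idx + 1  # 1-based position of the most recent match
        results.append(last_match)  # one result per element of first_list
    return results

This iterates over each element of the "search" list and uses the in operator to look for it in each element of the second list, recording the position of the last element that contains it.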
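If every match matters rather than only the last one, the same in-operator scan can collect all 1-based positions for each search string. A small sketch; all_match_positions is a hypothetical name:

```python
def all_match_positions(first_list, compared_list):
    """Map each search string to the 1-based positions of every element containing it."""
    return {
        element: [idx for idx, other in enumerate(compared_list, start=1) if element in other]
        for element in first_list
    }


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']
print(all_match_positions(list_2, list_1))
# {'abdsc 23': [1, 4], 'abdsc 230': [4]}
```

An empty list means no element contains the search string. Keying the result by string also collapses duplicate entries in first_list.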

Answer 2

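The following solution might work for you.

>>> def first_match(list_1, list_2):
...     # Try the longest search string first
...     for val in sorted(list_2, key=len, reverse=True):
...         for j, val2 in enumerate(list_1):
...             if val in val2:
...                 return j + 1  # 1-based position; returning leaves both loops
...     return None
...
>>> print(first_match(list_1, list_2))
4

Note that this solution is not sufficient if you have multiple matches, but that depends entirely on your use case.

This should work now.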
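One way to cope with the multiple-match case mentioned above is to keep only the candidates that contain the search string and, among those, prefer the one with the fewest extra characters. That tie-break rule is an assumption chosen for this sketch (it happens to give the 4 the question expects for 'abdsc 230'), and closest_containing_position is a hypothetical name.

```python
def closest_containing_position(query, candidates):
    """1-based position of the shortest candidate that contains query, or None."""
    containing = [(len(cand) - len(query), idx)
                  for idx, cand in enumerate(candidates, start=1)
                  if query in cand]
    if not containing:
        return None
    # min() compares (extra_chars, position): fewest extra characters wins,
    # and the earlier position breaks any remaining tie
    return min(containing)[1]


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']
print([closest_containing_position(q, list_1) for q in list_2])  # [1, 4]
```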

Answer 3

This should solve your problem:

list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']

for strings in list_2:
    print("--list2val--", strings)
    for ind, other in enumerate(list_1):
        print("--list1val--", other)
        # Match only when `strings` occurs at the start of `other` (same as other.find(strings) == 0)
        if other.startswith(strings):
            print("the index of", strings, "in list_1 is", ind)  # 0-based index
            break

Answer 4

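One approach is to write a function that returns the position of a substring within list_1, then use map() to call that function on each element of list_2:

list_1=['abdsc 23h', 'nis 4hd qad', '234 apple 54f','abdsc 2300h']
list_2=['abdsc 23', 'abdsc 230']

def get_position_from_list(item, l):
    # Return the 1-based position of the first element of l that contains item
    for i, val in enumerate(l):
        if item in val:
            return i + 1
    return None  # no element contains item

# map() is lazy in Python 3, so wrap it in list() to get the results
list(map(lambda x: get_position_from_list(x, list_1), list_2))
# returns: [1, 4]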
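A more compact variant of get_position_from_list uses next() over a generator: its second argument is the default returned when nothing matches, so the explicit loop and the final None disappear. A list comprehension takes the place of map(). get_position is a hypothetical name for this sketch.

```python
def get_position(item, items):
    """First 1-based position in items whose element contains item, else None."""
    return next((i for i, val in enumerate(items, start=1) if item in val), None)


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']
print([get_position(x, list_1) for x in list_2])  # [1, 4]
```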


4 answers

Answer 1: score 0, 2016-10-02 06:03:09
Answer 2: score 0, 2016-10-02 06:06:01
Answer 3: score 0, 2016-10-02 06:36:10
Answer 4: score 0, accepted, 2016-10-02 07:04:34
