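Substring matching / list comparison in Python
I have two different lists, shown below, and for each element of one list I need to find the index of the element in the other list that it matches most closely: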
list_1=['abdsc 23h', 'nis 4hd qad', '234 apple 54f','abdsc 2300h']
list_2=['abdsc 23', 'abdsc 230']
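Compare the two lists: if an element of list_2 matches an element of list_1, return the index of the list_1 element in which it is present. Note: for the second element of list_2, abdsc 230, the result must be 4, because it matches the 4th element of list_1 most closely.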
from bs4 import BeautifulSoup
import urllib
import pandas as pd
from difflib import SequenceMatcher as SM
def maxmatching_algo2(data, counter):
    data_word=[]
    data_word=str(data).split(" ")
    k=[]
    for i in processsorList_global:
        k+=str(i).split(",")
    temp=0
    rank_list=[]
    while temp<len(k):
        t=[]
        t+=str(k[temp]).split(" ")
        union_set=set(t)&set(data_word)
        rank_list+= [len(union_set)]
        temp+=1
    index= rank_list.index(max(rank_list))
    if index==0:
        df1.ix[counter, cl]="na"
    else:
        df1.ix[counter, cl]=index
def processor_list_online():
    processsorList = []
    url = "http://www.notebookcheck.net/Smartphone-Processors-Benchmark-List.149513.0.html"
    htmlfile = urllib.urlopen(url)
    soup = BeautifulSoup(htmlfile, 'html.parser')
    count = 1
    temp_count=0
    x=str()
    while True:
        if x=="Qualcomm Snapdragon S1 MSM7227":
            break
        else:
            for i in soup.find_all('tr'):
                count+=1
                temp=0
                for j in i.find_all('td', attrs={'class': 'specs'}):
                    if temp==1:
                        processsorList += [j.text]
                        x=j.text
                    temp+=1
                    temp_count+=1
    print temp_count
    return processsorList
###############################################################################################################################
###############################################################################################################################
df1 = pd.read_csv('proddata2.csv')
x = list(df1.columns.values)  # names of the columns
cl = len(x)  # column length
rl = len(df1.index)  # row length
df1["Processor Rank"] = ""
counter = 0
count = []
processsorList_global = processor_list_online()
for i in processsorList_global:
    print i
counter=0
while counter < cl:
    if x[counter] == "processor_type":
        count = counter
        break
    counter += 1
counter = 0
data = []
while counter < rl:
    data = df1.ix[counter, count]
    #print data
    if data=="na":
        df1.ix[counter, cl]="na"
    else:
        # maxmatching_algo(data, counter)
        maxmatching_algo2(data, counter)
    counter +=1
#print df1
#df1.to_csv('final_processor_rank.csv', sep=',')
print "process completed"
You will have to do something like the following:
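def compare_substrings_in_list(first_list, compared_list):
    # For each search string, record the 1-based position of the last
    # element of compared_list that contains it (0 if nothing matches).
    results = []
    for element in first_list:
        last_match = 0
        for idx, compared_list_element in enumerate(compared_list):
            if element in compared_list_element:
                last_match = idx + 1
        results.append(last_match)
    return results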
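This iterates over each element of the "search" list and uses the in operator to look for a match in each element of the second list.
The following solution may work for you.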
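As a minimal sketch of the same in-operator idea, the function below records every matching position instead of a single one, which makes it easier to see where a search string matches more than once. The name find_all_matches is hypothetical and not part of the answer above; positions are 1-based to follow the question's numbering.

```python
def find_all_matches(search_list, target_list):
    """Map each search string to the 1-based positions of all targets containing it.

    Hypothetical helper for illustration; not part of the original answer.
    """
    matches = {}
    for needle in search_list:
        matches[needle] = [
            position
            for position, candidate in enumerate(target_list, start=1)
            if needle in candidate
        ]
    return matches


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']

all_matches = find_all_matches(list_2, list_1)
# all_matches == {'abdsc 23': [1, 4], 'abdsc 230': [4]}
```

Here 'abdsc 23' is a substring of both the 1st and the 4th element, so a plain containment test cannot decide between them without an extra rule.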
>>> for i, val in enumerate(sorted(list_2, key=len, reverse=True)):
...     for j, val2 in enumerate(list_1):
...         if val in val2:
...             print(j + 1)
...             exit()
...
4
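Note that this solution is not sufficient if you have multiple matches, but that depends entirely on your use case.
For now, this should do.
This should solve your problem: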
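One possible way to handle multiple matches, sketched here under the assumption that "closest" means the containing string with the fewest extra characters, is to compare lengths among all candidates that contain the search string. The helper name closest_containing_index and the tie-breaking rule (first candidate wins) are assumptions made for illustration.

```python
def closest_containing_index(needle, candidates):
    """Return the 1-based position of the tightest candidate containing needle.

    "Tightest" is an assumed rule: the fewest characters beyond the needle.
    Returns None when no candidate contains the needle.
    """
    best_position = None
    best_extra = None
    for position, candidate in enumerate(candidates, start=1):
        if needle in candidate:
            extra = len(candidate) - len(needle)
            # Strict comparison keeps the earliest candidate on ties.
            if best_extra is None or extra < best_extra:
                best_position, best_extra = position, extra
    return best_position


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']

closest = [closest_containing_index(s, list_1) for s in list_2]
# closest == [1, 4]
```

With this rule 'abdsc 23' resolves to position 1 (one extra character) rather than position 4 (three extra characters), while 'abdsc 230' only occurs in position 4.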
list_1=['abdsc 23h', 'nis 4hd qad', '234 apple 54f','abdsc 2300h']
list_2=['abdsc 23', 'abdsc 230']
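for strings in list_2:
    print("--list_2 value-- %s" % strings)
    for other in list_1:
        print("--list_1 value-- %s" % other)
        # find() returns 0 only when `other` starts with `strings`
        occurrence = other.find(strings)
        if occurrence == 0:
            ind = list_1.index(other)  # 0-based index of the first match
            print("the index of %s in list_1 is %d" % (strings, ind))
            break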
One approach is to write a function that returns the position of a substring within list_1, then use map() to call that function on each element of list_2:
list_1=['abdsc 23h', 'nis 4hd qad', '234 apple 54f','abdsc 2300h']
list_2=['abdsc 23', 'abdsc 230']
def get_position_from_list(item, l):
    for i, val in enumerate(l):
        if item in val:
            return i + 1
    else:
        # for/else: reached only when no element of l contains item
        return None
list(map(lambda x: get_position_from_list(x, list_1), list_2))
# returns: [1, 4]
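Since the question asks for the element with the "highest match" and already imports difflib.SequenceMatcher, a similarity score is another option worth considering. The sketch below is one way to do that, not the method used by any answer here: it scores every element of list_1 with SequenceMatcher.ratio() and returns the 1-based position of the highest score. The helper name best_match_index is hypothetical, and ties go to the earliest element.

```python
from difflib import SequenceMatcher


def best_match_index(needle, candidates):
    """Return the 1-based position of the candidate most similar to needle.

    Similarity is difflib's SequenceMatcher.ratio(); this scoring choice is
    an assumption for illustration. Returns None for an empty candidate list.
    """
    if not candidates:
        return None
    ratios = [SequenceMatcher(None, needle, c).ratio() for c in candidates]
    # list.index returns the first maximum, so ties go to the earliest element.
    return ratios.index(max(ratios)) + 1


list_1 = ['abdsc 23h', 'nis 4hd qad', '234 apple 54f', 'abdsc 2300h']
list_2 = ['abdsc 23', 'abdsc 230']

best = [best_match_index(s, list_1) for s in list_2]
# best == [1, 4]
```

Unlike the substring tests, this also returns a position when nothing contains the search string exactly, so a minimum-ratio threshold may be needed if "no match" has to be reported.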