Getting Index Out of Range while iterating through list

I wrote a machine learning algorithm that works perfectly. Now I have to compare all the items of a list against one another to generate a similarity score between 0.01 and 1.00. Here's the code:

    temp[]
    start_node = 0
    end_node = 0
    length = len(temp)
    for start_node in range(length):
        doc1 = nlp(temp[start_node])
        for end_node in range(++start_node, length):
            doc2 = nlp(temp[end_node])
            similar = doc1.similarity(doc2)
            exp_value = float(0.85)
            if similar == 1.0:
                print("Exact match", similar, temp[end_node], "---------||---------",  temp[start_node])
            elif 0.96 < similar < 0.99:
                print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
                temp.remove(temp[end_node])

Here, I am trying to compare each item with all the others in the list. If two items are similar, I delete one of them from the list, since checking that sentence against the remaining elements again would be a waste of computing power. But when I try to remove elements, I get an index out of range error.

<ipython-input-12-c1959947bdd1> in <module>
      4 length = len(temp)
      5 for start_node in range(length):
----> 6     doc1 = nlp(temp[start_node])
      7     for end_node in range(++start_node, length):
      8         doc2 = nlp(temp[end_node])

I am just trying to keep the original sentences and delete all the sentences in the list that are similar to them, so the loop doesn't check those items again.

The temp list has 351 items; here I am just referring to it as a list.

Here's a sample of it:

print(temp[:1])

['malicious: caliche development partners "financial statement"has been shared with you']

I tried creating a duplicate list and deleting similar items from that list instead:

final_items = temp
start_node = 0
end_node = 0
length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(++start_node, length):
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        exp_value = float(0.85)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------",  temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            final_items.remove(temp[end_node])

But I still get the same list index out of range error, even though I am deleting elements from another list that I am not even iterating over.

I think your problem lies here:

temp.remove(temp[end_node])

You remove items from the temp list, and therefore the list indexing runs out of range.

Let's say temp contains 351 items to start with, i.e. indexes 0 to 350.

Now the script removes 1 (or more) items from the temp list.
Suddenly the temp list has 350 items, i.e. indexes 0 to 349.

However, the script still iterates using the original temp length of 351.
So when the script reaches the last iteration, index 350 (or earlier if several items were removed), the iteration tries to access a list index that no longer exists:

doc1 = nlp(temp[350])

At that point the temp list indexes only run from 0 to 349.
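The effect can be reproduced on a tiny list. The following is only an illustrative sketch with made-up data (the names `items`, `visited` and `error` are not from the question); it shows that `range(length)` is computed once, so the loop keeps counting up to the old length after an element has been removed.

```python
# Minimal reproduction: the range is fixed at the original length,
# but the list shrinks while the loop is running.
items = ["a", "b", "c", "d"]
length = len(items)            # evaluated once, before the loop

error = None
visited = []
try:
    for i in range(length):    # will try i = 0, 1, 2, 3
        visited.append(items[i])
        if items[i] == "a":
            items.remove("b")  # the list now has only three elements
except IndexError as exc:
    error = exc                # raised when i reaches the old last index

print(visited)   # elements that were read before the failure
print(repr(error))
```

The loop fails on the last index because that position disappeared when `"b"` was removed, which is the same situation as `temp[350]` in a list that has shrunk to 350 items.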

It may be better to keep an additional copy of the list for modification rather than modifying the list you iterate over.
If you create an additional list, remember to use the copy method:

final_items = temp.copy()

A regular assignment only keeps a reference to the temp list.
Python doc - copy()
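One way to follow the suggestion above without ever touching the list being iterated is to record which indexes were flagged and build the result at the end. This is a hedged sketch, not the asker's actual pipeline: `dedupe` and `text_similarity` are hypothetical names, `difflib.SequenceMatcher` is only a stand-in for `doc1.similarity(doc2)` so the example runs without spaCy, and the single `> threshold` test is a simplification of the two branches in the question. It also uses `start + 1` for the inner loop: Python has no `++` operator, so `++start_node` is just two unary plus signs and leaves the value unchanged.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Stand-in for doc1.similarity(doc2); returns a ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def dedupe(sentences, similarity=text_similarity, threshold=0.96):
    """Return a new list with near-duplicates of earlier sentences dropped.

    The input list is never modified, so every index stays valid.
    """
    removed = set()  # indexes already flagged as near-duplicates
    for start in range(len(sentences)):
        if start in removed:
            continue  # skip sentences that were already dropped
        # start + 1 (not ++start) so a sentence is not compared with itself
        for end in range(start + 1, len(sentences)):
            if end in removed:
                continue
            if similarity(sentences[start], sentences[end]) > threshold:
                removed.add(end)
    return [s for i, s in enumerate(sentences) if i not in removed]


temp = [
    "invoice has been shared with you",
    "invoice has been shared with you!",
    "please reset your password",
]
final_items = dedupe(temp)
print(final_items)
print(len(temp))  # the original list still has all its items
```

With spaCy, the `similarity` argument could be something like `lambda a, b: nlp(a).similarity(nlp(b))`, although parsing each sentence once up front would avoid repeating that work inside the nested loop.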

The problem with your code is that you are removing items from what you believe is a clone of the original array while iterating through the original array. When you directly assign an array to another variable, it just creates a link/reference to the original array.

Let's take your current code.

final_items = temp
start_node = 0
end_node = 0
length = len(temp)
for start_node in range(length):
    doc1 = nlp(temp[start_node])
    for end_node in range(++start_node, length):
        doc2 = nlp(temp[end_node])
        similar = doc1.similarity(doc2)
        exp_value = float(0.85)
        if similar == 1.0:
            print("Exact match", similar, temp[end_node], "---------||---------",  temp[start_node])
        elif 0.96 < similar < 0.99:
            print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
            final_items.remove(temp[end_node])

And let's say temp is the following array:

temp = [node1,node2,node3,........,nodeN] 

And

final_items = temp

where the array items belong to the class Node.

Here:

elif 0.96 < similar < 0.99:
    print("possible match", similar, temp[end_node], "---------||---------", temp[start_node])
    final_items.remove(temp[end_node])

Since final_items is the same object as temp, when you remove an element from final_items, that element is also removed from temp. Just look at this simple example.

>>> a=[1,2,3]
>>> b=a
>>> b
[1, 2, 3]
>>> b.remove(1)
>>> a
[2, 3]
>>> b
[2, 3]

So in your case, imagine there were 100 nodes in the temp array. Your for loop will then check indexes up to 99. But while it runs, the temp array gets shorter, so it no longer has an index 99, which raises the index error.

The easiest way to solve this is to create a hard copy of the array/list. There are several ways to create a hard copy of a list instead of just linking it to the original array.

final_items = [n for n in temp]

or

from copy import deepcopy as dc
final_items = dc(temp)
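The two options above are not quite equivalent, which the following sketch tries to illustrate. It assumes nested lists as a stand-in for the Node objects (the Node class itself is not shown in the question): the list comprehension makes a new outer list that shares its elements with temp, while deepcopy duplicates the elements as well.

```python
from copy import deepcopy

temp = [["node", 1], ["node", 2]]  # nested lists stand in for Node objects

alias = temp                  # plain assignment: same list object
shallow = [n for n in temp]   # new outer list, same element objects
deep = deepcopy(temp)         # new outer list and new element objects

shallow.remove(shallow[0])    # only the copy shrinks
print(len(temp), len(shallow))

print(alias is temp)          # True: both names point to one list
print(shallow[0] is temp[1])  # True: elements are shared with temp
print(deep[1] is temp[1])     # False: deepcopy duplicated the elements
```

For removing items, as in the question, the shallow copy should be enough; the deep copy only matters if the elements themselves are mutable and get modified.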
