Loop breaking off when checking for duplicates in a list?

I'm trying to generate a list using random results from another list, and I want there to be no duplicates (with one exception) when I do so. The problem occurs when I check for duplicates: the loop breaks off as soon as it finds one, and I don't know why.

As far as I can tell, everything seems to be correct. I've run the code through pythontutor.com/visualise, I've tried different pieces of code to check for duplicates, and I've changed the loops into for loops, while loops and range loops. I tested my definition of turning_point, I even copy-pasted it into the loop itself instead of using it as a function, I tried changing the position of the 'if' statement, and so on. I've been at this for an entire night and I still can't troubleshoot the problem.

Edit: I don't actually want such a heavy weight on a particular item (in this case, "conclusion"); I only did that to test the duplicates check. In reality the weights are closer to 3, 1, 1, 2, 1, etc. The other thing is that in my actual code, the action_table is 43 values long instead of just 7.


import random

#first list to draw random items from
action_table = ["conclusion", "none", "confrontation", "protector", "crescendo", "destroy the thing", "meta"]

#code to draw random items from first list
#first variable weighted to guarantee duplicates for testing
def turning_point():
    turning_point = (random.choices(action_table, [500, 1, 1, 1, 1, 1, 1]))
    return turning_point

#generate second list called 'plot_points'
plot_points = []
for x in range(int(input("How many Turning Points to generate? "))):
    tp = turning_point()
#the code below is what I got from this site
#added tp != to allow duplicates of "none" result
    if any(plot_points.count(tp) > 1 for tp in plot_points) and tp != "none": 
        continue
#results get added to the plot_points list:
    plot_points.append(tp)
print(plot_points)

If I remove the line that checks for duplicates, this is what I get:

[['conclusion'], ['conclusion'], ['meta'], ['conclusion'], ['conclusion']]

If I don't remove the line, this is what I get:

[['conclusion'], ['conclusion']]

What I'd like to get is something like:

[['conclusion'], ['none'], ['none'], ['meta'], ['protector']]

The error is here:

tp != "none"

tp is always a list with one element, because random.choices() returns a list with a single element by default. From the documentation for random.choices():

random.choices(population, weights=None, *, cum_weights=None, k=1)
    Return a k sized list of elements chosen from the population with replacement.

With k left at 1, tp is going to be a 1-element list each time, and can never be equal to "none". It will be equal to ["none"], or ["conclusion"], and so forth, instead. That means that tp != "none" is always true.
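A quick check makes the difference visible. This is only a minimal sketch reusing the question's action_table; unwrapping with [0] or switching to random.choice() are two possible ways to get a bare string, not the only ones, and random.choice() does not accept weights.

```python
import random

action_table = ["conclusion", "none", "confrontation", "protector",
                "crescendo", "destroy the thing", "meta"]

tp = random.choices(action_table)       # k defaults to 1, but the result is still a list
print(type(tp).__name__, len(tp))       # list 1
print(tp != "none")                     # True: a list never equals a string

item = random.choices(action_table)[0]  # unwrap the single element
other = random.choice(action_table)     # or draw one element directly (unweighted)
print(isinstance(item, str), isinstance(other, str))
```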

Next, your any() test only kicks in once some value already appears more than once in plot_points, so at least twice. Note that the tp inside the generator expression is a separate loop variable that walks over plot_points; it is not the value you just drew. From that point on, every new draw is skipped, because tp != "none" is always true:

>>> plot_points = [["conclusion"], ["conclusion"]]
>>> tp = ["conclusion"]
>>> any(plot_points.count(tp) > 1 for tp in plot_points)
True
>>> tp != "none"
True

Your weightings for the choices given make it very, very unlikely that anything other than "conclusion" is picked. For your 7 options, ["conclusion"] will be picked 500 out of every 506 times you call your turning_point() function, so the above scenario will occur most of the time (976562500000 out of every 1036579476493 experiments will turn up ["conclusion"] 5 times in a row, or about 33 out of every 35 tests). So you'll only extremely rarely see any of the other options produced twice, let alone 3 times (only 3 out of every 64777108 tests will repeat any of the other options three times or more).
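The five-in-a-row figure can be reproduced with exact fractions. This is just the arithmetic for the weights in the question (500 for "conclusion" and 1 for each of the other six options), written out as a small check:

```python
from fractions import Fraction

p_conclusion = Fraction(500, 506)    # weight 500 out of a total weight of 506
p_five_in_a_row = p_conclusion ** 5  # five independent draws with replacement

print(p_five_in_a_row)               # 976562500000/1036579476493
print(float(p_five_in_a_row))        # roughly 0.94, close to 33/35
```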

If you must produce a list in which nothing repeats except for "none", then there is no point in weighting choices. The moment "conclusion" has been picked, you can't pick it again anyway. If the goal is to make it highly likely that a "conclusion" element is part of the result, then just make that a separate swap at the end, and just shuffle a list of the remaining choices first. Shuffling lets you cut the result down to size, and the first N elements will all be random, and unique:

>>> import random
>>> action_table = ["confrontation", "protector", "crescendo", "destroy the thing", "meta"]
>>> random.shuffle(action_table)  # changes the list order in-place
>>> action_table[:3]
['meta', 'crescendo', 'destroy the thing']

You could pad out that list with "none" elements to make it long enough to meet the length requirements, and then insert a conclusion in a random position based on the chances that one should be included:

def plot_points(number):
    action_table = ["none", "confrontation", "protector", "crescendo", "destroy the thing", "meta"]
    if number > 6:
        # add enough `"none"` elements
        action_table += ["none"] * (number - 6)
    random.shuffle(action_table)
    action_table = action_table[:number]
    if number and random.random() < 0.8:
        # 80% of the time, overwrite a random position with "conclusion"
        action_table[random.randrange(number)] = "conclusion"
    return action_table

Note that this is a pseudo-weighted selection; "conclusion" is selected 80% of the time, and uniqueness is preserved with only "none" repeated to pad out the results. You can't have uniqueness for the other elements otherwise.

However, if you must have

  • unique values in the output list (and possibly repeat "none")
  • weighted selection of inputs

then you want a weighted random sample selection without replacement. You can implement this using standard Python libraries:

import heapq
import math
import random

def weighted_random_sample(population, weights, k):
    """Chooses k unique random elements from a population sequence.

    The probability of items being selected is based on their weight.

    Implementation of the algorithm by Pavlos Efraimidis and Paul
    Spirakis, "Weighted random sampling with a reservoir" in 
    Information Processing Letters 2006. Each element i is selected
    by assigning ids with the formula R^(1/w_i), with w_i the weight
    for that item, and the top k ids are selected. 

    """ 
    if not 0 <= k <= len(population):
        raise ValueError("Sample larger than population or negative")
    if len(weights) != len(population):
        raise ValueError("The number of weights does not match the population")

    key = lambda iw: math.pow(random.random(), 1 / iw[1])
    decorated = heapq.nlargest(k, zip(population, weights), key=key)
    return [item for item, _ in decorated]
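To see the effect of the weights, one can count how often each item comes out first over many runs. The sketch below uses a compact restatement of the function above (without the argument checks) so that it runs on its own; the three-item population, its weights and the fixed seed are made-up illustration values.

```python
import heapq
import random
from collections import Counter


def weighted_random_sample(population, weights, k):
    # compact restatement of the function above: key = R ** (1 / weight)
    key = lambda iw: random.random() ** (1 / iw[1])
    top = heapq.nlargest(k, zip(population, weights), key=key)
    return [item for item, _ in top]


random.seed(0)  # fixed seed only so that a run is repeatable
population = ["a", "b", "c"]  # hypothetical items
weights = [3, 1, 1]           # "a" is three times as heavy as the others

# how often does each item win when only one is drawn?
first_picks = Counter(
    weighted_random_sample(population, weights, 1)[0] for _ in range(10_000)
)
print(first_picks.most_common())

# drawing more than one item never repeats an item
sample = weighted_random_sample(population, weights, 2)
print(sample, len(set(sample)) == len(sample))
```

The heavier item should come out first noticeably more often than the others, while no single call ever returns the same item twice.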

Use this to select your items if you need 7 items or fewer; otherwise add extra "none" values and just shuffle (as all 7 items end up selected anyway):

def plot_points(number):
    action_table = ["conclusion", "none", "confrontation", "protector", "crescendo", "destroy the thing", "meta"]

    if number > len(action_table):
        # more items than are available
        # pad out with `"none"` elements and shuffle
        action_table += ["none"] * (number - len(action_table))
        random.shuffle(action_table)
        return action_table

    weights = [3, 1, 1, 1, 2, 2, 1]
    return weighted_random_sample(action_table, weights, number)

Demo:

>>> plot_points(5)
['none', 'conclusion', 'meta', 'crescendo', 'destroy the thing']
>>> plot_points(5)
['conclusion', 'destroy the thing', 'crescendo', 'meta', 'confrontation']
>>> plot_points(10)
['none', 'crescendo', 'protector', 'confrontation', 'meta', 'destroy the thing', 'none', 'conclusion', 'none', 'none']

Of course, if your real action_table is much larger and you disallow picking more plot points than you have actions, there is no need to pad things out at all and you'd just use weighted_random_sample() directly.

Hello, I would suggest you make the following changes to your code:

I tried my best to make as few changes as possible.

def turning_point():
  turning_point = (random.choices(action_table))
  return turning_point

Your weights were too disproportionate; you would get 'conclusion' way too many times.

for x in range(int(input("How many Turning Points to generate? "))):
    tp = turning_point()
    print(tp)
#the code below is what I got from this site
#added tp != to allow duplicates of "none" result
    plot_points.append(tp)
    # count only the value just drawn; any(...) over the whole list would stay
    # true forever once "none" has been added twice
    if plot_points.count(tp) > 1 and tp != ["none"]:
        plot_points.pop()
#results get added to the plot_points list:
    # else: 
        # plot_points.append(tp)
    print(plot_points)
print(plot_points)

You will notice three differences:

  1. I changed the point at which you append to your list, and I replaced the continue statement with a pop() call. The problem is that you were checking the counts too late. For example, your loop would insert 'conclusion' into plot_points and the iteration would end; by the time the duplicate check ran again, tp had already changed to a new value, say 'protector'. So appending first, then checking the count, then popping if there are too many made more sense to me.
  2. I changed your tp != "none" to tp != ["none"], because remember you're working with single-member lists.
  3. I added some print statements purely so you can see what is happening at each iteration.
