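How do you divide up a list into chunks whose sizes vary according to a normal distribution?
I want to take a list of thousands of items and group them into 12 chunks, where the number of items in each chunk follows a normal distribution (bell curve) and no item appears in more than one chunk. The list must be used up completely.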
['6355ab76f70c5c59749f2018',
'6355c797f70c5c5974a1cb15',
'6355d256f70c5c5974a36a6c',
'6355d270f70c5c5974a37356',
'6355d29bf70c5c5974a3810a',
'6355d300f70c5c5974a3a202',
'6355d31af70c5c5974a3ab03',
'6355d36cf70c5c5974a3c103',
'6355d371f70c5c5974a3c236',
'6355d389f70c5c5974a3c828',
'6355d94df70c5c5974a55450',
'6355d956f70c5c5974a556c1',
'6355d987f70c5c5974a5626d',
'6355d99df70c5c5974a566d9',
'6355d9b1f70c5c5974a56b5c',
'6355d9bbf70c5c5974a56d50',
'6355d9d3f70c5c5974a572e1',
'6355d9fdf70c5c5974a57c53',
'6355da0cf70c5c5974a57f8f',
'6355da11f70c5c5974a58065',
'6355da19f70c5c5974a58261',
'6355da68f70c5c5974a592ca',
'6355da6cf70c5c5974a593ab',
'6355da80f70c5c5974a597de',
'6355da8af70c5c5974a599fa',
'6355da93f70c5c5974a59c09',
'6355da98f70c5c5974a59d20',
'6355daa1f70c5c5974a59ec9',
'6355daa7f70c5c5974a59fec',
'6355dac5f70c5c5974a5a6dd',
'6355dadaf70c5c5974a5ab75',
'6355dafcf70c5c5974a5b2dc',
'6355db6df70c5c5974a5d24b',
'6355dba0f70c5c5974a5dfea',
'6355dc16f70c5c5974a5fe14',
'6355dc31f70c5c5974a6059d',
'6355dc37f70c5c5974a60782',
'6355dc3cf70c5c5974a608eb',
'6355dc41f70c5c5974a60a99',
'6355dc47f70c5c5974a60bb9',
'6355dc5cf70c5c5974a611ef',
'6355dc67f70c5c5974a61578',
'6355dcaaf70c5c5974a62831',
'6355dcb4f70c5c5974a62b2c',
'6355dcbff70c5c5974a62e73',
'6355dcc8f70c5c5974a63113',
'6355dcd7f70c5c5974a6355c',
'6355dcf3f70c5c5974a63c91',
'6355dcf7f70c5c5974a63de9',
'6355dd04f70c5c5974a64144',
'6355dd0ef70c5c5974a64438',
'6355dd53f70c5c5974a65902',
'6355dd61f70c5c5974a65cf6',
'6355dd6bf70c5c5974a66010',
'6355dd70f70c5c5974a66195',
'6355dd74f70c5c5974a662f9',
'6355dd98f70c5c5974a66d4e',
'6355dd9df70c5c5974a66e99',
'6355dda2f70c5c5974a66fbd',
'6355ddb0f70c5c5974a673e4',
'6355ddbaf70c5c5974a67638',
'6355ddc5f70c5c5974a6796b',
'6355ddcef70c5c5974a67bcf',
'6355de01f70c5c5974a6892c',
'6355de15f70c5c5974a68ecf',
'6355de1bf70c5c5974a69023',
'6355de3df70c5c5974a699ad',
'6355de58f70c5c5974a6a1ab',
'6355de62f70c5c5974a6a4df',
'6355de6bf70c5c5974a6a787',
'6355de9cf70c5c5974a6b5a8',
'6355dea0f70c5c5974a6b6ed',
'6355deccf70c5c5974a6c3dc',
'6355ded4f70c5c5974a6c602',
'6355dee8f70c5c5974a6cbd2',
'6355e8f1f70c5c5974a9db18',
'6355e924f70c5c5974a9ec85',
'6355e9dbf70c5c5974aa2b37',
'6355eaaef70c5c5974aa7348',
'6355ead5f70c5c5974aa81ac',
'6355ec02f70c5c5974aaefaa',
'6355ec64f70c5c5974ab135d',
'6355ec8df70c5c5974ab2157',
'6355ecb2f70c5c5974ab2ce7',
'6355eccaf70c5c5974ab346f',
'6355eccff70c5c5974ab3691',
'6355ecd3f70c5c5974ab376b',
'6355ece2f70c5c5974ab3ba0',
'6355eceef70c5c5974ab3efb',
'6355ecfef70c5c5974ab4384',
'6355ed03f70c5c5974ab44c3',
'6355ed24f70c5c5974ab4f4f',
'6355ed4cf70c5c5974ab5b39',
'6355ed78f70c5c5974ab6840',
'6355ed9ff70c5c5974ab7388',
'6355edb1f70c5c5974ab7888',
'6355edb3f70c5c5974ab790b']
I'm looking for output like this: a list of objects whose numeric keys are the numbers 0-11, with the chunked list items as the values:
[
{ 0: ['6355ab76f70c5c59749f2018', '6355c797f70c5c5974a1cb15', '6355d256f70c5c5974a36a6c' ] },
{ 1: ['6355d270f70c5c5974a37356',
'6355d29bf70c5c5974a3810a',
'6355d300f70c5c5974a3a202',
'6355d31af70c5c5974a3ab03',
'6355d36cf70c5c5974a3c103',
'6355d371f70c5c5974a3c236',
'6355d389f70c5c5974a3c828'] },
...
]
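It should split the input list into chunks that are symmetric (on both sides), with sizes increasing gradually so that chunks nearer the center of the output list hold more items.
I want the list I pass in to be split so that the most items are grouped in the middle (roughly chunks 4-8), and fewer items are grouped together toward the "edges" of the result list (chunks 0-3 and 9-11). But the entire input list must be used up, so all the items are distributed this way.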
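The requirements above (every item used exactly once, results keyed 0-11) can be checked mechanically. A minimal sketch, assuming `chunks` is a plain list of lists; the helper names `to_keyed_chunks` and `is_exact_partition` are hypothetical, introduced only for illustration:

```python
from collections import Counter

def to_keyed_chunks(chunks):
    """Wrap each chunk as {index: items}, matching the requested output shape."""
    return [{i: chunk} for i, chunk in enumerate(chunks)]

def is_exact_partition(items, chunks):
    """True if every input item lands in exactly one chunk: no duplicates, nothing left over."""
    flattened = [x for chunk in chunks for x in chunk]
    # Counter comparison also respects repeated items in the input.
    return Counter(flattened) == Counter(items)
```

Any candidate chunking can be passed through `is_exact_partition` before being converted with `to_keyed_chunks`.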
I tried to solve this with numpy, but so far I haven't gotten the output I want.
My current code (two different functions):
```python
import numpy as np

def divide_list_normal(lst):
    normal_dist = np.random.normal(size=len(lst))  # one standard-normal draw per item
    sorted_list = [x for _, x in sorted(zip(normal_dist, lst))]  # sorting by random keys just shuffles the list
    chunk_size = int(len(lst) / len(normal_dist))  # always 1, since len(normal_dist) == len(lst)
    chunks = [sorted_list[i:i + chunk_size] for i in range(0, len(sorted_list), chunk_size)]
    return chunks

def divide_list_normal_define_chunk_size(lst, n):
    normal_dist = np.random.normal(size=len(lst))  # one standard-normal draw per item
    sorted_list = [x for _, x in sorted(zip(normal_dist, lst))]  # again just a random shuffle
    chunk_size = int(len(lst) / len(normal_dist))  # again always 1
    chunks = [sorted_list[i:i + chunk_size] for i in range(0, n, chunk_size)]  # n single-item chunks
    return chunks
```
The output of the first one looks like this:
[['63a8d83336756fd65d455c77'],
['6355f7c6f70c5c5974adfbce'],
['635629c6f70c5c5974bbab53'],
['6355fa8bf70c5c5974aeb70f'],
['6355dcd7f70c5c5974a6355c'],
['63a96dae36756fd65d549333'],
['639245927eeb4e9fd025e397'],
['63562463f70c5c5974ba3b5c'],
['63a8e04736756fd65d4635cf'],
['635629a5f70c5c5974bba1c1'],
['6355f74ef70c5c5974addd2c'],...]
The output of the second one looks like this:
[['63aa1a9d36756fd65d7566cf'],
['6355ed78f70c5c5974ab6840'],
['63a94e1836756fd65d500d5d'],
['63a8e23e36756fd65d4667ec'],
['63a96c6536756fd65d5463db'],
['63d39021d34efb9c0983d64a'],
['635627a9f70c5c5974bb1573'],
['63b3a4c236756fd65d33750a'],
['63562320f70c5c5974b9e50b'],
['63aa1aec36756fd65d758676'],
['63a9551636756fd65d5111fb'],
['63562443f70c5c5974ba31ed']]
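Is there a way to divide a list into chunks whose sizes vary according to a normal distribution? If you know how, please share. Thank you!
This works, but it may be slow depending on your requirements: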
```python
import numpy as np
from itertools import islice

# Sample data (note: it contains repeated IDs)
testList = ['6355d29bf70c5c5974a3810a',
            '6355d300f70c5c5974a3a202',
            '6355d31af70c5c5974a3ab03',
            '6355d36cf70c5c5974a3c103',
            '6355d300f70c5c5974a3a202',
            '6355d31af70c5c5974a3ab03',
            '6355d36cf70c5c5974a3c103',
            '6355d300f70c5c5974a3a202',
            '6355d31af70c5c5974a3ab03',
            '6355d36cf70c5c5974a3c103',
            '6355d300f70c5c5974a3a202',
            '6355d31af70c5c5974a3ab03',
            '6355d36cf70c5c5974a3c103',
            '6355d371f70c5c5974a3c236',
            '6355d389f70c5c5974a3c828']

# One normal draw per item, centered on 10 with standard deviation 4
normal_dist = np.random.normal(size=len(testList), loc=10, scale=4)
# Each chunk is the first int(x) items of testList (so chunks overlap).
# max(0, ...) guards against negative draws: islice raises ValueError on a negative stop.
sorted_list = [list(islice(testList, max(0, int(x)))) for x in normal_dist]
```
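One thing to watch out for: because these are slices of the list, the sampled values must stay within bounds, i.e. loc ± scale should lie between 0 and len(testList).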
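The slicing approach above takes prefixes of the same list, so the chunks overlap and the input is not consumed exactly once. A minimal sketch of a variant that keeps the random-normal idea but puts every item into exactly one chunk: draw one normal sample per item, round it to the nearest chunk index, and clip the tails into the edge chunks. The function name `divide_by_normal_sampling` and the default `scale = n_chunks / 4` are assumptions for illustration, not part of the answer.

```python
import numpy as np

def divide_by_normal_sampling(lst, n_chunks=12, scale=None, seed=None):
    """Assign each item to exactly one of n_chunks chunks using one normal draw per item.

    Chunk sizes are random but roughly bell-shaped; every item is used once.
    """
    rng = np.random.default_rng(seed)
    center = (n_chunks - 1) / 2          # middle chunk index (5.5 for 12 chunks)
    if scale is None:
        scale = n_chunks / 4             # assumption: about a quarter of the chunk count
    draws = rng.normal(loc=center, scale=scale, size=len(lst))
    # Round to the nearest chunk index; clip the tails into the first/last chunk.
    bins = np.clip(np.rint(draws), 0, n_chunks - 1).astype(int)
    chunks = [[] for _ in range(n_chunks)]
    for item, b in zip(lst, bins):
        chunks[b].append(item)
    return chunks
```

Chunk sizes change from run to run unless `seed` is fixed, and each chunk keeps the input order of its items but is not a contiguous slice of the original list.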
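For each index i, compute the CDF at i + 0.5 and subtract the CDF at i - 0.5. That is the fraction of the list that should go into that index. For the first index, use only the CDF at i + 0.5 instead of subtracting the CDF at i - 0.5; for the last index, take 1 minus the CDF at i - 0.5 instead of using the CDF at i + 0.5. You'll want the mean to be the middle of the indices, and you can choose the standard deviation depending on how spread out you want the distribution to be (you probably want it to be about a quarter of the number of indices, but that's up to you).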
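A minimal sketch of the CDF-difference rule above, using the standard library's `statistics.NormalDist` (Python 3.8+). The mean is the middle index and sigma defaults to a quarter of the number of chunks, as suggested. The function names and the largest-remainder rounding step (which makes the integer sizes add up to the list length exactly) are assumptions added here, not part of the answer.

```python
from itertools import accumulate
from statistics import NormalDist

def normal_chunk_sizes(total, n_chunks=12, sigma=None):
    """Chunk sizes from CDF differences; the sizes sum exactly to total."""
    mu = (n_chunks - 1) / 2                  # mean at the middle index
    if sigma is None:
        sigma = n_chunks / 4                 # "about a quarter of the number of indices"
    dist = NormalDist(mu, sigma)
    fractions = []
    for i in range(n_chunks):
        upper = 1.0 if i == n_chunks - 1 else dist.cdf(i + 0.5)
        lower = 0.0 if i == 0 else dist.cdf(i - 0.5)
        fractions.append(upper - lower)
    # Largest-remainder rounding: floor every share, then hand the
    # leftover items to the chunks with the largest fractional parts.
    raw = [f * total for f in fractions]
    sizes = [int(r) for r in raw]
    leftover = total - sum(sizes)
    by_remainder = sorted(range(n_chunks), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes

def divide_list_normal(lst, n_chunks=12, sigma=None):
    """Split lst into contiguous chunks, returned as [{0: [...]}, {1: [...]}, ...]."""
    sizes = normal_chunk_sizes(len(lst), n_chunks, sigma)
    bounds = [0, *accumulate(sizes)]
    return [{i: lst[bounds[i]:bounds[i + 1]]} for i in range(n_chunks)]
```

Because the first and last chunks absorb the tails of the distribution, with sigma = n_chunks / 4 they come out slightly larger than their immediate neighbors; a somewhat smaller sigma (for example 2) avoids this.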