Randomly extract x items from a list using Python
Starting with two lists, for example:
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
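I want to have the user input how many items they want to extract, as a percentage of the overall list length, and have the same indices randomly extracted from each list. For example, if I wanted 50%, the output would be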
newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']
I achieved this using the following code:
from random import randrange
lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
LengthOfList = len(lstOne)
print LengthOfList
PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []
HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake
for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)
print RangeOfListIndices
newlstOne = []
newlstTwo = []
for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])
print newlstOne
print newlstTwo
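But I was wondering if there is a more efficient way of doing this; in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
Thank you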
Q: I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.
A. The most straightforward approach directly matches your specification:
import random
percentage = float(raw_input('What percentage? '))
k = int(len(list1) * percentage // 100)  # sample() needs an integer count
indices = random.sample(xrange(len(list1)), k)  # k distinct indices
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]
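For readers on Python 3, the same idea might be sketched as a small function. The name subsample_pair and the optional seed parameter are illustrative assumptions, not part of the original answer; raw_input and xrange become input and range in Python 3.

```python
import random

def subsample_pair(list1, list2, percentage, seed=None):
    """Return equally indexed random subsamples of two same-length lists."""
    if len(list1) != len(list2):
        raise ValueError("both lists must have the same length")
    rng = random.Random(seed)  # seed is optional, only for reproducible runs
    k = int(len(list1) * percentage // 100)  # number of items to keep, rounded down
    indices = rng.sample(range(len(list1)), k)  # k distinct indices, no repeats
    return [list1[i] for i in indices], [list2[i] for i in indices]

lst_one = [str(n) for n in range(1, 11)]
lst_two = [str(n) for n in range(1, 11)]
new_one, new_two = subsample_pair(lst_one, lst_two, 50, seed=0)
print(new_one)
print(new_two)
```

Because the indices are drawn once and reused for both lists, the two outputs stay aligned position by position.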
Q: In my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?
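A. In both Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method, which makes multiple random choices until a bias-free result is found).
In Python 2, the random.sample() function is slightly biased, but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the internal _randbelow() method and is bias-free.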
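The "multiple random choices until a bias-free result is found" idea is rejection sampling. The following is a minimal Python 3 sketch of that general technique, not a copy of CPython's _randbelow(); the function name rand_below and the fixed seed are assumptions made for illustration.

```python
import random

def rand_below(n, rng=random):
    """Return a uniformly distributed int in range(n) by rejection sampling."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()       # enough bits to represent every value below n
    r = rng.getrandbits(k)   # uniform over 0 .. 2**k - 1
    while r >= n:            # discard out-of-range draws instead of folding them back
        r = rng.getrandbits(k)
    return r

rng = random.Random(12345)   # fixed seed only so the run is repeatable
counts = [0] * 3
for _ in range(30000):
    counts[rand_below(3, rng)] += 1
print(counts)  # each count should be close to 10000
```

Discarding out-of-range draws keeps every accepted value equally likely, whereas reducing the draw with a modulo would favor the smaller values whenever n is not a power of two.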
Simply zip the two lists together, sample with random.sample, then zip again to transpose back into two lists.
import random
_zips = random.sample(zip(lstOne,lstTwo), 5)
new_list_1, new_list_2 = zip(*_zips)
Demo:
list_1 = range(1,11)
list_2 = list('abcdefghij')
_zips = random.sample(zip(list_1, list_2), 5)
new_list_1, new_list_2 = zip(*_zips)
new_list_1
Out[33]: (3, 1, 9, 8, 10)
new_list_2
Out[34]: ('c', 'a', 'i', 'h', 'j')
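The demo above relies on Python 2, where zip and range return lists. A Python 3 version might look like the sketch below; the only changes are the list(...) wrappers, since random.sample needs a sequence rather than an iterator.

```python
import random

list_1 = list(range(1, 11))
list_2 = list('abcdefghij')

pairs = list(zip(list_1, list_2))      # sample() needs a sequence, not a zip iterator
chosen = random.sample(pairs, 5)       # 5 distinct (item_1, item_2) pairs
new_list_1, new_list_2 = zip(*chosen)  # transpose back; both results are tuples

print(new_list_1)
print(new_list_2)
```

Note that the final unpacking fails when the sample is empty, so a sample size of 0 would need separate handling.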
The way you are doing it looks mostly fine to me.
If you want to avoid sampling the same object several times, you could proceed as follows:
import random
a = len(lstOne)
choose_from = range(a)  # creates a list of ints of size len(lstOne)
random.shuffle(choose_from)
k = int(HowManyIndicesToMake)  # desired number of items, as computed in the question
for i in choose_from[:k]:  # selects the desired number of items from both original lists
    newlstOne.append(lstOne[i])  # at the same random locations and appends them
    newlstTwo.append(lstTwo[i])  # to the two new lists in sequence
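In Python 3, range no longer returns a list, so the shuffle approach needs an explicit list. A self-contained sketch, with k = 5 assumed here as the desired count (50% of 10) and snake_case names chosen for illustration:

```python
import random

lst_one = [str(n) for n in range(1, 11)]
lst_two = [str(n) for n in range(1, 11)]
k = 5  # desired number of items (assumed here: 50% of 10)

choose_from = list(range(len(lst_one)))  # shuffle() needs a mutable list
random.shuffle(choose_from)              # random permutation of all indices
picked = choose_from[:k]                 # first k entries are k distinct indices

new_lst_one = [lst_one[i] for i in picked]
new_lst_two = [lst_two[i] for i in picked]
print(new_lst_one)
print(new_lst_two)
```

This shuffles all indices even when k is small, so random.sample is usually the cheaper choice when only a small fraction of a long list is needed.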