Randomly extract x items from a list using Python

Question

Starting with two lists such as:

lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

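I want the user to input how many items they want to extract, as a percentage of the overall list length, and to have the same randomly chosen indices extracted from each list. For example, if I wanted 50%, the output would be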

newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']

I have achieved this using the following code:

from random import randrange

lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

LengthOfList = len(lstOne)
print LengthOfList

PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []

HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake

for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)

print RangeOfListIndices


newlstOne = []
newlstTwo = []

for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])

print newlstOne
print newlstTwo

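But I was wondering whether there is a more efficient way of doing this, since in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?

Thank you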

Answer 1

Q. I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.

A. The most straightforward approach directly matches your specification:

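import random

percentage = float(raw_input('What percentage? '))  # Python 2; use input() in Python 3
k = int(len(list1) * percentage // 100)             # random.sample() needs an integer count
indices = random.sample(xrange(len(list1)), k)      # Python 2; use range() in Python 3
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]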
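For readers on Python 3, the same idea might be wrapped in a small function. This is only a sketch: the name sample_parallel and its rng parameter are made up for illustration, and it assumes both lists have the same length.

```python
import random

def sample_parallel(list1, list2, percentage, rng=random):
    """Return equally indexed random subsets of two same-length lists."""
    if len(list1) != len(list2):
        raise ValueError("lists must have the same length")
    # Round down to a whole number of items; sample() needs an int.
    k = int(len(list1) * percentage // 100)
    # range() is lazy in Python 3, so no full index list is built here.
    indices = rng.sample(range(len(list1)), k)
    return [list1[i] for i in indices], [list2[i] for i in indices]

lst_one = [str(n) for n in range(1, 11)]
lst_two = [str(n) for n in range(1, 11)]
new_one, new_two = sample_parallel(lst_one, lst_two, 50)
print(new_one)
print(new_two)
```

Because random.sample() picks without replacement, no index appears twice, and both output lists follow the same random order.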

Q. in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?

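A. In both Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method, which makes multiple random choices until a bias-free result is found).

In Python 2, the random.sample() function is slightly biased, but only in the round-off in the last of 53 bits. In Python 3, the random.sample() function uses the internal _randbelow() method and is bias-free.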
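The "multiple random choices until a bias-free result is found" idea is rejection sampling. The sketch below is a simplified illustration of that technique, not a copy of CPython's internal code; rand_below is a hypothetical name chosen for this example.

```python
import random

def rand_below(n):
    """Return a uniformly distributed int in [0, n) by rejection sampling."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()          # number of bits needed to represent n
    r = random.getrandbits(k)   # uniform over [0, 2**k)
    while r >= n:               # reject out-of-range draws and try again
        r = random.getrandbits(k)
    return r

print(rand_below(145000))
```

Every accepted value is equally likely, and since n is at least 2**(k-1), each draw is accepted with probability of at least one half, so the loop ends quickly.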

Answer 2

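Just zip your two lists together, use random.sample to do the sampling, then zip again to transpose back into two lists.

import random

# list() makes this work in Python 3 as well, where zip() returns an iterator
_zips = random.sample(list(zip(lstOne, lstTwo)), 5)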

new_list_1, new_list_2 = zip(*_zips)

Demo:

list_1 = range(1,11)
list_2 = list('abcdefghij')

_zips = random.sample(list(zip(list_1, list_2)), 5)

new_list_1, new_list_2 = zip(*_zips)

new_list_1
Out[33]: (3, 1, 9, 8, 10)

new_list_2
Out[34]: ('c', 'a', 'i', 'h', 'j')
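As a hedged Python 3 sketch of the same zip, sample, unzip pattern: the helper name sample_pairs is made up for this example, and the empty-sample guard is an addition, since unpacking zip(*[]) into two names would fail.

```python
import random

def sample_pairs(list1, list2, k):
    """Sample k (item1, item2) pairs, then split them back into two lists."""
    # zip() returns an iterator in Python 3, so materialise it first.
    pairs = random.sample(list(zip(list1, list2)), k)
    if not pairs:
        return [], []
    new1, new2 = zip(*pairs)   # transpose the pairs into two tuples
    return list(new1), list(new2)

list_1 = list(range(1, 11))
list_2 = list('abcdefghij')
new_list_1, new_list_2 = sample_pairs(list_1, list_2, 5)
print(new_list_1)
print(new_list_2)
```

Note that this builds a list of all pairs before sampling, so for large inputs it uses more memory than sampling indices.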

Answer 3

The way you're doing it looks mostly okay to me.

If you want to avoid sampling the same object several times, you could proceed as follows:

import random

a = len(lstOne)
k = int(HowManyIndicesToMake)    # desired number of items, from the question's code
choose_from = list(range(a))     # list of ints of size len(lstOne)
random.shuffle(choose_from)
newlstOne, newlstTwo = [], []
for i in choose_from[:k]:        # selects the desired number of items from both original lists
    newlstOne.append(lstOne[i])  # at the same random locations and appends them to the two
    newlstTwo.append(lstTwo[i])  # new lists in sequence
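To see why repeated indices matter, the following Python 3 sketch contrasts independent randrange() draws, as in the question's loop, with the shuffle approach. The function names are made up for illustration, and the sizes are taken from the 145,000-item use case at 50%.

```python
import random

def draw_with_replacement(n, k):
    """Mimic the question's loop: k independent randrange() draws."""
    return [random.randrange(n) for _ in range(k)]

def draw_by_shuffle(n, k):
    """Shuffle all n indices and keep the first k, so none repeats."""
    indices = list(range(n))
    random.shuffle(indices)
    return indices[:k]

n, k = 145000, 72500  # 50% of the list size mentioned in the question
repeated = draw_with_replacement(n, k)
distinct = draw_by_shuffle(n, k)

print(len(set(repeated)))  # typically well below k, because indices repeat
print(len(set(distinct)))  # always exactly k
```

Shuffling touches all n indices even when k is small, whereas random.sample(range(n), k) also avoids repeats without shuffling the whole range.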

3 solutions

Solution 1
10 (accepted) 2014-05-04 17:45:10

Solution 2
1 2014-05-04 17:34:54

Solution 3
1 2014-05-04 17:44:22
