
What is the difference between random.sample and random.shuffle in Python

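I have a list a_tot with 1500 elements, and I want to split it randomly into two lists. List a_1 will have 1300 elements and list a_2 will have 200. My question is about the best way to randomize the original 1500-element list. Once the list is randomized, I can take one slice of 1300 elements and another of 200. One way is to use random.shuffle, the other is to use random.sample. Is there any difference in the quality of the randomization between the two methods? The data in list 1 should be a random sample, and so should the data in list 2. Any recommendations?

Using shuffle: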

random.shuffle(a_tot)   # shuffle a_tot in place (returns None)
a_1 = a_tot[0:1300]     # pick the first 1300
a_2 = a_tot[1300:]      # pick the last 200

Using sample:

new_t = random.sample(a_tot,len(a_tot))    #get a randomized list
a_1 = new_t[0:1300]     #pick the first 1300
a_2 = new_t[1300:]      #pick the last 200

The source of shuffle (Python 2 standard library):

def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """

    if random is None:
        random = self.random
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

The source of sample:

def sample(self, population, k):
    """Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged.  The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples.  This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique.  If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)
    """

    # XXX Although the documentation says `population` is "a sequence",
    # XXX attempts are made to cater to any iterable with a __len__
    # XXX method.  This has had mixed success.  Examples from both
    # XXX sides:  sets work fine, and should become officially supported;
    # XXX dicts are much harder, and have failed in various subtle
    # XXX ways across attempts.  Support for mapping types should probably
    # XXX be dropped (and users should pass mapping.keys() or .values()
    # XXX explicitly).

    # Sampling without replacement entails tracking either potential
    # selections (the pool) in a list or previous selections in a set.

    # When the number of selections is small compared to the
    # population, then tracking selections is efficient, requiring
    # only a small set and an occasional reselection.  For
    # a larger number of selections, the pool tracking method is
    # preferred since the list takes less space than the
    # set and it doesn't suffer from frequent reselections.

    n = len(population)
    if not 0 <= k <= n:
        raise ValueError, "sample larger than population"
    random = self.random
    _int = int
    result = [None] * k
    setsize = 21        # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
    if n <= setsize or hasattr(population, "keys"):
        # An n-length list is smaller than a k-length set, or this is a
        # mapping type so the other algorithm wouldn't work.
        pool = list(population)
        for i in xrange(k):         # invariant:  non-selected at [0,n-i)
            j = _int(random() * (n-i))
            result[i] = pool[j]
            pool[j] = pool[n-i-1]   # move non-selected item into vacancy
    else:
        try:
            selected = set()
            selected_add = selected.add
            for i in xrange(k):
                j = _int(random() * n)
                while j in selected:
                    j = _int(random() * n)
                selected_add(j)
                result[i] = population[j]
        except (TypeError, KeyError):   # handle (at least) sets
            if isinstance(population, list):
                raise
            return self.sample(tuple(population), k)
    return result

As you can see, in both cases the randomization is essentially done by the line int(random() * n). So the underlying algorithm is essentially the same.
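The claim that both functions randomize equally well can be checked empirically. The following is a minimal sketch, not a proof: it tallies how often each ordering of a three-element list comes out of each function. The helper names (count_orders, via_shuffle, via_sample), the trial count, and the seed are assumptions chosen for illustration.

```python
import random
from collections import Counter

def count_orders(make_order, trials=60000, seed=12345):
    """Tally how often each ordering of [0, 1, 2] is produced."""
    random.seed(seed)  # fixed seed only so the run is repeatable
    counts = Counter()
    for _ in range(trials):
        counts[tuple(make_order([0, 1, 2]))] += 1
    return counts

def via_shuffle(items):
    random.shuffle(items)  # reorders items in place, returns None
    return items

def via_sample(items):
    return random.sample(items, len(items))  # returns a new list

shuffle_counts = count_orders(via_shuffle)
sample_counts = count_orders(via_sample)

# Each of the 6 possible orderings should appear roughly trials / 6 times.
for order in sorted(shuffle_counts):
    print(order, shuffle_counts[order], sample_counts[order])
```

With a uniform shuffle, every ordering should land near 10000 in both columns, with only small statistical fluctuation.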

random.shuffle() shuffles the given list in place. Its length stays the same.

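random.sample() picks n items from the given sequence without replacement (the sequence can also be a tuple or anything else, as long as it has a __len__()) and returns them in random order.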
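A short sketch of that behavioral difference, using a made-up ten-element list (the variable names are illustrative only):

```python
import random

deck = list(range(10))
original = list(deck)  # separate copy, kept for comparison

# shuffle reorders the list object itself and returns None.
returned = random.shuffle(deck)

# sample builds a new list and leaves its input untouched.
picked = random.sample(original, 3)

print(returned)       # None
print(len(deck))      # still 10 items, only the order may differ
print(len(picked))    # 3 distinct positions drawn from original
```

Because shuffle returns None, writing `deck = random.shuffle(deck)` would discard the list, which is a common mistake.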

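There are two major differences between shuffle() and sample():

1) Shuffle alters the data in place, so its input must be a mutable sequence. In contrast, sample produces a new list, and its input can be much more varied (tuple, string, xrange, bytearray, set, etc.). Note that xrange exists only in Python 2 (Python 3 uses range), and that since Python 3.11 sample() requires a sequence, so a set must be converted to a list or tuple first.

2) Sample lets you potentially do less work (i.e. a partial shuffle).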
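Both points can be illustrated with a small sketch in Python 3. The inputs below are arbitrary examples, and shuffle_failed is a hypothetical flag used only to record the outcome.

```python
import random

# Point 1: shuffle needs a mutable sequence, so a tuple is rejected.
try:
    random.shuffle((1, 2, 3))
    shuffle_failed = False
except TypeError:
    shuffle_failed = True

# sample accepts immutable sequences and always returns a new list.
from_tuple = random.sample((1, 2, 3), 2)
from_string = random.sample("abcdef", 3)

# Point 2: drawing 5 items from a large range does not require
# materializing or shuffling all one million values.
from_range = random.sample(range(1_000_000), 5)

print(shuffle_failed, from_tuple, from_string, from_range)
```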

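It is interesting to show the conceptual relationship between the two by demonstrating that shuffle() could have been implemented in terms of sample():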

def shuffle(p):
   p[:] = sample(p, len(p))

Or vice versa, implementing sample() in terms of shuffle():

def sample(p, k):
   p = list(p)
   shuffle(p)
   return p[:k]

Neither of these is as efficient as the real implementations of shuffle() and sample(), but they do show the conceptual relationship between the two.

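I think they are essentially the same, except that one updates the original list and the other only reads it. There is no difference in quality.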

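The randomization should be just as good with both options. I would say go with shuffle, because it is more immediately clear to the reader what it does.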
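For the split described in the question, one possible sketch with shuffle looks like this. The contents of a_tot are stand-in data, and shuffling a copy (rather than a_tot itself) is an assumption made here so that the original order survives.

```python
import random

a_tot = list(range(1500))  # stand-in for the real 1500 elements

shuffled = a_tot[:]        # copy, so a_tot keeps its original order
random.shuffle(shuffled)   # randomize the copy in place

a_1 = shuffled[:1300]      # first 1300 elements
a_2 = shuffled[1300:]      # remaining 200 elements

print(len(a_1), len(a_2))
```

Every element lands in exactly one of the two lists, so a_1 and a_2 together form a random partition of a_tot.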

from random import sample, shuffle

x = [[i] for i in range(10)]
shuffle(x)          # reorders x in place and returns None
y = sample(x, 10)   # returns a new list; x itself is not changed

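shuffle updates the list in place, whereas sample returns a new list. sample takes an argument for the number of items to pick, whereas shuffle always gives back a list of the same length as its input.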

