
Is there a Python equivalent to R's sample() function?

I want to know if Python has an equivalent to the sample() function in R.

The sample() function takes a sample of the specified size from the elements of x, either with or without replacement.

The syntax is:

sample(x, size, replace = FALSE, prob = NULL)

(More information here.)

I think numpy.random.choice(a, size=None, replace=True, p=None) may well be what you are looking for.

The p argument corresponds to the prob argument in the sample() function.
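
One caveat when mapping the two: numpy.random.choice defaults to replace=True, whereas R's sample() defaults to replace = FALSE, and NumPy requires p to sum to 1 while R normalizes prob for you. A minimal sketch of a wrapper with R-like defaults is below. Note that r_style_sample is a hypothetical helper name (not part of NumPy), and it uses Generator.choice from numpy.random.default_rng, which accepts the same a, size, replace and p arguments. It does not try to reproduce R's special case where a single number x means 1:x.

```python
import numpy as np


def r_style_sample(x, size=None, replace=False, prob=None, seed=None):
    """Hypothetical helper mimicking R's sample() defaults on top of NumPy."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    # R returns a permutation of x when size is omitted.
    if size is None:
        size = len(x)
    if prob is not None:
        # R accepts unnormalized weights; NumPy needs probabilities summing to 1.
        prob = np.asarray(prob, dtype=float)
        prob = prob / prob.sum()
    return rng.choice(x, size=size, replace=replace, p=prob)


# Three distinct values, drawn without replacement (the R default).
print(r_style_sample([1, 2, 3, 4, 5], size=3, seed=0))
# Ten draws with replacement, weighted 3:1 in favour of 'a'.
print(r_style_sample(['a', 'b'], size=10, replace=True, prob=[3, 1], seed=0))
```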

In pandas (Python's closest analogue to R) there are the DataFrame.sample and Series.sample methods, both introduced in version 0.16.1.

For example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> df
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0

Sampling 3 rows without replacement:

>>> df.sample(3)
   a  b
4  5  0
1  2  7
3  4  9

Sample 4 rows from column 'a' with replacement, using column 'b' as the corresponding weights for the choices:

>>> df['a'].sample(4, replace=True, weights=df['b'])
3    4
0    1
0    1
2    3

These methods are almost identical to the R function, allowing you to sample a particular number of values, or a fraction of values, from your DataFrame/Series, with or without replacement. Note that the prob argument in R's sample() corresponds to weights in the pandas methods.
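
As a small sketch of the "fraction of values" option mentioned above, the frac argument can be used instead of a row count, and random_state makes the draw repeatable. The seed value 0 and the variable names here are just illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})

# Sample 60% of the rows (3 of 5) without replacement, reproducibly.
subset = df.sample(frac=0.6, random_state=0)
print(subset)

# Weighted draw with replacement; the weights are normalized by pandas,
# and the row whose weight is 0 (a == 5) can never be picked.
weighted = df['a'].sample(n=4, replace=True, weights=df['b'], random_state=0)
print(weighted)
```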

I believe that the random module works, specifically random.sample().

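
Worth noting that random.sample() only samples without replacement. For sampling with replacement, optionally weighted, the standard library also has random.choices(). A short sketch, with an arbitrary seed and made-up variable names:

```python
import random

rng = random.Random(42)  # seeded only so the example is repeatable
population = [1, 2, 3, 4, 5]

# Without replacement, like R's sample(x, 3).
without_replacement = rng.sample(population, k=3)
print(without_replacement)

# With replacement and weights, like R's sample(x, 4, replace = TRUE, prob = ...).
# The last weight is 0, so 5 is never drawn.
with_replacement = rng.choices(population, weights=[6, 7, 8, 9, 0], k=4)
print(with_replacement)
```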

Other answers here are great, but I'd like to mention an alternative from Scikit-Learn that we can also use for this; see this link.

Something like this:

import numpy as np
from sklearn.utils import resample
resample(np.arange(1, 100), n_samples=100, replace=True, random_state=2)

Gives you this:

[41 16 73 23 44 83 76  8 35 50 96 76 86 48 64 32 91 21 38 40 68  5 43 52
 39 34 59 68 70 89 69 47 71 96 84 32 67 81 53 77 51  5 91 64 80 50 40 47
  9 51 16  9 18 23 74 58 91 63 84 97 44 33 27  9 77 11 41 35 61 10 71 87
 71 20 57 83  2 69 41 82 62 71 98 19 85 91 88 23 44 53 75 73 91 92 97 17
 56 22 44 94]
