Pandas sampling

Question

If I want to randomly sample a pandas dataframe, I can use pandas.DataFrame.sample.

Suppose I randomly sample 80% of the rows. How do I automatically get the other 20% of the rows that were not picked?

Answer 1

As Lagerbaer explains, one can either add a column with a unique index to the dataframe or randomly shuffle the entire dataframe. For the latter,

df.reindex(np.random.permutation(df.index))

works (np means numpy). The first 80% of the shuffled rows can then serve as the sample, and the remaining 20% are the rows that were not picked.
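A minimal sketch of the shuffle-then-split idea, assuming the index labels are unique (reindexing on a permuted index relies on that). The example dataframe, the seed, and the names `shuffled`, `cut`, `sample`, and `rest` are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative data: 10 rows with a default, unique integer index.
df = pd.DataFrame({'a': range(1, 11), 'b': range(11, 21)})

# Shuffle all rows by reindexing on a random permutation of the index.
# The seed is fixed only so the sketch is reproducible.
rng = np.random.default_rng(0)
shuffled = df.reindex(rng.permutation(df.index.to_numpy()))

# Split by position: the first 80% is the sample, the rest is the remainder.
cut = int(0.8 * len(shuffled))
sample = shuffled.iloc[:cut]
rest = shuffled.iloc[cut:]
```

Because the split is positional, the two pieces cannot overlap, and together they cover every row of df.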

Answer 2

>>> import pandas as pd, numpy as np
>>> df = pd.DataFrame({'a': [1,2,3,4,5,6,7,8,9,10], 'b': [11,12,13,14,15,16,17,18,19,20]})
>>> df
    a   b
0   1  11
1   2  12
2   3  13
3   4  14
4   5  15
5   6  16
6   7  17
7   8  18
8   9  19
9  10  20

# randomly sample 5 rows
>>> sample = df.sample(5)
>>> sample
   a   b
7  8  18
2  3  13
4  5  15
0  1  11
3  4  14

# list comprehension to get indices not in sample's indices
>>> idxs_not_in_sample = [idx for idx in df.index if idx not in sample.index]
>>> idxs_not_in_sample
[1, 5, 6, 8, 9]

# locate the rows at the indices in the original dataframe that aren't in the sample
>>> not_sample = df.loc[idxs_not_in_sample]
>>> not_sample
    a   b
1   2  12
5   6  16
6   7  17
8   9  19
9  10  20
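The same result can probably be reached more directly by dropping the sampled index labels from the original dataframe. This is a sketch rather than part of the original answer: it assumes the index labels are unique, and `frac=0.8` and `random_state=0` are chosen here only to match the 80% in the question and to make the run reproducible.

```python
import pandas as pd

# Illustrative data: 10 rows with a default, unique integer index.
df = pd.DataFrame({'a': range(1, 11), 'b': range(11, 21)})

# Randomly sample 80% of the rows.
sample = df.sample(frac=0.8, random_state=0)

# Drop the sampled labels to keep only the rows that were not picked.
not_sample = df.drop(sample.index)
```

If the index contains duplicate labels, drop removes every row carrying a sampled label, so resetting the index first (df.reset_index(drop=True)) is the safer route in that case.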

Answer 1
3, accepted, 2016-09-30 23:32:35

Answer 2
2, 2016-09-30 23:40:05
