Sampling in pandas while keeping grouped records together

Question

I'd like to (approximately) partition a large number of records in a dataframe. This is easily achieved using sample:

fraction = .7
df1     = df.sample(frac = fraction)
df2     = df.drop(df1.index)

Now for my question: suppose I'd like to (randomly) partition the dataset but must also keep all records of a group together. It can be assumed that only a few records belong to each group, so the grouping will not interfere with the ability to partition randomly. An example, noting that each group only collects a few records and therefore does not impose a significant constraint:

df = 
      group   value
0     'aaa'    48
1     'aaa'   -103
2     'aab'    20     
3     'aac'    21
4     'aac'    40
...
10000 'zzf'    220

Answer 1

Can you try groupby and then sample the groups as:

grouped = df.groupby('group')
grouped.apply(lambda x: x.sample(frac=0.7))
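Note that calling sample inside apply draws 70% of the rows within each group, so a group's records can still end up on both sides of the split. A minimal sketch of one alternative, assuming the grouping column is named group as in the question: sample the distinct group labels instead of the rows, then select rows by label. The helper name split_by_group and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd

def split_by_group(df, group_col="group", frac=0.7, seed=0):
    """Hypothetical helper: split df so that each group lands wholly on one side."""
    # Sample the distinct group labels rather than the individual rows.
    labels = pd.Series(df[group_col].unique())
    chosen = labels.sample(frac=frac, random_state=seed)
    # Rows whose label was drawn go to the first part, all others to the second.
    mask = df[group_col].isin(chosen)
    return df[mask], df[~mask]

# Small stand-in for the dataframe in the question.
df = pd.DataFrame({
    "group": ["aaa", "aaa", "aab", "aac", "aac", "zzf"],
    "value": [48, -103, 20, 21, 40, 220],
})

df1, df2 = split_by_group(df, frac=0.7, seed=0)
```

Here frac applies to the number of groups, not the number of rows, so the row split is only approximately 70/30. That matches the assumption in the question that each group holds just a few records.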

Answer 1 (2017-12-15 10:14:18)
