Sample pandas, keeping collected records together
I'd like to (approximately) partition a large number of records in a dataframe. This is easily achieved using sample:
fraction = .7
df1 = df.sample(frac = fraction)
df2 = df.drop(df1.index)
Now for my question: suppose I'd like to (randomly) partition the dataset, but must also keep all records of a group together. It can be assumed that only a few records belong to each group, so grouping will not interfere with the ability to partition randomly. An example, noting that each group collects only a few records and therefore does not impose a significant constraint:
df =
        group   value
0       'aaa'      48
1       'aaa'    -103
2       'aab'      20
3       'aac'      21
4       'aac'      40
...
10000   'zzf'     220
Could you try groupby and then sample the groups like this:
grouped = df.groupby('group')
grouped.apply(lambda x: x.sample(frac=0.7))
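Note that the groupby suggestion above samples 70% of the rows within each group, so records of one group can still end up on both sides of the split. One possible alternative, shown here only as a minimal sketch, is to sample the group labels themselves and then select rows by membership. The toy data, the names chosen and in_first, and the random_state value are assumptions made for illustration and are not part of the original question.

```python
import pandas as pd

# Toy data shaped like the example in the question (values are made up).
df = pd.DataFrame({
    "group": ["aaa", "aaa", "aab", "aac", "aac", "aad", "aae", "aae", "aaf", "aag"],
    "value": [48, -103, 20, 21, 40, 7, 13, -5, 99, 220],
})

fraction = 0.7

# Sample group labels rather than rows, so each group lands
# entirely in one partition or entirely in the other.
groups = pd.Series(df["group"].unique())
chosen = groups.sample(frac=fraction, random_state=0)  # random_state only for reproducibility

# Boolean mask: True for rows whose group was chosen.
in_first = df["group"].isin(chosen)
df1 = df[in_first]
df2 = df[~in_first]
```

Because the sampling is done over groups, the share of rows in df1 is only approximately equal to fraction. With many small groups, as assumed in the question, the difference should be minor.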