I'd like to (approximately) partition a large number of records in a dataframe. This is easily achieved using sample:
fraction = .7
df1 = df.sample(frac = fraction)
df2 = df.drop(df1.index)
Now for my question: suppose I'd like to (randomly) partition the dataset but must also keep all records of a group together, so that every group lands entirely in df1 or entirely in df2. Each group contains only a few records, so this constraint should not noticeably interfere with the ability to partition randomly. An example:
df =
       group  value
0      'aaa'     48
1      'aaa'   -103
2      'aab'     20
3      'aac'     21
4      'aac'     40
...
10000  'zzf'    220
Can you try groupby and then sample within the groups, like this:
grouped = df.groupby('group')
grouped.apply(lambda x: x.sample(frac=0.7))
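One caveat with the groupby suggestion: it draws roughly 70% of the rows inside every group, so a group with more than one record can end up split across the two parts. A different approach that keeps groups intact is to sample the group labels rather than the rows, then select rows by label. The following is only a minimal sketch under some assumptions: the toy data, the fixed seed, and the names rng, picked, and in_first are made up for illustration, and the split is 70% of the groups, which is only approximately 70% of the rows.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataframe (values are illustrative only).
df = pd.DataFrame({
    "group": ["aaa", "aaa", "aab", "aac", "aac", "aad", "aae", "aae", "aaf", "aag"],
    "value": [48, -103, 20, 21, 40, 7, 13, -5, 99, 220],
})

fraction = 0.7
rng = np.random.default_rng(0)  # fixed seed is an assumption, for reproducibility

# Sample group labels (not rows), without replacement.
groups = df["group"].unique()
n_pick = int(round(fraction * len(groups)))
picked = rng.choice(groups, size=n_pick, replace=False)

# Every row follows its group's label, so no group is split.
in_first = df["group"].isin(picked)
df1 = df[in_first]
df2 = df[~in_first]
```

Because the groups are small, the row fraction in df1 should stay close to the group fraction on a large dataset. If scikit-learn is available, its GroupShuffleSplit class is built for the same kind of group-aware split.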