Randomly selecting rows from a dataframe based on a column value
I have a pandas DataFrame as follows:
col1  col2  label
a     b     0
b     b     0
.
.
.......... 0
.......... 1
and the value_counts for the label column:
df['label'].value_counts():
0: 200000
1: 10000
I want to select 50000 rows at random from those with label 0, so that my value_counts becomes:
0: 50000
1: 10000
Filter each label value and sample N rows from each. Then take the indexes of the sampled rows, join them with union, and select those rows with loc:
s0 = df.label[df.label.eq(0)].sample(50000).index
s1 = df.label[df.label.eq(1)].sample(10000).index
df = df.loc[s0.union(s1)]
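To check the approach on something small, the same three steps can be run on a synthetic frame. The frame below, its sizes (2000 and 100 rows, sampled down to 500 and 100), and the `random_state` seed are assumptions made for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 2000 rows with label 0, 100 with label 1.
df = pd.DataFrame({
    "col1": np.arange(2100),
    "label": [0] * 2000 + [1] * 100,
})

# Sample row indexes per label; random_state makes the draw repeatable.
s0 = df.label[df.label.eq(0)].sample(500, random_state=0).index
s1 = df.label[df.label.eq(1)].sample(100, random_state=0).index

# Union of the two index sets, then select those rows by label.
balanced = df.loc[s0.union(s1)]
counts = balanced["label"].value_counts()
print(counts)
```

Note that `sample` draws without replacement by default, so asking for more rows than a label has raises an error.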
Of course, you don't need to specify the 10000 in s1 if you're just keeping all of those rows :) It's only there for illustration.
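When every row of the other label is kept, the second `sample` call can be dropped in favour of a boolean mask. The sketch below is one possible way to do that, and it swaps the index union for `pd.concat`; the helper name `downsample_label`, its parameters, and the synthetic frame are hypothetical and chosen only for illustration.

```python
import pandas as pd

def downsample_label(df, column, value, n, random_state=None):
    """Keep n random rows where df[column] == value, plus all other rows."""
    mask = df[column].eq(value)
    sampled = df[mask].sample(n, random_state=random_state)
    # Rows with any other value are kept as they are; sort_index restores row order.
    return pd.concat([sampled, df[~mask]]).sort_index()

# Hypothetical data: 1000 rows with label 0 and 200 rows with label 1.
df = pd.DataFrame({"col1": range(1200), "label": [0] * 1000 + [1] * 200})
result = downsample_label(df, "label", 0, 250, random_state=42)
print(result["label"].value_counts())
```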