Randomly select rows from a data frame based on column values

Question

I have a pandas data frame as follows:

col1  col2  label
a     b     0
b     b     0
...   ...   ...
...   ...   0
...   ...   1

and the value_counts for the label column:

df['label'].value_counts():

0: 200000
1: 10000
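To experiment without the original data, here is a minimal sketch of a stand-in frame with the same label counts. The col1/col2 contents are hypothetical placeholders; only the label counts come from the question. The exact printed layout of value_counts varies between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: col1/col2 are random placeholder strings,
# only the label counts (200000 zeros, 10000 ones) match the question.
n0, n1 = 200_000, 10_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": rng.choice(["a", "b"], size=n0 + n1),
    "col2": rng.choice(["a", "b"], size=n0 + n1),
    "label": np.repeat([0, 1], [n0, n1]),  # n0 zeros followed by n1 ones
})

print(df["label"].value_counts())
```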

I want to randomly select 50000 of the rows whose label is 0, so that my value_counts becomes:

0: 50000
1: 10000

Answer 1

Filter on each label value and sample N rows from each. Then get their indexes, combine them with union, and just use loc:

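# Index labels of 50000 random rows with label 0
s0 = df.label[df.label.eq(0)].sample(50000).index
# Index labels of the label-1 rows (all 10000 of them)
s1 = df.label[df.label.eq(1)].sample(10000).index

# Select just those rows; union also sorts the labels where possible
df = df.loc[s0.union(s1)]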
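A self-contained sketch of the same steps, run against a hypothetical stand-in frame (the column contents are placeholders). The random_state argument is not part of the answer; it is added here only so the draw is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame with the question's label counts.
df = pd.DataFrame({
    "col1": "a",  # scalars are broadcast to every row
    "col2": "b",
    "label": np.repeat([0, 1], [200_000, 10_000]),
})

# Sample index labels per class, union them, then select with .loc.
s0 = df.label[df.label.eq(0)].sample(50000, random_state=42).index
s1 = df.label[df.label.eq(1)].sample(10000, random_state=42).index
sampled = df.loc[s0.union(s1)]

print(sampled["label"].value_counts())
```

This relies on the index labels being unique: with duplicate labels, .loc would return every row sharing a sampled label, and the counts could overshoot. Calling df.reset_index(drop=True) first avoids that.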

Of course, if you're keeping all of them you don't need to sample s1 at all; df.label[df.label.eq(1)].index is enough (a bare .sample() would return only one row). The sample(10000) is just there for illustration :)
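When every label-1 row is kept, the sampling step for s1 can be dropped entirely rather than just its 10000 argument. A minimal sketch on a hypothetical stand-in frame, which also shows the pitfall: Series.sample() with no arguments defaults to n=1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame with the question's label counts.
df = pd.DataFrame({
    "col1": "a",
    "col2": "b",
    "label": np.repeat([0, 1], [200_000, 10_000]),
})

# Pitfall: with no arguments, .sample() draws a single row (n defaults to 1).
print(len(df.label[df.label.eq(1)].sample()))  # → 1

# Sample only the label-0 rows; take the label-1 index as-is.
s0 = df.label[df.label.eq(0)].sample(50000, random_state=0).index
s1 = df.label[df.label.eq(1)].index
balanced = df.loc[s0.union(s1)]

print(balanced["label"].value_counts())
```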

Answer 1: accepted, 2019-08-08 04:21:29