R - Subsetting a data frame based on 2 variables (one of which is a random number, used to sample the first variable)?

Question

I would like to create a subset of a large data frame. For each value in column 1, "Class", I want to select the one row that has the lowest value in column 2, "Random_number".

For example, rows 1, 2, and 3 all have the value 2 in column 1, and I would like to keep row 3 because it has the lowest random number (3.456456). For this sample I would like to subset rows 3, 4, 7, 8, 9, 10, and 12.

My dataset has over 10,000 rows, so is there a way to code this? I'm using RStudio.

Thanks very much.

Class   Random_number   Score_1      Score_2         Score_3 
2       5.575475        0.78464      0.747847        0.6746464
2       7.738382        0.73273      0.747474        0.6734652
2       3.456456        0.78464      0.747847        0.6746464
3       6.939399        0.23363      0.123555        0.6476384
4       10.99993        0.66654      0.565757        0.6565633
4       6.894898        0.54295      0.825264        0.2357674 
4       5.575475        0.78464      0.747847        0.6746464
5       3.738382        0.73273      0.747474        0.6734652
6       3.456456        0.78464      0.747847        0.6746464
7       6.932119        0.23363      0.123555        0.6476384
7       17.11993        0.66654      0.565757        0.6565633
8       6.895898        0.54295      0.825264        0.2357674

Answer 1

Try ordering the data set by random number:

data<-data[order(data$Random_number),]

Then drop the rows with duplicated values of Class. Because the data is already sorted, the first row kept for each class is the one with the lowest random number:

data<-subset(data, !duplicated(Class))
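
To check the two steps against the sample in the question, here is a minimal sketch. It rebuilds the first three columns of the table by hand and, as a small variation that is not part of the answer above, sorts by Class as well as Random_number so the result comes out in class order. The names `sorted` and `picked` are only illustrative.

```r
# Sample data copied from the question (Score_2 and Score_3 omitted for brevity)
data <- data.frame(
  Class = c(2, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8),
  Random_number = c(5.575475, 7.738382, 3.456456, 6.939399, 10.99993,
                    6.894898, 5.575475, 3.738382, 3.456456, 6.932119,
                    17.11993, 6.895898),
  Score_1 = c(0.78464, 0.73273, 0.78464, 0.23363, 0.66654, 0.54295,
              0.78464, 0.73273, 0.78464, 0.23363, 0.66654, 0.54295)
)

# Sort by Class, then by Random_number within each class
sorted <- data[order(data$Class, data$Random_number), ]

# duplicated() flags every repeat of a Class value after its first
# occurrence, so negating it keeps the lowest-numbered row per class
picked <- sorted[!duplicated(sorted$Class), ]

print(picked)
```

The row names of `picked` are the original row positions, which for this sample are 3, 4, 7, 8, 9, 10 and 12. If two rows in the same class share the same lowest random number, this approach keeps whichever of them comes first in the original data.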

Answer 1: accepted, 2013-11-19 15:03:17
