I would like to create a subset of a large data frame. For each value in column 1 ("Class"), I would like to keep exactly one row: the one with the lowest value in column 2 ("Random_number").
For example, rows 1, 2, and 3 all have the value 2 in column 1, and I would like to keep row 3 because it has the lowest random number (3.456456). For this sample I would like to keep rows 3, 4, 7, 8, 9, 10, and 12.
My dataset has over 10,000 rows, so is there a way to code this? I'm using RStudio.
Thanks very much,
Class  Random_number  Score_1  Score_2   Score_3
2      5.575475       0.78464  0.747847  0.6746464
2      7.738382       0.73273  0.747474  0.6734652
2      3.456456       0.78464  0.747847  0.6746464
3      6.939399       0.23363  0.123555  0.6476384
4      10.99993       0.66654  0.565757  0.6565633
4      6.894898       0.54295  0.825264  0.2357674
4      5.575475       0.78464  0.747847  0.6746464
5      3.738382       0.73273  0.747474  0.6734652
6      3.456456       0.78464  0.747847  0.6746464
7      6.932119       0.23363  0.123555  0.6476384
7      17.11993       0.66654  0.565757  0.6565633
8      6.895898       0.54295  0.825264  0.2357674
Try ordering the data set by Random_number, lowest first:
data <- data[order(data$Random_number), ]
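Then drop the rows whose Class value has already appeared. Because the rows are sorted, the first row kept for each class is the one with the lowest Random_number:
data <- subset(data, !duplicated(Class))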
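As a quick check, the two steps above can be run on the sample data from the question. This is only a sketch: the data frame is typed in by hand from the posted table, and the names sorted and result are made up for illustration. It also sorts by Class first, an optional variation on the answer, so that the output stays in class order.

```r
# Sample data typed in from the question (12 rows).
data <- data.frame(
  Class = c(2, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8),
  Random_number = c(5.575475, 7.738382, 3.456456, 6.939399, 10.99993,
                    6.894898, 5.575475, 3.738382, 3.456456, 6.932119,
                    17.11993, 6.895898),
  Score_1 = c(0.78464, 0.73273, 0.78464, 0.23363, 0.66654, 0.54295,
              0.78464, 0.73273, 0.78464, 0.23363, 0.66654, 0.54295),
  Score_2 = c(0.747847, 0.747474, 0.747847, 0.123555, 0.565757, 0.825264,
              0.747847, 0.747474, 0.747847, 0.123555, 0.565757, 0.825264),
  Score_3 = c(0.6746464, 0.6734652, 0.6746464, 0.6476384, 0.6565633, 0.2357674,
              0.6746464, 0.6734652, 0.6746464, 0.6476384, 0.6565633, 0.2357674)
)

# Sort by Class, then by Random_number within each class (ascending).
sorted <- data[order(data$Class, data$Random_number), ]

# duplicated() flags every repeat of a Class value after its first
# appearance, so negating it keeps the lowest Random_number per class.
result <- sorted[!duplicated(sorted$Class), ]

print(result)
```

On this sample the row names of result are the original row numbers 3, 4, 7, 8, 9, 10 and 12, one row per class. If two rows in a class tie for the lowest Random_number, only the first of them in the sorted order is kept.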