Remove rows by reference to column values in data.table (R)
I have a data.table with 47 variables describing the outcomes of 5007 PhD students. A small sample looks something like this:
sample <- data.table(PHD_STUDENT_ID = 101:120,
                     STUDY_LOCATION = c("Sydney", "Canberra", "Sydney", "Sydney",
                                        "Malaysia", "Malaysia", "CLF", "DRR", "GHS", "HMS", "DRJD", "KLS", "Malaysia",
                                        "Singapore", "Melbourne", "RD3S", "South Africa", "RME", "Sydney", "Canberra"),
                     GRADE = 51:70)
So the data.table looks like this:
   PHD_STUDENT_ID STUDY_LOCATION GRADE
1             101         Sydney    51
2             102       Canberra    52
3             103         Sydney    53
4             104         Sydney    54
5             105       Malaysia    55
6             106       Malaysia    56
7             107            CLF    57
8             108            DRR    58
...
I need to retain all rows except those where the study location is "Malaysia", "South Africa" or "Singapore", i.e. every record that is not at the campuses in those countries. I have hundreds of unique values where the study location is just a code for a lab, e.g. "CLF" and "DRR", which I want to keep, so I can't simply subset by Australian cities.

Any advice on how to subset this data.table to the rows whose STUDY_LOCATION is not "Malaysia", "South Africa" or "Singapore" would be greatly appreciated.
You can try
sample[!STUDY_LOCATION %in% c('Malaysia', 'South Africa', 'Singapore')]
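A minimal, self-contained run of that filter on the toy data from the question might look like the sketch below (it assumes the data.table package is installed; the names `excluded` and `kept` are illustrative, not from the original). Because `%in%` has higher precedence than `!`, the expression in `i` is `!(STUDY_LOCATION %in% excluded)`, a logical vector with one entry per row, so lab codes such as "CLF" and "DRR" are kept automatically.

```r
library(data.table)

# Toy data from the question (20 students)
sample <- data.table(
  PHD_STUDENT_ID = 101:120,
  STUDY_LOCATION = c("Sydney", "Canberra", "Sydney", "Sydney", "Malaysia",
                     "Malaysia", "CLF", "DRR", "GHS", "HMS", "DRJD", "KLS",
                     "Malaysia", "Singapore", "Melbourne", "RD3S",
                     "South Africa", "RME", "Sydney", "Canberra"),
  GRADE = 51:70
)

# Illustrative name: locations to drop; every other value (lab codes too) is kept
excluded <- c("Malaysia", "South Africa", "Singapore")

# %in% binds more tightly than !, so this is !(STUDY_LOCATION %in% excluded)
kept <- sample[!STUDY_LOCATION %in% excluded]

print(nrow(kept))  # 15 of the 20 rows remain
```

The original row order is preserved, since this is an ordinary logical subset.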
I assume you're learning data.table, so here is a more data.table-style way:
setkey(sample, STUDY_LOCATION)
sample[!c('Malaysia', 'South Africa', 'Singapore')]
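A sketch of what the keyed version does, again on the toy data and assuming data.table is installed (the names `kept_in` and `kept_keyed` are illustrative). With a key set, a character vector in `i` is joined against the key column and the leading `!` turns that into a not-join. One side effect worth knowing: `setkey()` sorts `sample` by reference, so the table's row order changes in place.

```r
library(data.table)

# Toy data from the question (20 students)
sample <- data.table(
  PHD_STUDENT_ID = 101:120,
  STUDY_LOCATION = c("Sydney", "Canberra", "Sydney", "Sydney", "Malaysia",
                     "Malaysia", "CLF", "DRR", "GHS", "HMS", "DRJD", "KLS",
                     "Malaysia", "Singapore", "Melbourne", "RD3S",
                     "South Africa", "RME", "Sydney", "Canberra"),
  GRADE = 51:70
)

# %in% filter, taken before the key is set so it keeps the original row order
kept_in <- sample[!STUDY_LOCATION %in% c("Malaysia", "South Africa", "Singapore")]

# Sorts sample by STUDY_LOCATION in place (no copy) and marks it as the key
setkey(sample, STUDY_LOCATION)

# Character vector in i is matched against the key column; ! keeps non-matches
kept_keyed <- sample[!c("Malaysia", "South Africa", "Singapore")]

# Same students survive; kept_keyed is ordered by STUDY_LOCATION instead
print(setequal(kept_in$PHD_STUDENT_ID, kept_keyed$PHD_STUDENT_ID))  # TRUE
```

If the original row order matters, either use the `%in%` form or restore the order afterwards, e.g. with `setorder(kept_keyed, PHD_STUDENT_ID)`.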