Keeping rows for identical columns in a data.table in R
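I have a table that looks like this:
library(data.table)
DT <- data.table(ID = 1:4, AREA = c("a", "b", "c", "d"), PARTNER = c("f", "b", "g", "d"), OBS_VALUE = c(10, 5, 13, 0))
As a result, I want to get the records where AREA and PARTNER are equal and OBS_VALUE is neither 0 nor NA.
The identical function tests globally whether two columns are the same.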
setkeyv(DT,c("AREA","PARTNER"))
identical(DT['AREA'],DT['PARTNER'])
The result is obviously FALSE.
I don't know how to reach my goal. Thanks for your help.
The answer I received:
DT[REF_AREA==COUNTERPART_AREA & !is.na(OBS_VALUE) & OBS_VALUE!=0]
gives the error message:
Error in Ops.factor(REF_AREA, COUNTERPART_AREA) :
level sets of factors are different
Indeed, my actual data table is more complex:
dput(head(diagonal))
structure(list(TIME_PERIOD = c(2010L, 2010L, 2010L, 2010L, 2010L,
2010L), REF_AREA = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("AT",
"BE", "BG", "CY", "CZ", "DE", "DK", "EE", "ES", "FI", "FR", "GB",
"GR", "HR", "HU", "IE", "IT", "LT", "LU", "LV", "MT", "NL", "PL",
"PT", "RO", "SE", "SI", "SK"), class = "factor"), COUNTERPART_AREA = structure(c(20L, 20L, 20L, 20L, 20L, 20L), .Label = c("4A", "4F", "4S", "9A",
"A1", "A2", "A5", ... "AT", ..., "W1"), class = "factor"),
, .Names = c("TIME_PERIOD", "REF_AREA", "COUNTERPART_AREA", "UNIT_MEASURE", "INT_ACC_ITEM", "ACCOUNTING_ENTRY", "OBS_VALUE", "OBS_COMMENT", "DECIMALS", "UNIT_MULT", "i.CONF_STATUS", "i.OBS_STATUS"), sorted = c("REF_AREA", "COUNTERPART_AREA"), class = c("data.table", "data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x07d424a0>)
Any ideas?
DT[AREA==PARTNER & !is.na(OBS_VALUE) & OBS_VALUE!=0]
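A minimal sketch of why this filter works where identical did not, using the toy table from the question. identical compares two objects as a whole and returns a single TRUE or FALSE, while == compares element by element and returns one logical value per row, which is what the row filter needs. The names whole, per_row and res are only illustrative, and DT$AREA is used here instead of DT['AREA'] because, as far as I understand, a character i on a keyed data.table is treated as a key lookup rather than a column selection.

```r
library(data.table)

# Toy table from the question (character columns)
DT <- data.table(ID = 1:4,
                 AREA = c("a", "b", "c", "d"),
                 PARTNER = c("f", "b", "g", "d"),
                 OBS_VALUE = c(10, 5, 13, 0))

# identical() compares the two vectors as whole objects: a single logical value
whole <- identical(DT$AREA, DT$PARTNER)

# == compares element by element: one logical value per row
per_row <- DT$AREA == DT$PARTNER

# The per-row condition is what selects rows inside DT[...]
res <- DT[AREA == PARTNER & !is.na(OBS_VALUE) & OBS_VALUE != 0]
print(res)
```

With this data, only the row with ID 2 remains: row 4 also has matching AREA and PARTNER, but its OBS_VALUE is 0.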
Using dplyr:
library(dplyr)
DT <-data.frame(ID=c(1:4),AREA=c("a","b","c","d"),PARTNER=c("f","b","g","d"),OBS_VALUE=c(10,5,13,0), stringsAsFactors = FALSE)
DT %>%
filter(AREA == PARTNER & OBS_VALUE != 0 & !is.na(OBS_VALUE))
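Both answers assume character columns. The dput output in the question shows REF_AREA and COUNTERPART_AREA stored as factors with different level sets, which is what triggers the "level sets of factors are different" error. One possible workaround, not taken from the answers above, is to compare the labels as character vectors. The sketch below rebuilds the toy table with factor columns as an assumption; the name res is only illustrative.

```r
library(data.table)

# Toy table with factor columns whose level sets differ
# (AREA levels: a, b, c, d; PARTNER levels: b, d, f, g)
DT <- data.table(ID = 1:4,
                 AREA = factor(c("a", "b", "c", "d")),
                 PARTNER = factor(c("f", "b", "g", "d")),
                 OBS_VALUE = c(10, 5, 13, 0))

# AREA == PARTNER would fail here because the level sets differ,
# so compare the labels as character vectors instead
res <- DT[as.character(AREA) == as.character(PARTNER) &
            !is.na(OBS_VALUE) & OBS_VALUE != 0]
print(res)
```

For the real table, the same pattern would presumably use REF_AREA and COUNTERPART_AREA in place of AREA and PARTNER.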