
R: How can I remove rows based on values in multiple columns of a data.frame without using sqldf?

I was able to do this with sqldf, but I would like to get the same result in plain R.

Data:

df <- read.table(header = TRUE, text = "year1 year2 year3 year4 signup_date 
                 B      U      C         D      4/10/12 
                 C      D      B         U      2/12/12 
                 U      C      D         U      3/14/05 
                 B      NA     NA        NA     3/7/05 
                 NA     NA     NA        NA     8/3/08 
                 A      NA     NA        NA     4/6/07")

My sqldf query:

library(sqldf)
df <- sqldf("
SELECT *
FROM df
WHERE year1 NOT IN ('B','C','D','U')
AND year2 NOT IN ('B','C','D','U')
AND year3 NOT IN ('B','C','D','U')
AND year4 NOT IN ('B','C','D','U')
ORDER BY signup_date DESC")
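One difference to keep in mind when translating the query: read.table turns the literal NA into missing values, and sqldf hands those to SQLite as NULL. In SQL, `NULL NOT IN ('B','C','D','U')` evaluates to NULL rather than true, so rows with missing years would be dropped by the query above. R's `%in%` instead returns FALSE for NA, so `!x %in% codes` keeps those rows. A small check of that behaviour (the vector `x` is purely illustrative):

```r
# Illustrative codes and values, chosen only to show how NA is handled.
codes <- c("B", "C", "D", "U")
x <- c("B", NA, "A", "")

# %in% never returns NA: a missing value is simply "not in the set",
# so negating it keeps the NA and the empty string.
keep <- !x %in% codes
print(keep)

# By contrast, != propagates NA, so it is not a safe row filter on its own.
print(x != "B")
```

This is why the R answers return the rows with missing years even though the values are NA.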

Desired result:

    year1 year2 year3 year4 signup_date
                            8/3/08   
    A                       4/6/07 

Another option is to use the dplyr package:

library(dplyr)
filterVars <- c("B","C","D","U")
df %>% 
  filter(!year1 %in% filterVars, !year2 %in% filterVars, !year3 %in% filterVars, !year4 %in% filterVars) %>%
  arrange(desc(signup_date))

Yields:

  year1 year2 year3 year4 signup_date
1  <NA>  <NA>  <NA>  <NA>      8/3/08
2     A  <NA>  <NA>  <NA>      4/6/07
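With dplyr 1.0.4 or later, the four repeated conditions can be collapsed with `if_all()`, which applies one predicate to every selected column and keeps a row only when it holds in all of them. This sketch also sorts on the parsed date rather than the raw string, assuming the dates are month/day/two-digit year:

```r
library(dplyr)

filterVars <- c("B", "C", "D", "U")

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "year1 year2 year3 year4 signup_date
B  U  C  D  4/10/12
C  D  B  U  2/12/12
U  C  D  U  3/14/05
B  NA NA NA 3/7/05
NA NA NA NA 8/3/08
A  NA NA NA 4/6/07")

# Keep rows where none of year1..year4 holds one of the codes,
# then order newest first by the parsed signup date.
res <- df %>%
  filter(if_all(year1:year4, ~ !.x %in% filterVars)) %>%
  arrange(desc(as.Date(signup_date, format = "%m/%d/%y")))
print(res)
```

The column range `year1:year4` assumes the year columns are contiguous; `starts_with("year")` would be an alternative selector.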

Try:

fvars <- c('B', 'C', 'D', 'U')
df2 <- df1[Reduce(`&`,lapply(df1[paste0('year',1:4)], 
           function(x) !x %in% fvars)),]
df2
#   year1 year2 year3 year4 signup_date
#5                              8/3/08
#6     A                        4/6/07
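A variant of the same base-R idea, assuming the year columns can all be treated as character: turn them into a matrix, test membership once, and keep the rows with zero hits. The helper name `drop_coded_rows` is hypothetical:

```r
# Hypothetical helper: drop any row where one of `cols` holds a value in `codes`.
drop_coded_rows <- function(df, cols, codes) {
  m <- as.matrix(df[cols])                      # character matrix of the chosen columns
  hit <- matrix(m %in% codes, nrow = nrow(m))   # %in% drops dims, so restore them
  df[rowSums(hit) == 0, , drop = FALSE]         # keep rows with no matching value
}

df1 <- data.frame(
  year1 = c("B", "C", "U", "B", "", "A"),
  year2 = c("U", "D", "C", "", "", ""),
  year3 = c("C", "B", "D", "", "", ""),
  year4 = c("D", "U", "U", "", "", ""),
  signup_date = c("4/10/12", "2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07"),
  stringsAsFactors = FALSE
)

df2 <- drop_coded_rows(df1, paste0("year", 1:4), c("B", "C", "D", "U"))
print(df2)
```

Rows 5 and 6 are returned, as with the `Reduce` version; `drop = FALSE` keeps a data.frame even if only one row survives.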

Or, using data.table:

library(data.table)
nm1 <- grep('year', names(df1))
setDT(df1)[df1[, Reduce(`&`,lapply(.SD, function(x) !x %chin% 
        fvars)) , .SDcols=nm1]][order(-signup_date)]
#   year1 year2 year3 year4 signup_date
#1:                              8/3/08
#2:     A                        4/6/07

NOTE: It may be better to convert 'signup_date' to the Date class before ordering, because character strings are sorted alphabetically rather than chronologically, i.e. as.Date(df1$signup_date, '%m/%d/%y')
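A base-R sketch of that conversion, using the six dates from the question and assuming they are month/day/two-digit year (`%y` maps 00 to 68 onto 2000 to 2068):

```r
signup_date <- c("4/10/12", "2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07")

# Parse month/day/two-digit-year strings into Date objects.
parsed <- as.Date(signup_date, format = "%m/%d/%y")

# Order chronologically, newest first, and reorder the original strings.
# Newest first: 4/10/12, 2/12/12, 8/3/08, 4/6/07, 3/14/05, 3/7/05
sorted_dates <- signup_date[order(parsed, decreasing = TRUE)]
print(sorted_dates)
```

The same `order(as.Date(...), decreasing = TRUE)` index can be used to reorder the rows of a filtered data.frame.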

Data:

df1 <- structure(list(year1 = c("B", "C", "U", "B", "", "A"),
year2 = c("U", 
"D", "C", "", "", ""), year3 = c("C", "B", "D", "", "", ""), 
year4 = c("D", "U", "U", "", "", ""), signup_date = c("4/10/12", 
"2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07")),
.Names =   c("year1", 
"year2", "year3", "year4", "signup_date"), class = "data.frame", 
row.names = c(NA, -6L))
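This `df1` stores the blank years as empty strings, while the question's read.table call produces NA. Both work with `!x %in% codes`, since neither "" nor NA is one of the codes. If a single representation is wanted (for example before passing the data to sqldf, where NA becomes NULL), here is a small sketch that converts empty strings to NA, assuming blanks mean "missing":

```r
df1 <- data.frame(
  year1 = c("B", "C", "U", "B", "", "A"),
  year2 = c("U", "D", "C", "", "", ""),
  year3 = c("C", "B", "D", "", "", ""),
  year4 = c("D", "U", "U", "", "", ""),
  signup_date = c("4/10/12", "2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07"),
  stringsAsFactors = FALSE
)
year_cols <- paste0("year", 1:4)

# Replace empty strings with NA in the year columns only (existing NAs are left alone).
df1[year_cols] <- lapply(df1[year_cols], function(x) replace(x, !is.na(x) & x == "", NA))
print(colSums(is.na(df1[year_cols])))

# The %in%-based filter still keeps the same rows after the conversion.
keep <- Reduce(`&`, lapply(df1[year_cols], function(x) !x %in% c("B", "C", "D", "U")))
print(df1[keep, ])
```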
