Data.frame filtering
I have the following data.frame df:
df = data.frame(col1 = c('a','a','a','a','a','b','b','c','d'),
                col2 = c('a','a','a','b','b','b','b','a','a'),
                height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                col3 = 1:9)
# col1 col2 height1 height2 col3
#1 a a NA 31.0 1
#2 a a 32 31.5 2
#3 a a NA NA 3
#4 a b NA NA 4
#5 a b NA 11.0 5
#6 b b NA 12.0 6
#7 b b NA 13.0 7
#8 c a 25 NA 8
#9 d a NA NA 9
For each pair of values in col1, col2, I want to build a column height containing values such that:
If there are only NA in height1 and height2, return NA.
If there is a value in height1, take this value. (For a pair col1, col2, there is at most one non-NA value in column height1.)
If there are only NA in height1 and some non-NA values in height2, take the first value in height2.
I also need to keep the corresponding values in column col3.
The new data.frame new.df will look like:
# col1 col2 height col3
#1 a a 32 2
#2 a b 11 5
#3 b b 12 6
#4 c a 25 8
#5 d a NA 9
I would prefer a data.frame approach, quite concise, but I realize I am unable to find one!
Maybe not the elegant solution you are looking for, but here is a base R option:
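do.call("rbind",
  lapply(split(df, paste0(df$col1, df$col2)),
    function(tab) {
      # Name both height columns "height"; only one of them is kept below
      colnames(tab)[3:4] <- "height"
      out <- if (any(!is.na(tab[, 3]))) {
        # height1 has a value: keep that row and drop height2
        tab[which(!is.na(tab[, 3])), -4]
      } else {
        if (any(!is.na(tab[, 4]))) {
          # only height2 has values: keep the first one and drop height1
          tab[which(!is.na(tab[, 4]))[1], c(1:2, 4:5)]
        } else {
          # nothing but NA: keep the first row of the group
          tab[1, -4]
        }
      }
      return(out)
    }
  )
)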
# col1 col2 height col3
# aa a a 32 2
# ab a b 11 5
# bb b b 12 6
# ca c a 25 8
# da d a NA 9
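The same selection can probably be written without split/lapply. The following is only a sketch under the rules stated in the question: it ranks every row by a priority (the names prio and sorted are illustrative, not from the original post), sorts on that rank, and keeps the first row of each col1, col2 pair with duplicated().

```r
# Sample data from the question
df <- data.frame(col1 = c('a','a','a','a','a','b','b','c','d'),
                 col2 = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3 = 1:9)

# height1 when present, otherwise height2 (which may itself be NA)
df$height <- ifelse(is.na(df$height1), df$height2, df$height1)

# Priority of each row: 1 = height1 present, 2 = only height2 present, 3 = both NA
prio <- ifelse(!is.na(df$height1), 1L, ifelse(!is.na(df$height2), 2L, 3L))

# Sort by group, then priority; the row index breaks ties so the
# first height2 value in the original row order wins
sorted <- df[order(df$col1, df$col2, prio, seq_len(nrow(df))), ]

# Keep the first row of each (col1, col2) pair
new.df <- sorted[!duplicated(sorted[c("col1", "col2")]),
                 c("col1", "col2", "height", "col3")]
rownames(new.df) <- NULL
new.df
```

Including the row index as the last sort key makes the "first value in height2" rule explicit rather than relying on how order() handles ties.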
With dplyr:
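library(dplyr)

df %>%
  mutate(
    # 1 = height1 present, 2 = only height2 present, 3 = both NA
    order = ifelse(!is.na(height1), 1, ifelse(!is.na(height2), 2, 3)),
    height = ifelse(!is.na(height1), height1, ifelse(!is.na(height2), height2, NA))
  ) %>%
  # Note: sorting on height keeps the smallest height2 of a group,
  # which is also the first one in this data
  arrange(col1, col2, order, height) %>%
  # .keep_all = TRUE is needed in current dplyr versions so that
  # height and col3 survive distinct()
  distinct(col1, col2, .keep_all = TRUE) %>%
  select(col1, col2, height, col3)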
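A possible variant of the dplyr approach, sketched here under the assumption that "first value in height2" means first in row order: coalesce() builds the height column, and slice(which.min(prio)) picks, within each group, the first row with the best priority. The column name prio is illustrative and not part of the original answer.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(col1 = c('a','a','a','a','a','b','b','c','d'),
                 col2 = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3 = 1:9)

new.df <- df %>%
  mutate(
    # height1 when present, otherwise height2 (possibly NA)
    height = coalesce(height1, height2),
    # 1 = height1 present, 2 = only height2 present, 3 = both NA
    prio = case_when(!is.na(height1) ~ 1L,
                     !is.na(height2) ~ 2L,
                     TRUE ~ 3L)
  ) %>%
  group_by(col1, col2) %>%
  # which.min() returns the first position of the minimum,
  # so ties are resolved by original row order
  slice(which.min(prio)) %>%
  ungroup() %>%
  select(col1, col2, height, col3)

new.df
```

Because no sorting on height is involved, the row kept for a group with several height2 values is the first one in the data rather than the smallest.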
I use data.table (although, exceptionally, I would like a data.frame option here) and I find my solution inelegant:
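library(data.table)

# Pick one row per group according to the rules in the question
func = function(df)
{
  # only NA in both height columns: keep the first row
  if(all(is.na(subset(df, select=c(height1,height2)))))
    return(df[1,])
  # a value in height1: keep that row
  if(any(!is.na(df$height1)))
    return(df[!is.na(df$height1),])
  # otherwise keep the first row with a value in height2
  df[!is.na(df$height2),][1,]
}
setDT(df)
new.df=df[,func(.SD),by=list(col1,col2)]
new.df = data.frame(new.df)
new.df$height = ifelse(is.na(new.df$height1), new.df$height2, new.df$height1)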
#> new.df
# col1 col2 height1 height2 col3 height
#1 a a 32 31.5 2 32
#2 a b NA 11.0 5 11
#3 b b NA 12.0 6 12
#4 c a 25 NA 8 25
#5 d a NA NA 9 NA
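For comparison, a shorter data.table formulation might look like the sketch below. It assumes a data.table version that provides fcoalesce() and fifelse() (1.12.4 or later, as far as I know), and the names dt, prio and idx are illustrative only. The .I idiom collects, for each group, the row number of the first row with the best priority.

```r
library(data.table)

# Sample data from the question
df <- data.frame(col1 = c('a','a','a','a','a','b','b','c','d'),
                 col2 = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3 = 1:9)

dt <- as.data.table(df)  # work on a copy, leaving df untouched

# height1 when present, otherwise height2 (possibly NA)
dt[, height := fcoalesce(height1, height2)]

# 1 = height1 present, 2 = only height2 present, 3 = both NA
dt[, prio := fifelse(!is.na(height1), 1L,
                     fifelse(!is.na(height2), 2L, 3L))]

# Row number of the first best-priority row in each group
idx <- dt[, .I[which.min(prio)], by = .(col1, col2)]$V1

new.df <- as.data.frame(dt[idx, .(col1, col2, height, col3)])
new.df
```

Converting back with as.data.frame() at the end keeps the result a plain data.frame, as in the answer above.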