How do you combine columns with conditions in R?
I have a data frame in R that looks something like this:
D = data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
year = c(1980, 1991, 2013, 1980, 1991, 2013),
pop90 = c(1, 1, 1, 2, 2, 2),
pop00 = c(3, 3, 3, 4, 4, 4),
pop10 = c(5, 5, 5, 6, 6, 6))
Desired output:
Res = data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
year = c(1980, 1991, 2013, 1980, 1991, 2013),
popcombined = c(1, 3, 5, 2, 4, 6))
I would like to combine pop90, pop00 and pop10 into one column, where years 1980-1990 take the value of pop90, years 1991-2000 take the value of pop00, and years 2001-2013 take the value of pop10. How can I do this? I have tried the merge function, but I could not get the years to follow the conditions set out above.
You can use row/col indexing:
# Breaks at 1991 and 2001 so that 1990 maps to pop90 and 2000 to pop00
popcombined <- D[3:5][cbind(1:nrow(D), findInterval(D$year,
c(-Inf, 1991, 2001, Inf)))]
cbind(D[1:2], popcombined)
# countrycode year popcombined
#1 2 1980 1
#2 2 1991 3
#3 2 2013 5
#4 3 1980 2
#5 3 1991 4
#6 3 2013 6
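The one-liner above packs several steps together. As a rough sketch of what happens in between (the intermediate names col_idx, pairs and pops are only illustrative, and the breaks are chosen on the assumption that 1990 belongs to pop90 and 2000 to pop00, as the question states):

```r
D <- data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
                year = c(1980, 1991, 2013, 1980, 1991, 2013),
                pop90 = c(1, 1, 1, 2, 2, 2),
                pop00 = c(3, 3, 3, 4, 4, 4),
                pop10 = c(5, 5, 5, 6, 6, 6))

# Which of the three pop columns applies to each row:
# 1 = pop90 (up to 1990), 2 = pop00 (1991-2000), 3 = pop10 (2001 onwards)
col_idx <- findInterval(D$year, c(-Inf, 1991, 2001, Inf))

# Two-column matrix of (row, column) pairs, one pair per row of D
pairs <- cbind(seq_len(nrow(D)), col_idx)

# Indexing a matrix with a two-column matrix picks exactly one cell per pair
pops <- as.matrix(D[3:5])
popcombined <- pops[pairs]

cbind(D[1:2], popcombined)
```

The key idea is that a two-column numeric matrix used as an index selects individual cells rather than whole rows or columns.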
You can use cut and do something like:
library(plyr)
# For each row, label the year intervals with that row's three pop values
adply(D, 1, function(u) {
  transform(u[, 1:2],
            pop = cut(u$year, c(1980, 1990, 2000, 2013),
                      labels = tail(unlist(u), 3), include.lowest = TRUE))
})
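Because cut with labels returns a factor, it may be easier in some cases to ask it for plain interval codes and look the values up afterwards. The following is only a sketch of that variation in base R, not the answer's own method; the names period and pop_col are made up for illustration:

```r
D <- data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
                year = c(1980, 1991, 2013, 1980, 1991, 2013),
                pop90 = c(1, 1, 1, 2, 2, 2),
                pop00 = c(3, 3, 3, 4, 4, 4),
                pop10 = c(5, 5, 5, 6, 6, 6))

# With labels = FALSE, cut() returns integer interval codes instead of a factor:
# 1 for [1980, 1990], 2 for (1990, 2000], 3 for (2000, 2013]
period <- cut(D$year, breaks = c(1980, 1990, 2000, 2013),
              labels = FALSE, include.lowest = TRUE)

# Translate each code into the name of the matching column
pop_col <- c("pop90", "pop00", "pop10")[period]

# Pick one value per row from the column chosen for that row
D$popcombined <- mapply(function(i, col) D[i, col], seq_len(nrow(D)), pop_col)

D[c("countrycode", "year", "popcombined")]
```

Years outside 1980-2013 would get an NA code from cut, so the breaks would need widening for data that goes beyond that range.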
I set all unwanted data to NA and melted the result using the reshape2 package:
## Set NAs outside each column's year range (ranges as given in the question)
library(Hmisc)  # provides %nin%
D[D$year %nin% 1980:1990,]$pop90 <- NA
D[D$year %nin% 1991:2000,]$pop00 <- NA
D[D$year %nin% 2001:2013,]$pop10 <- NA
# Melt data.frame
library(reshape2)
D.new <- melt(D, id.vars = c("countrycode", "year"),
value.name = "popcombined")
# Some minor stuff
D.new <- na.omit(D.new)
D.new <- D.new[,-3]
library(plyr)  # provides arrange
D.new <- arrange(D.new, countrycode)
# Check my data against your result
> D.new == Res
countrycode year popcombined
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE TRUE TRUE
[4,] TRUE TRUE TRUE
[5,] TRUE TRUE TRUE
[6,] TRUE TRUE TRUE
Using basic indexing:
D[D$year>=1980 & D$year<=1990 , "popcombined" ] <- D[D$year>=1980 & D$year<=1990, "pop90" ]
D[D$year>1990 & D$year<=2000 , "popcombined" ] <- D[D$year>1990 & D$year<=2000, "pop00" ]
D[D$year>2000 , "popcombined" ] <- D[D$year>2000 , "pop10" ]
Using with:
D$popcombined2 <- NA
D$popcombined2 <- with(D, ifelse( year>=1980 & year<=1990, pop90, popcombined2 ))
D$popcombined2 <- with(D, ifelse( year>1990 & year<=2000, pop00, popcombined2 ))
D$popcombined2 <- with(D, ifelse( year>2000 , pop10, popcombined2 ))
#> D
# countrycode year pop90 pop00 pop10 popcombined popcombined2
#1 2 1980 1 3 5 1 1
#2 2 1991 1 3 5 3 3
#3 2 2013 1 3 5 5 5
#4 3 1980 2 4 6 2 2
#5 3 1991 2 4 6 4 4
#6 3 2013 2 4 6 6 6
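The separate ifelse calls can also be nested into a single expression. This is a minimal sketch, assuming the year ranges from the question (up to 1990, 1991-2000, 2001 onwards) and that no year falls before 1980; it also checks the result against the desired output Res:

```r
D <- data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
                year = c(1980, 1991, 2013, 1980, 1991, 2013),
                pop90 = c(1, 1, 1, 2, 2, 2),
                pop00 = c(3, 3, 3, 4, 4, 4),
                pop10 = c(5, 5, 5, 6, 6, 6))
Res <- data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
                  year = c(1980, 1991, 2013, 1980, 1991, 2013),
                  popcombined = c(1, 3, 5, 2, 4, 6))

# The first condition that holds decides which column supplies the value
D$popcombined <- with(D, ifelse(year <= 1990, pop90,
                         ifelse(year <= 2000, pop00, pop10)))

# Compare with the desired output from the question
all(D$popcombined == Res$popcombined)
```

Nesting keeps the conditions in one place, at the cost of readability once there are more than a few ranges.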