
How do you combine columns with conditions in R?

I have a data frame that looks somewhat like this in R:

D = data.frame(countrycode = c(2, 2, 2, 3, 3, 3), 
      year = c(1980, 1991, 2013, 1980, 1991, 2013), 
      pop90 = c(1, 1, 1, 2, 2, 2), 
      pop00 = c(3, 3, 3, 4, 4, 4), 
      pop10 = c(5, 5, 5, 6, 6, 6))

Desired output:

Res = data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
       year = c(1980, 1991, 2013, 1980, 1991, 2013),
       popcombined = c(1, 3, 5, 2, 4, 6))

I would like to combine pop90, pop00 and pop10 into one column where years 1980-1990 would reflect the value of pop90, years 1991-2000 would reflect the value of pop00 and years 2001-2013 would reflect the value of pop10. How can I do this? I have tried the merge function but I could not set the years in place to reflect the conditions I set out above.

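You can use row/column matrix indexing: findInterval() gives, for each row, the position (1, 2 or 3) of the pop column to use, and a two-column matrix of (row, column) pairs then extracts one value per row.

# left.open = TRUE makes the intervals (-Inf, 1990], (1990, 2000], (2000, Inf],
# so 1990 maps to pop90 and 2000 to pop00, as the question asks
popcombined <- D[3:5][cbind(1:nrow(D), findInterval(D$year,
             c(-Inf, 1990, 2000, Inf), left.open = TRUE))]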

cbind(D[1:2], popcombined)
#    countrycode year popcombined
#1           2 1980           1
#2           2 1991           3
#3           2 2013           5
#4           3 1980           2
#5           3 1991           4
#6           3 2013           6
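
To see why the one-liner above works, it can help to split it into steps. This is only a sketch on the question's toy data; the names col_idx, idx and picked are made up for illustration, and left.open = TRUE is used so that 1990 and 2000 fall in the earlier period.

```r
# Same toy data as in the question
D <- data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
                year = c(1980, 1991, 2013, 1980, 1991, 2013),
                pop90 = c(1, 1, 1, 2, 2, 2),
                pop00 = c(3, 3, 3, 4, 4, 4),
                pop10 = c(5, 5, 5, 6, 6, 6))

# Step 1: which of the three pop columns applies to each row?
# Intervals are (-Inf, 1990], (1990, 2000], (2000, Inf]
col_idx <- findInterval(D$year, c(-Inf, 1990, 2000, Inf), left.open = TRUE)
col_idx
# 1 2 3 1 2 3

# Step 2: pair each row number with its column number
idx <- cbind(seq_len(nrow(D)), col_idx)

# Step 3: a two-column matrix index picks exactly one cell per row
picked <- as.matrix(D[3:5])[idx]
picked
# 1 3 5 2 4 6
```

The key point is that indexing with a two-column matrix treats each of its rows as a (row, column) address, rather than selecting whole rows and columns.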

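You can use cut() row by row with plyr, labelling each year interval with that row's own population values:

library(plyr)

# For each row u, cut the year into [1980,1990], (1990,2000], (2000,2013]
# and label the three intervals with u's pop90, pop00 and pop10 values.
# cut() returns a factor, so use as.numeric(as.character(pop)) if you need numbers.
adply(D, 1, function(u){
    transform(u[,1:2], 
              pop = cut(u$year, c(1980, 1990, 2000, 2013), labels=tail(unlist(u),3), include.lowest=TRUE))
})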
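
If the row-by-row call is too slow on larger data, cut() can also be applied to the whole year column at once. The following is a rough sketch, not part of the answer above; period, pops and popcombined are illustrative names, and it assumes every year lies within 1980-2013 (anything outside becomes NA).

```r
# Same toy data as in the question
D <- data.frame(countrycode = c(2, 2, 2, 3, 3, 3),
                year = c(1980, 1991, 2013, 1980, 1991, 2013),
                pop90 = c(1, 1, 1, 2, 2, 2),
                pop00 = c(3, 3, 3, 4, 4, 4),
                pop10 = c(5, 5, 5, 6, 6, 6))

# labels = FALSE makes cut() return integer codes instead of a factor:
# 1 for [1980,1990], 2 for (1990,2000], 3 for (2000,2013]
period <- cut(D$year, breaks = c(1980, 1990, 2000, 2013),
              labels = FALSE, include.lowest = TRUE)

# Use the codes as column positions, one per row
pops <- as.matrix(D[c("pop90", "pop00", "pop10")])
popcombined <- pops[cbind(seq_len(nrow(D)), period)]

cbind(D[1:2], popcombined)
```

This avoids the factor-to-number conversion, because the population values are never turned into labels.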

I set all unwanted values to NA and then used melt from the reshape2 package:

# Set each pop column to NA outside its own period
library(Hmisc)   # for %nin%
D[D$year %nin% 1980:1990,]$pop90 <- NA
D[D$year %nin% 1991:2000,]$pop00 <- NA
D[D$year %nin% 2001:2013,]$pop10 <- NA

# Melt data.frame
library(reshape2)
D.new <- melt(D, id.vars = c("countrycode", "year"),
                 value.name = "popcombined")

# Drop the NA rows and the "variable" column, then sort by country
library(plyr)    # for arrange()
D.new <- na.omit(D.new)
D.new <- D.new[,-3]
D.new <- arrange(D.new, countrycode)

# Check my data against your result
> D.new == Res
     countrycode year popcombined
[1,]        TRUE TRUE        TRUE
[2,]        TRUE TRUE        TRUE
[3,]        TRUE TRUE        TRUE
[4,]        TRUE TRUE        TRUE
[5,]        TRUE TRUE        TRUE
[6,]        TRUE TRUE        TRUE

Using basic indexing (the bounds follow the question: 1980-1990, 1991-2000, 2001 onwards):

D[D$year >= 1980 & D$year <= 1990, "popcombined"] <- D[D$year >= 1980 & D$year <= 1990, "pop90"]
D[D$year >= 1991 & D$year <= 2000, "popcombined"] <- D[D$year >= 1991 & D$year <= 2000, "pop00"]
D[D$year >= 2001, "popcombined"] <- D[D$year >= 2001, "pop10"]

Using with() and ifelse():

D$popcombined2 <- NA
D$popcombined2 <- with(D, ifelse(year >= 1980 & year <= 1990, pop90, popcombined2))
D$popcombined2 <- with(D, ifelse(year >= 1991 & year <= 2000, pop00, popcombined2))
D$popcombined2 <- with(D, ifelse(year >= 2001, pop10, popcombined2))

#> D
#  countrycode year pop90 pop00 pop10 popcombined popcombined2
#1           2 1980     1     3     5           1            1
#2           2 1991     1     3     5           3            3
#3           2 2013     1     3     5           5            5
#4           3 1980     2     4     6           2            2
#5           3 1991     2     4     6           4            4
#6           3 2013     2     4     6           6            6
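
The sample data never hits a boundary year, so it may be worth checking 1990, 1991, 2000 and 2001 explicitly. Below is a small sketch using a nested ifelse(); combine_pop is a hypothetical helper and the populations 10, 20 and 30 are made up. Unlike the code above, it does not enforce the 1980 lower bound.

```r
# Hypothetical helper (not from the answers): pick the pop column by year
combine_pop <- function(year, pop90, pop00, pop10) {
  ifelse(year <= 1990, pop90,
         ifelse(year <= 2000, pop00, pop10))
}

# Boundary years with made-up populations
edge <- data.frame(year = c(1990, 1991, 2000, 2001),
                   pop90 = 10, pop00 = 20, pop10 = 30)
edge$popcombined <- with(edge, combine_pop(year, pop90, pop00, pop10))
edge$popcombined
# 10 20 20 30
```

Here 1990 takes pop90 and 2000 takes pop00, which matches the ranges stated in the question.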


 