
How to transform (calculate) two or several variables into one using R?

I'm having some difficulty merging two or more variables in my data. I can do it in Excel but can't figure out how to do the same thing in R.

Basically, I want to create two combined variables from the variables below:

Data 1: creating variable combineA1+B1

country  year       A1         B1        **combineA1+B1**
USA      2002       0          0            0
USA      2003       1          1            2
USA      2004       NA         1            1
USA      2005       0          0            0
USA      2006       0          1            1
USA      2007       0          0            0
USA      2008       0          1            1
USA      2009       NA         NA           NA
USA      2010       0          1            1
USA      2011       NA         0            0
USA      2012       0          1            1
USA      2013       0          0            0
USA      2014       0          1            1

Creating the variable "combineA1+B1" seems simple: all I need to do is add the two variables (A1 and B1). In Excel this is very simple, and I guess it is in R as well. However, NA values cause problems when adding the two variables. So, how do I create a combineA1+B1 variable like the one above?

If both A1 and B1 are NA, then combineA1+B1 should also be NA. If one is NA and the other is 1 or 0, the result should be the non-missing value (see, for example, USA 2004).

I'd also like to create another combined variable: "combineA1+B1+C1+D1"

Data 2: creating variable "combineA1+B1+C1+D1"

country year    A1  B1  C1  D1  combineABCD
USA     2002    0   0   0   0   0
USA     2003    1   1   0   0   2
USA     2004    NA  1   0   0   1
USA     2005    0   0   0   0   0
USA     2006    0   1   0   0   1
USA     2007    0   0   0   0   0
USA     2008    0   1   1   0   2
USA     2009    NA  NA  NA  NA  NA
USA     2010    0   1   1   0   2
USA     2011    NA  0   0   0   0
USA     2012    0   1   1   0   2
USA     2013    0   0   0   0   0
USA     2014    0   1   1   0   2

I guess that once I know how to create the first combined variable I'll be able to create this one as well, although I'm not sure how all these NAs should be handled.

I'm grateful for any suggestions on how to add these variables properly.

With a little bit of searching, I found this article. I take no credit for this code.

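# Return NA only when every value is NA; otherwise sum the non-NA values
mysum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
# Apply mysum to each row (MARGIN = 1) of the two columns
df$combinedA1B1 <- apply(df[, c("A1", "B1")], 1, mysum)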

df
#    country year A1 B1 combinedA1B1
# 1      USA 2002  0  0            0
# 2      USA 2003  1  1            2
# 3      USA 2004 NA  1            1
# 4      USA 2005  0  0            0
# 5      USA 2006  0  1            1
# 6      USA 2007  0  0            0
# 7      USA 2008  0  1            1
# 8      USA 2009 NA NA           NA
# 9      USA 2010  0  1            1
# 10     USA 2011 NA  0            0
# 11     USA 2012  0  1            1
# 12     USA 2013  0  0            0
# 13     USA 2014  0  1            1
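The same helper should carry over to the second table in the question. The following is a minimal sketch, assuming a data frame named df2 (a hypothetical name) that mirrors "Data 2" with columns A1 to D1:

```r
# Hypothetical data frame mirroring "Data 2" from the question
df2 <- data.frame(
  country = "USA",
  year = 2002:2014,
  A1 = c(0, 1, NA, 0, 0, 0, 0, NA, 0, NA, 0, 0, 0),
  B1 = c(0, 1, 1, 0, 1, 0, 1, NA, 1, 0, 1, 0, 1),
  C1 = c(0, 0, 0, 0, 0, 0, 1, NA, 1, 0, 1, 0, 1),
  D1 = c(0, 0, 0, 0, 0, 0, 0, NA, 0, 0, 0, 0, 0)
)

# NA only when every value in the row is NA; otherwise sum the non-NA values
mysum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

# apply() with MARGIN = 1 calls mysum once per row of the four columns
df2$combineABCD <- apply(df2[, c("A1", "B1", "C1", "D1")], 1, mysum)
```

Only the column selection changes; the helper itself does not care how many columns it receives.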

Many functions have an optional na.rm argument that makes R drop NAs instead of propagating them through the calculation. It defaults to FALSE, but setting it to TRUE causes R to ignore NAs:

> sum(1, NA)
[1] NA

> sum(1, NA, na.rm = TRUE)
[1] 1

However, passing this argument can give surprising results when all of the arguments are NA: R still ignores every one of them and returns the sum of nothing, which is 0:

> sum(NA, NA, na.rm = TRUE)
[1] 0
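The same pitfall appears with rowSums(), which also returns 0 for a row that is entirely NA when na.rm = TRUE. Below is a vectorized sketch of one way around it, using a small hypothetical data frame d; this is an alternative technique, not the approach this answer goes on to describe:

```r
# Hypothetical example data: the last row is entirely NA
d <- data.frame(A1 = c(0, 1, NA, NA), B1 = c(0, 1, 1, NA))

# Row sums with NAs dropped; the all-NA row comes out as 0 here
totals <- rowSums(d, na.rm = TRUE)

# Flag rows in which every value is NA, then restore NA for those rows
all_missing <- rowSums(!is.na(d)) == 0
totals[all_missing] <- NA

totals
```

Because rowSums() works on the whole table at once, this avoids a per-row function call, which may matter on larger data.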

To get the kind of NA handling you want, you can define your own function:

my.sum <- function(...) {
    if(all(is.na(c(...)))) {
        return(NA)
    } else {
        return(sum(..., na.rm = TRUE))
    }
}

Once you've done that, you can zip your two columns together using mapply, like so:

data1$combine <- mapply(data1$A1, data1$B1, FUN = my.sum)

You may not have encountered the ... (dots) argument yet when defining functions. Its purpose is to accept an arbitrary number of optional arguments and hold them to "pass on", in this case to c and sum.
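Because the function takes its inputs through the dots, the same call pattern should extend to the four-column case from the question. A minimal sketch, assuming a hypothetical data frame data2 with columns A1 to D1:

```r
# Same helper as above: NA only when every argument is NA
my.sum <- function(...) {
    if (all(is.na(c(...)))) NA else sum(..., na.rm = TRUE)
}

# Hypothetical data with four indicator columns; the last row is entirely NA
data2 <- data.frame(A1 = c(1, NA, NA), B1 = c(1, 1, NA),
                    C1 = c(0, 0, NA), D1 = c(0, 0, NA))

# mapply() passes one element from each column per call,
# so my.sum receives four arguments through the dots
data2$combine <- mapply(my.sum, data2$A1, data2$B1, data2$C1, data2$D1)
```

Adding or removing columns only changes the mapply call; my.sum stays the same.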

Here is one approach using the dplyr package:

library(dplyr)

df <- data.frame(country = rep("USA", 13),
             year = 2002:2014,
             A1 = c(0,1,NA,0,0,0,0,NA,0,NA,0,0,0),
             B1 = c(0,1,1,0,1,0,1,NA,1,0,1,0,1))

# If A1 is NA use B1 (which may itself be NA); if only B1 is NA use A1; otherwise add them
df <- df %>% mutate(combine = ifelse(is.na(A1), B1,
                           ifelse(is.na(B1), A1, A1 + B1)))
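Nested ifelse calls become unwieldy with four columns, so for the A1 to D1 case a row-sum formulation may be easier to read. The following is a sketch rather than part of the original answer; the data frame df2 and the helper column n_present are hypothetical names, and it assumes a dplyr version that provides across():

```r
library(dplyr)

# Hypothetical data with four indicator columns; the last row is entirely NA
df2 <- data.frame(A1 = c(1, NA, NA), B1 = c(1, 1, NA),
                  C1 = c(0, 0, NA), D1 = c(0, 0, NA))

df2 <- df2 %>%
  mutate(
    # Number of non-missing values in each row across A1..D1
    n_present = rowSums(!is.na(across(A1:D1))),
    # NA when nothing is present; otherwise the sum of the non-missing values
    combineABCD = ifelse(n_present == 0, NA,
                         rowSums(across(A1:D1), na.rm = TRUE))
  ) %>%
  select(-n_present)
```

The helper column is dropped at the end so that only the combined variable is added to the data.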
