Putting rowwise counts of value occurrences into new variables, how to do that in R with dplyr?

I have a large dataframe (df) that looks like this:

df <- structure(list(var1 = c(1, 2, 3, 4, 2, 3, 4, 3, 2), var2 = c(2, 
3, 4, 1, 2, 1, 1, 1, 3), var3 = c(4, 4, 2, 3, 3, 1, 1, 1, 4), 
    var4 = c(2, 2, 2, 2, 3, 2, 3, 4, 1), var5 = c(4, 4, 2, 3, 
    3, 1, 1, 1, 4)), .Names = c("var1", "var2", "var3", "var4", 
"var5"), row.names = c(NA, -9L), class = "data.frame")

  var1 var2 var3 var4 var5
1    1    2    4    2    4
2    2    3    4    2    4
3    3    4    2    2    2
4    4    1    3    2    3
5    2    2    3    3    3
6    3    1    1    2    1
7    4    1    1    3    1
8    3    1    1    4    1
9    2    3    4    1    4

Now I need to count the occurrences of each value row-wise and create new variables holding the counts. This should be the result:

  var1 var2 var3 var4 var5 n_1 n_2 n_3 n_4
1    1    2    4    2    4   1   2   0   2
2    2    3    4    2    4   0   2   1   2
3    3    4    2    2    2   0   3   1   1
4    4    1    3    2    3   1   1   2   1
5    2    2    3    3    3   0   2   3   0
6    3    1    1    2    1   3   1   1   0
7    4    1    1    3    1   3   0   1   1
8    3    1    1    4    1   3   0   1   1
9    2    3    4    1    4   1   1   1   2

As you can see, variable n_1 shows the row counts of the 1's, n_2 the row counts of the 2's, etc.

I tried some dplyr functions (because I like their speed), but haven't succeeded yet. I know this is definitely ugly code :-), but my approach would be something along these lines:

newdf <- mutate(rowwise(df, n_1 = sum(df==1))

Does anyone have an idea about how to deal with this problem? Many thanks in advance!

This uses rowwise() and do() from dplyr, but it's definitely ugly.

I'm not sure whether it can be modified so that you get a data.frame output directly, as shown at https://github.com/hadley/dplyr/releases.

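library(dplyr)

# For each row, count how often each value from min(df) to max(df) occurs
interim_res <- df %>% 
                  rowwise() %>% 
                  do(out = sapply(min(df):max(df), function(i) sum(i==.)))

# Bind the per-row count vectors into one data frame
interim_res <- interim_res[[1]] %>% do.call(rbind,.) %>% as.data.frame(.)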

Then, to get the intended result:

res <- cbind(df,interim_res)
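As a less roundabout alternative to the rowwise()/do() pipeline above, the asker's original `sum(df == 1)` idea can be made row-wise with base R's rowSums(). This is only a sketch, not part of the original answer: it assumes the values to count (1 to 4, held in the hypothetical variable `vals`) are known in advance, and it names the new columns n_1 to n_4 directly.

```r
# Data frame from the question
df <- data.frame(
  var1 = c(1, 2, 3, 4, 2, 3, 4, 3, 2),
  var2 = c(2, 3, 4, 1, 2, 1, 1, 1, 3),
  var3 = c(4, 4, 2, 3, 3, 1, 1, 1, 4),
  var4 = c(2, 2, 2, 2, 3, 2, 3, 4, 1),
  var5 = c(4, 4, 2, 3, 3, 1, 1, 1, 4)
)

vals <- 1:4  # assumption: the set of values to count is known up front

# df == k is a logical matrix; rowSums() counts the TRUEs in each row
counts <- sapply(vals, function(k) rowSums(df == k))
colnames(counts) <- paste0("n_", vals)

res <- cbind(df, counts)
res
```

Because the comparison is vectorised over the whole data frame, there is one pass per counted value rather than one function call per row, which tends to matter on a large data frame.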

This is a solution using base functions:

dd <- t(apply(df, 1, function(x) table(factor(x, levels=1:4))))
colnames(dd) <- paste("n",1:4, sep="_")
cbind(df, dd)

Just use the table command across the rows of your data.frame to get counts of each value from 1 to 4.
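To see why the factor() call with explicit levels matters here, it may help to look at a single row. The sketch below uses the first row of the question's data (the names `x` and `row_counts` are just for illustration): a plain table() drops values that do not occur, whereas fixing the levels to 1:4 keeps a zero for them, so every row yields four counts.

```r
x <- c(1, 2, 4, 2, 4)  # first row of the question's data

table(x)                        # only the values that occur: 1, 2 and 4
table(factor(x, levels = 1:4))  # all four levels, with a 0 for the absent 3

row_counts <- as.vector(table(factor(x, levels = 1:4)))
row_counts  # 1 2 0 2, matching n_1..n_4 in row 1 of the expected result
```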

Here is an approach using the qdapTools package:

library(qdapTools)

dat <- df  # assumption: dat is the question's data frame, called df above

data.frame(dat, setNames(mtabulate(split(dat, id(dat))), paste0("n_", 1:4)))

##   var1 var2 var3 var4 var5 n_1 n_2 n_3 n_4
## 1    1    2    4    2    4   1   2   0   2
## 2    2    3    4    2    4   0   2   1   2
## 3    3    4    2    2    2   0   3   1   1
## 4    4    1    3    2    3   1   1   2   1
## 5    2    2    3    3    3   0   2   3   0
## 6    3    1    1    2    1   3   1   1   0
## 7    4    1    1    3    1   3   0   1   1
## 8    3    1    1    4    1   3   0   1   1
## 9    2    3    4    1    4   1   1   1   2
