
Summing specific columns and rows to new matrix in R

I'm a newer user of R and understand how to make my code work, but I know there has to be a dplyr or purrr function that does this more efficiently and with a lot less code. If there is, I haven't found it yet. My PI wants a summation of our race data, but the trick is to have it separated by respondents who selected one race and then, for those who answered more than one race, a breakdown of the sums by combination. I took a subset of the data to get just those columns, then added the columns pairwise across rows and wrote the results to a new 7x7 matrix to get the sums of each.

This is my code. My question: is there a much more efficient way of doing this?

# Sum races to create a totaled matrix of all races

subset <- subset(dataset[,11:17])
test <- matrix(,nrow=7, ncol=7)

colnames(test) <- c("African_American", "Asian", "Hawaiian_Pacific", "Native_Alaskan", "White_Euro", "Hispanic_Latino", "No-Answer")

rownames(test) <- c("African_American", "Asian", "Hawaiian_Pacific", "Native_Alaskan", "White_Euro", "Hispanic_Latino", "No-Answer")

# Basic design: if == 1, then strictly one race; if > 1, put in the appropriate category

test[1,1] <- sum(subset$African_American==1, na.rm=TRUE)

test[1,2] <- sum(subset$African_American+subset$Asian>1, na.rm=TRUE)

test[1,3] <- sum(subset$African_American+subset$Hawaiian_Pacific>1, na.rm=TRUE)

test[1,4] <- sum(subset$African_American+subset$Native_Alaskan>1, na.rm=TRUE)

test[1,5] <- sum(subset$African_American+subset$White_Euro>1, na.rm=TRUE)

test[1,6] <- sum(subset$African_American+subset$Hispanic_Latino>1, na.rm=TRUE)

test[1,7] <- sum(subset$African_American+subset$`No-Answer`>1, na.rm=TRUE)

test[2,1] <- sum(subset$Asian+subset$African_American>1, na.rm=TRUE)

test[2,2] <- sum(subset$Asian==1, na.rm=TRUE)...

There are seven columns to add to each other, so the code continues all the way through the matrix and outputs something similar to the linked matrix, where the diagonal holds the counts of only one race and the other cells hold multiple occurrences.

I found a way that uses the base R function apply rather than plyr.

data = data.frame(set1 = round(runif(n = 10,min = 0,max = 1)),
              set2 = round(runif(n = 10,min = 0,max = 1)),
              set3 = round(runif(n = 10,min = 0,max = 1)),
              set4 = round(runif(n = 10,min = 0,max = 1)),
              set5 = round(runif(n = 10,min = 0,max = 1)),
              set6 = round(runif(n = 10,min = 0,max = 1)),
              set7 = round(runif(n = 10,min = 0,max = 1))
)
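# Count, for every pair of columns, the rows where both are 1
pairs = combn(ncol(data), 2)
res = apply(pairs, 2, function(x) sum(data[, x[1]] & data[, x[2]]))
test <- matrix(0,nrow=7, ncol=7)
# Place each count at [i, j] of its pair; upper.tri() fills column by column,
# which does not match the pair order of combn() for more than 3 columns
test[t(pairs)] = res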
> test
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    5    3    2    2    4    2    2
[2,]    0    5    5    3    4    5    4
[3,]    0    0    6    3    1    0    5
[4,]    0    0    0    8    3    3    1
[5,]    0    0    0    0    2    2    2
[6,]    0    0    0    0    0    6    3
[7,]    0    0    0    0    0    0    6

The first part produces some test data. combn(1:ncol(data), 2) generates every combination of 2 columns, and apply calls the function once per combination. The & operator then returns TRUE for all rows of data[, x[1]] and data[, x[2]] (the 2 selected columns) where both values are 1, and sum counts these. The result is the desired pairwise counts. The following two lines construct the matrix you wanted. Please note that with the addition of

res2 = apply(combn(1:ncol(data), 1), 2, function(x) sum(data[, x[1]]))
test[cbind(1:7,1:7)] <- res2

you can also set the diagonal to the correct counts (the matrix printed above already includes this step). In any case, this only counts respondents who answered 1 in a given pair of columns. It won't separately find those who are, for example, Asian, Hispanic and American. But you can compute this with a slight change, using combinations of 3 columns:

apply(combn(1:ncol(data), 3), 2, function(x) sum(data[, x[1]] & data[, x[2]] & data[, x[3]]))
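As a sketch of how the three-column counts could be labeled, the following uses a small fixed data frame and names each count after the columns involved. The data frame `d`, its column names `a` to `d`, and its values are made up for illustration. `combn` accepts a function as its third argument, so `apply` is not strictly needed here.

```r
# Hypothetical 0/1 data; names and values are illustrative only
d <- data.frame(a = c(1, 1, 0, 1, 1),
                b = c(1, 0, 1, 1, 1),
                c = c(1, 1, 1, 0, 1),
                d = c(0, 1, 1, 1, 0))

# For every combination of 3 columns, count rows where all three are 1
triple_counts <- combn(ncol(d), 3, function(idx) sum(rowSums(d[, idx]) == 3))

# Label each count with the columns it refers to, e.g. "a+b+c"
names(triple_counts) <- combn(names(d), 3, paste, collapse = "+")
print(triple_counts)
```

Testing `rowSums(...) == 3` is equivalent to chaining three `&` conditions when the data contain only 0 and 1.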

Please also note that my random data may not be representative or realistic.
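As an alternative sketch, not part of the approach above: if the data frame contains only 0/1 values and no NA, the whole pairwise matrix can be obtained in one step with the base R function `crossprod`, which computes t(X) %*% X. The diagonal then holds each column's total, and every off-diagonal cell holds the number of rows where both columns are 1. The `set.seed` call and the generated data are assumptions added so the example is reproducible.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
data <- as.data.frame(matrix(rbinom(70, 1, 0.5), ncol = 7,
                             dimnames = list(NULL, paste0("set", 1:7))))

# t(X) %*% X: diagonal = column totals, off-diagonal = pairwise co-occurrences
co <- crossprod(as.matrix(data))

# Cross-check against the combn approach, indexing the matrix by [i, j] pairs
pairs <- t(combn(ncol(data), 2))
res <- apply(pairs, 1, function(p) sum(data[, p[1]] & data[, p[2]]))
stopifnot(all(co[pairs] == res), all(diag(co) == colSums(data)))
print(co)
```

The result is symmetric, so both triangles are filled. If the real data contain NA, they would need to be replaced with 0 first, since `crossprod` has no `na.rm` option.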
