R: Count number of values from one column according to values in another column

I have a bit of an unclear question, so I hope I can explain this properly. I am using R. I know for loops can be slow in R, but in this case a for loop would be fine for me.

I have a data frame like this:

    id_A    id_B    id_C    calc_A  calc_B  calc_C  
1   x,z     d       g,f        1        1       5
2   x,y,z   d,e     f          1        2       8
3   y,z     d,e     g          6        7       1

I also have a vector with the names c('A', 'B', 'C', etc.). What I want to do is count, for every row, how many ids have a calc <= 2. id_A is linked to calc_A, id_B to calc_B, and so on.

For example, in the first row A and B have calc values <= 2, and together A and B have 3 ids. So the output will be something like this:

   count
1   3
2   5
3   1

It's a bit messy, but this should do the trick (for data.frame d):

# store indices of calc columns and id columns
calc.cols <- grep('^calc', names(d))
id.cols <- grep('^id', names(d))

sapply(split(d, seq_len(nrow(d))), function(x) {
  length(unique(unlist(strsplit(paste(x[, id.cols][which(x[, calc.cols] <= 2)], 
                                      collapse=','), ','))))
})

# 1 2 3 
# 3 5 1
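One detail of the snippet above: because of `unique()`, an id that appears under two qualifying columns of the same row is counted only once. The sketch below isolates the per-row logic in a standalone function to make that visible. The name `count_ids` and the `threshold` and `dedupe` arguments are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: count the ids of one row whose calc value is <= threshold.
# ids   : character vector of comma-separated ids, one element per group (A, B, ...)
# calcs : numeric vector of calc values, in the same order as ids
count_ids <- function(ids, calcs, threshold = 2, dedupe = TRUE) {
  keep <- !is.na(calcs) & calcs <= threshold            # groups that qualify
  parts <- unlist(strsplit(ids[keep], ",", fixed = TRUE))
  if (dedupe) parts <- unique(parts)                    # mirrors unique() in the answer
  length(parts)
}

count_ids(c("x,z", "d", "g,f"), c(1, 1, 5))                  # 3: x, z, d
count_ids(c("x,z", "x", "g,f"), c(1, 1, 5))                  # 2: "x" is counted once
count_ids(c("x,z", "x", "g,f"), c(1, 1, 5), dedupe = FALSE)  # 3: "x" is counted twice
```

Whether duplicates should be collapsed depends on what the ids mean; the question does not say, so this is left as a switch.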

Assuming that the ID columns and the calc columns are in the same order:

 library(stringr)
 indx <- sapply(df[,1:3], str_count, ",")+1
 indx[df[,4:6] >2] <- NA
 df$count <- rowSums(indx,na.rm=TRUE)
 df
 #   id_A id_B id_C calc_A calc_B calc_C count
 #1   x,z    d  g,f      1      1      5     3
 #2 x,y,z  d,e    f      1      2      8     5
 #3   y,z  d,e    g      6      7      1     1
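If adding a dependency on stringr is not desirable, the same comma-counting idea can be sketched in base R. This is only an illustration under the same assumption as above (id columns in positions 1 to 3, calc columns in positions 4 to 6); `count_commas` is a hypothetical helper name, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Sample data from the question (ids stored as character, not factor)
df <- data.frame(
  id_A = c("x,z", "x,y,z", "y,z"),
  id_B = c("d", "d,e", "d,e"),
  id_C = c("g,f", "f", "g"),
  calc_A = c(1L, 1L, 6L), calc_B = c(1L, 2L, 7L), calc_C = c(5L, 8L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of commas in each element of a character vector
count_commas <- function(x) {
  lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))
}

n_ids <- sapply(df[, 1:3], count_commas) + 1  # ids per cell = commas + 1
n_ids[df[, 4:6] > 2] <- NA                    # blank out cells whose calc is > 2
count <- rowSums(n_ids, na.rm = TRUE)
count
```

Like the stringr version, this counts ids with multiplicity and treats an empty id string as one id, since it only looks at the number of commas.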

Update

Suppose the columns of your dataset are not in the same order:

 set.seed(42)
 df1 <- df[,sample(6)]
 library(gtools)
 df2 <-df1[,mixedorder(names(df1))]
 #    calc_A calc_B calc_C  id_A id_B id_C
 #1      1      1      5   x,z    d  g,f
 #2      1      2      8 x,y,z  d,e    f
 #3      6      7      1   y,z  d,e    g

 id1 <- grep("^id", colnames(df2))
 calc1 <- grep("^calc", colnames(df2)) 

 indx1 <-sapply(df2[, id1], str_count, ",")+1
 indx1[df2[, calc1] >2] <- NA
 df1$count <- rowSums(indx1, na.rm=TRUE)
 df1
 #     calc_C calc_B id_B id_C calc_A  id_A count
 #1      5      1    d  g,f      1   x,z     3
 #2      8      2  d,e    f      1 x,y,z     5
 #3      1      7  d,e    g      6   y,z     1
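Since the question mentions a vector of group names and says a for loop is acceptable, another way to cope with arbitrary column order is to pair each id column with its calc column by name rather than by position. The following is a rough sketch, assuming the columns are always called `id_<name>` and `calc_<name>`; the variable `suffixes` stands in for the asker's name vector.

```r
# Sample data with the columns deliberately shuffled
df <- data.frame(
  calc_C = c(5L, 8L, 1L), calc_B = c(1L, 2L, 7L),
  id_B = c("d", "d,e", "d,e"), id_C = c("g,f", "f", "g"),
  calc_A = c(1L, 1L, 6L), id_A = c("x,z", "x,y,z", "y,z"),
  stringsAsFactors = FALSE
)

suffixes <- c("A", "B", "C")  # assumed to match the id_/calc_ column suffixes

count <- integer(nrow(df))
for (s in suffixes) {
  ids  <- df[[paste0("id_", s)]]
  calc <- df[[paste0("calc_", s)]]
  n    <- lengths(strsplit(ids, ",", fixed = TRUE))  # ids per row in this group
  # add this group's ids only where its calc value qualifies
  count <- count + ifelse(!is.na(calc) & calc <= 2, n, 0L)
}
df$count <- count
df
```

The loop runs once per group, not once per row, so it stays cheap even for data frames with many rows.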

data

df <- structure(list(id_A = c("x,z", "x,y,z", "y,z"), id_B = c("d", 
 "d,e", "d,e"), id_C = c("g,f", "f", "g"), calc_A = c(1L, 1L, 
 6L), calc_B = c(1L, 2L, 7L), calc_C = c(5L, 8L, 1L)), .Names = c("id_A", 
"id_B", "id_C", "calc_A", "calc_B", "calc_C"), class = "data.frame", row.names = c("1", 
"2", "3"))

I don't know if this is less messy than jbaums's solution, but here is another option:

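mydf <- data.frame(id_A = c("x,y", "x,y,z", "y,z"),
                   id_B = c("d", "d,e", "d,e"),
                   id_C = c("g,f", "f", "g"),
                   calc_A = c(1, 1, 6), calc_B = c(1, 2, 7), calc_C = c(5, 8, 1),
                   stringsAsFactors = FALSE)

# apply() passes each row as a named character vector
mydf$count <- apply(mydf, 1, function(rg, namesrg) {
  rg_calc <- rg[grep("^calc", namesrg)]       # calc values of this row
  rg_ids  <- rg[grep("^id", namesrg)]         # id strings of this row
  idsinf2 <- which(as.numeric(rg_calc) <= 2)  # positions with calc <= 2
  # map the qualifying calc_* names to their id_* counterparts, then split on commas
  ttids <- unlist(lapply(rg_ids[sub("^calc", "id", names(rg_calc)[idsinf2])],
                         function(id) strsplit(id, ",")[[1]]))
  length(ttids)
}, colnames(mydf))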


>  mydf
   id_A id_B id_C calc_A calc_B calc_C count
1   x,y    d  g,f      1      1      5     3
2 x,y,z  d,e    f      1      2      8     5
3   y,z  d,e    g      6      7      1     1
