
How to calculate a row-wise count of duplicates based on (element-wise) selected adjacent columns

I have a data frame, test:

group userID A_conf A_chall B_conf B_chall
1    220      1       1      1       2     
1    222      4       6      4       4     
2    223      6       5      3       2     
1    224      1       5      4       4    
2    228      4       4      4       4    

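The data contains the responses of each user (identified by userID). Each user can enter any value between 1 and 6 for two measures:

  • conf
  • chall

They can also choose not to respond, which results in an NA entry.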

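The test data frame contains several column groups, such as A, B, C, D, and so on. The conf and chall measures are reported separately for each of these groups.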

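I am interested in making the following comparisons:

  • A_conf vs. A_chall
  • B_conf vs. B_chall

For each pair whose two values are equal, the Final counter should be incremented by one (as shown below).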

group userID A_conf A_chall B_conf B_chall Final
1    220      1       1      1       2     1
1    222      4       6      4       4     1
2    223      6       5      3       2     0
1    224      1       5      4       4     1
2    228      4       4      4       4     2

I am struggling with the Final counter. What script would help me achieve this?

For reference, the dput output of the test data frame is shared below:

  • dput(test):

    structure(list(group = c(1L, 1L, 2L, 1L, 2L),
        userID = c(220L, 222L, 223L, 224L, 228L),
        A_conf = c(1L, 4L, 6L, 1L, 4L),
        A_chall = c(1L, 6L, 5L, 5L, 4L),
        B_conf = c(1L, 4L, 3L, 4L, 4L),
        B_chall = c(2L, 4L, 2L, 4L, 4L)),
        class = "data.frame", row.names = c(NA, -5L))

I tried code like this:

test$Final = as.integer(0)   # add a column to keep counts
count_inc = as.integer(0)    # counter variable to increment in steps of 1

for (i in 1:nrow(test)) {

    count_inc = 0

    if(!is.na(test$A_conf[i] == test$A_chall[i]))
    {
      count_inc = 1
      test$Final[i] = count_inc
    }#if

    else if(!is.na(test$A_conf[i] != test$A_chall[i]))
    {
      count_inc = 0
      test$Final[i] = count_inc
    }#else if
}#for

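The code above only handles the A_conf and A_chall columns. The problem is that it fills the Final column with 1 regardless of whether the values entered (by the user) are equal.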
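The symptom described above appears to come from the conditions themselves: `!is.na(x == y)` only asks whether the comparison produced a non-missing result, so it is TRUE for any two non-NA values, equal or not. A minimal sketch of the difference, using two made-up values (`a` and `b` are illustrative names, not taken from the original code):

```r
# Two illustrative responses that are NOT equal
a <- 4L
b <- 6L

a == b            # FALSE: the values differ
!is.na(a == b)    # TRUE: the comparison is merely "not missing"
isTRUE(a == b)    # FALSE: TRUE only for a non-missing, equal pair

# With a missing response the comparison itself is NA,
# and isTRUE() treats that as "do not count"
isTRUE(NA_integer_ == b)
```

Swapping the condition for something like `isTRUE(test$A_conf[i] == test$A_chall[i])` should make the loop count only genuinely equal pairs, although it would still cover just the A columns.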

A base R solution, assuming you have an equal number of "conf" and "chall" columns:

#Find indexes of "conf" column
conf_col <- grep("conf", names(test))

#Find indexes of "chall" column
chall_col <- grep("chall", names(test))

#compare element wise and take row wise sum
test$Final <- rowSums(test[conf_col] == test[chall_col])


test
#  group userID A_conf A_chall B_conf B_chall Final
#1     1    220      1       1      1       2     1
#2     1    222      4       6      4       4     1
#3     2    223      6       5      3       2     0
#4     1    224      1       5      4       4     1
#5     2    228      4       4      4       4     2
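The question mentions that users may skip a response, which produces NA. The example data has no NA, so the following is only a sketch of how the same approach behaves when one is present; the NA placed in `A_chall` for user 228 is an assumption made for illustration.

```r
# Example data from the question, with one response changed to NA
test <- data.frame(
  group   = c(1L, 1L, 2L, 1L, 2L),
  userID  = c(220L, 222L, 223L, 224L, 228L),
  A_conf  = c(1L, 4L, 6L, 1L, 4L),
  A_chall = c(1L, 6L, 5L, 5L, NA),  # assumption: user 228 skipped A_chall
  B_conf  = c(1L, 4L, 3L, 4L, 4L),
  B_chall = c(2L, 4L, 2L, 4L, 4L)
)

conf_col  <- grep("conf",  names(test))
chall_col <- grep("chall", names(test))

# Without na.rm, a single NA comparison turns the whole row sum into NA
with_na <- rowSums(test[conf_col] == test[chall_col])

# With na.rm = TRUE, pairs containing an NA are skipped rather than counted
test$Final <- rowSums(test[conf_col] == test[chall_col], na.rm = TRUE)
test$Final
```

Whether a skipped pair should count as 0 or leave the row total as NA depends on the analysis; `na.rm = TRUE` simply picks the first option.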

It can also be done as a one-liner:

rowSums(test[grep("conf", names(test))] == test[grep("chall", names(test))])
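Both versions above pair the columns by position, so they rely on the "conf" and "chall" columns appearing in the same order. A sketch that pairs them by name prefix instead, assuming every column follows the `<prefix>_conf` / `<prefix>_chall` pattern seen in the question (the shuffled column order below is made up to show the difference):

```r
# Same values as the question, but with the chall columns out of order
test <- data.frame(
  group   = c(1L, 1L, 2L, 1L, 2L),
  userID  = c(220L, 222L, 223L, 224L, 228L),
  A_conf  = c(1L, 4L, 6L, 1L, 4L),
  B_conf  = c(1L, 4L, 3L, 4L, 4L),
  B_chall = c(2L, 4L, 2L, 4L, 4L),
  A_chall = c(1L, 6L, 5L, 5L, 4L)
)

# Positional pairing compares A_conf with B_chall here, which is not intended
positional <- rowSums(test[grep("conf", names(test))] ==
                      test[grep("chall", names(test))])

# Derive the prefixes (A, B, ...) from the "_conf" columns
prefixes   <- sub("_conf$", "", grep("_conf$", names(test), value = TRUE))
conf_cols  <- paste0(prefixes, "_conf")
chall_cols <- paste0(prefixes, "_chall")

# Fail early if some prefix has no matching "_chall" column
stopifnot(all(chall_cols %in% names(test)))

# Compare each prefix's conf column with its own chall column
test$Final <- rowSums(test[conf_cols] == test[chall_cols], na.rm = TRUE)
test$Final
```

With the columns in the original order, both pairings give the same counts; the name-based version only matters when the order cannot be guaranteed.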

Using tidyverse, you can do the following:

df %>%
 select(-Final) %>%
 rowid_to_column() %>% #Creating an unique row ID
 gather(var, val, -c(group, userID, rowid)) %>% #Reshaping the data
 arrange(rowid, var) %>% #Arranging by row ID and by variables
 group_by(rowid) %>% #Grouping by row ID
 mutate(temp = gl(n()/2, 2)) %>% #Creating a grouping variable for different "_chall" and "_conf" variables
 group_by(rowid, temp) %>% #Grouping by row ID and the new grouping variables
 mutate(res = ifelse(val == lag(val), 1, 0)) %>% #Comparing whether the different "_chall" and "_conf" have the same value
 group_by(rowid) %>% #Grouping by row ID
 mutate(res = sum(res, na.rm = TRUE)) %>% #Summing the occurrences of "_chall" and "_conf" being the same
 select(-temp) %>% 
 spread(var, val) %>% #Returning the data to its original form
 ungroup() %>%
 select(-rowid)

  group userID   res A_chall A_conf B_chall B_conf
  <int>  <int> <dbl>   <int>  <int>   <int>  <int>
1     1    220    1.       1      1       2      1
2     1    222    1.       6      4       4      4
3     2    223    0.       5      6       2      3
4     1    224    1.       5      1       4      4
5     2    228    2.       4      4       4      4

You can also try this tidyverse approach. It takes fewer lines than the other answer ;)

library(tidyverse)
d %>% 
  as_tibble() %>% 
  gather(k, v, -group,-userID) %>% 
  separate(k, into = c("letters", "test")) %>% 
  spread(test, v) %>% 
  group_by(userID) %>% 
  mutate(final = sum(chall == conf)) %>% 
  distinct(userID, final) %>% 
  ungroup() %>% 
  right_join(d)
# A tibble: 5 x 7
  userID final group A_conf A_chall B_conf B_chall
   <int> <int> <int>  <int>   <int>  <int>   <int>
1    220     1     1      1       1      1       2
2    222     1     1      4       6      4       4
3    223     0     2      6       5      3       2
4    224     1     1      1       5      4       4
5    228     2     2      4       4      4       4
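In current tidyr, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The following is a sketch of the same idea with `pivot_longer()`, not the answerers' own code; it assumes the dplyr and tidyr packages are installed, that userID is unique per row, and that the column names split cleanly on "_". The names `item` and `final_counts` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question
d <- data.frame(
  group   = c(1L, 1L, 2L, 1L, 2L),
  userID  = c(220L, 222L, 223L, 224L, 228L),
  A_conf  = c(1L, 4L, 6L, 1L, 4L),
  A_chall = c(1L, 6L, 5L, 5L, 4L),
  B_conf  = c(1L, 4L, 3L, 4L, 4L),
  B_chall = c(2L, 4L, 2L, 4L, 4L)
)

final_counts <- d %>%
  pivot_longer(
    cols      = -c(group, userID),
    names_to  = c("item", ".value"),  # "A_conf" -> item "A", value column "conf"
    names_sep = "_"
  ) %>%
  group_by(userID) %>%
  # Count the items whose conf and chall agree; NA pairs are skipped
  summarise(Final = sum(conf == chall, na.rm = TRUE))

# Attach the counts to the original wide data, keeping its row order
result <- left_join(d, final_counts, by = "userID")
result
```

The `.value` entry in `names_to` sends the part after the underscore into separate `conf` and `chall` columns, so each row of the long data holds one pair to compare.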

