Indexing a data.frame in R using vectors

Question

I have a data.frame that contains an ID number and scaled responses from a survey:

df(responses)

ID    X1    X2    X3    X4
A1    1     1     2     1
B2    0     1     3     0
C3    3     3     2     0

I also have a data.frame that serves as a key:

df(key)

X    Y    Z
2    1    1
3    2    2
4    3    4

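I am trying to write a script that calculates the X, Y and Z scores for each participant, where the X score is the sum of the answers to the questions listed under X in the key.

For example, the X score for participant A1 would be the sum of X2, X3 and X4 in row A1 (1+2+1 = 4).

The desired output is: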

df(output)

ID    X    Y    Z
A1    4    4    3
B2    4    4    1
C3    5    8    6

However, I am struggling to use the values in key to index the responses data.frame. My current state is:

#store scale names
scales <- c(colnames(key))
#loop over every participant
for (i in responses$ID){
    #create temporary data.frame with only participant "i"s responses
    data <- subset(responses, ID == i)
    #loop over each scale and store the relevant response numbers
    for (s in scales){
        relevantResponses <- scales[c(s)]
        #create a temporary storage for the total of each scale
        runningScore <- 0
        #index each response and add it to the total
        for (r in relevantResponses){
             runningScore <- runningScore + data[1,r]
        }
    }
}

But I get the error:

Error in `[.data.frame`(data, 1, r) : 
  undefined columns selected

Is there a better way to do the indexing than nested loops?

Answer 1

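We can loop over the columns of the key data with lapply, use each column to extract the matching numeric columns of "responses", get the rowSums, convert the resulting list to a data.frame, and cbind it with the first column of "responses".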

cbind(responses[1], data.frame(lapply(key, 
     function(x) rowSums(responses[-1][, na.omit(x)], na.rm = TRUE))))

-output

#  ID X Y Z
#1 A1 4 4 3
#2 B2 4 4 1
#3 C3 5 8 6
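
To see what the one-liner does, here is a minimal sketch that unrolls it for a single scale before repeating it with lapply. It rebuilds the example data inline, and the intermediate names (`items`, `x_cols`, `x_score`, `scores`, `result`) are only illustrative. The `na.omit()` call presumably guards against key columns padded with `NA` when scales have different numbers of items; that is an assumption, since the example key contains no `NA`.

```r
# Example data from the question, rebuilt so the sketch is self-contained
responses <- data.frame(
  ID = c("A1", "B2", "C3"),
  X1 = c(1L, 0L, 3L), X2 = c(1L, 1L, 3L),
  X3 = c(2L, 3L, 2L), X4 = c(1L, 0L, 0L)
)
key <- data.frame(X = 2:4, Y = 1:3, Z = c(1L, 2L, 4L))

# Drop the ID column so that position k corresponds to question Xk
items <- responses[-1]

# Question numbers that make up scale X
x_cols <- key$X

# Select those questions and sum across each row (one total per participant)
x_score <- rowSums(items[, x_cols], na.rm = TRUE)

# lapply() repeats the same step for every column of key
scores <- lapply(key, function(k) rowSums(items[, na.omit(k)], na.rm = TRUE))

# Turn the list of totals into columns and put the ID column in front
result <- cbind(responses[1], data.frame(scores))
print(result)
```

Dropping the ID column first is what makes the numbers in `key` line up with the question columns; indexing `responses` directly would be off by one.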

Or using tidyverse

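library(dplyr)
library(purrr)
# note: cur_data() is deprecated as of dplyr 1.1.0 in favor of pick()
imap(key, ~ responses %>%
     transmute(ID, !!.y :=  rowSums(select(cur_data()[-1], na.omit(.x)),
          na.rm = TRUE))) %>% 
     reduce(inner_join, by = "ID")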

-output

#  ID X Y Z
#1 A1 4 4 3
#2 B2 4 4 1
#3 C3 5 8 6

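Or another option is mutate with across. Note that this relies on key and responses having the same number of rows, as they do here.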

key %>%
   mutate(across(everything(), 
       ~ rowSums(responses[-1][na.omit(.)], na.rm = TRUE)), 
          ID = responses$ID, .before = 1)
#  ID X Y Z
#1 A1 4 4 3
#2 B2 4 4 1
#3 C3 5 8 6
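
For reference, the error in the question appears to come from `scales[c(s)]`: `scales` is an unnamed character vector, so indexing it by name yields `NA`, and selecting the column `NA` from the data.frame fails with "undefined columns selected". Below is a minimal sketch of the nested-loop approach with the lookup changed to `key[[s]]`. It assumes the numbers in `key` are question numbers (so `2` means column `X2`); the names `output`, `cols` and `total` are illustrative.

```r
# Example data from the question, rebuilt so the sketch is self-contained
responses <- data.frame(
  ID = c("A1", "B2", "C3"),
  X1 = c(1L, 0L, 3L), X2 = c(1L, 1L, 3L),
  X3 = c(2L, 3L, 2L), X4 = c(1L, 0L, 0L)
)
key <- data.frame(X = 2:4, Y = 1:3, Z = c(1L, 2L, 4L))

scales <- colnames(key)

# Start the result with the ID column; one score column is added per scale
output <- data.frame(ID = responses$ID)

for (s in scales) {
  # Question numbers for this scale, turned into column names such as "X2"
  cols <- paste0("X", key[[s]])
  # Running totals, one per participant
  total <- numeric(nrow(responses))
  for (i in seq_len(nrow(responses))) {
    for (r in cols) {
      total[i] <- total[i] + responses[i, r]
    }
  }
  output[[s]] <- total
}
print(output)
```

The vectorised versions above do the same work without the two inner loops, which is usually easier to read and faster on larger data.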

data

responses <- structure(list(ID = c("A1", "B2", "C3"), X1 = c(1L, 0L, 3L), 
    X2 = c(1L, 1L, 3L), X3 = c(2L, 3L, 2L), X4 = c(1L, 0L, 0L
    )), class = "data.frame", row.names = c(NA, -3L))

key <- structure(list(X = 2:4, Y = 1:3, Z = c(1L, 2L, 4L)), class = "data.frame",
   row.names = c(NA, 
-3L))

Answer 2

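Here is another way to approach this. I just wanted to challenge myself with my favorite kind of solution, which is not as concise and elegant as the one proposed by dear @akrun. He is the one who taught me how to use the purrr family of functions: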

library(dplyr)
library(purrr)

responses %>% 
  select(X1:X4) %>% 
  pmap_dfr(., ~ map_dfc(1:length(key), function(x) sum(c(...)[key[, x]]))) %>%
  bind_cols(responses$ID) %>%
  set_names(c("x", "y", "z", "ID")) %>% 
  relocate(ID)

  ID        x     y     z
  <chr> <int> <int> <int>
1 A1        4     4     3
2 B2        4     4     1
3 C3        5     8     6

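Dear @akrun suggested two more concise approaches, which I would like to add here. One uses the rowSums function and the other uses reduce from the purrr package. Keep in mind that when we apply the + function with reduce to a data frame, the columns are added together elementwise, so each row collapses into a single element: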

map_dfc(key, ~ responses[-1][.x] %>% rowSums())

# A tibble: 3 x 3
      X     Y     Z
  <dbl> <dbl> <dbl>
1     4     4     3
2     4     4     1
3     5     8     6

And with reduce:

map_dfc(key, ~ responses[-1][.x] %>% reduce(`+`))

# A tibble: 3 x 3
      X     Y     Z
  <int> <int> <int>
1     4     4     3
2     4     4     1
3     5     8     6
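
The reduce step can be illustrated in base R with `Reduce()`, which folds a list in the same left-to-right way. A data frame is a list of columns, so folding it with `+` adds the columns elementwise and leaves one total per row. This is a minimal sketch using the example data; the names `x_items` and `x_total` are illustrative.

```r
# Example data from the question, rebuilt so the sketch is self-contained
responses <- data.frame(
  ID = c("A1", "B2", "C3"),
  X1 = c(1L, 0L, 3L), X2 = c(1L, 1L, 3L),
  X3 = c(2L, 3L, 2L), X4 = c(1L, 0L, 0L)
)
key <- data.frame(X = 2:4, Y = 1:3, Z = c(1L, 2L, 4L))

# Columns for scale X, selected by position after dropping the ID column
x_items <- responses[-1][key$X]

# Reduce(`+`, list(a, b, c)) computes (a + b) + c, elementwise on vectors
x_total <- Reduce(`+`, x_items)
print(x_total)

# rowSums() gives the same totals but always returns doubles, while adding
# integer columns keeps them integer (hence <dbl> vs <int> in the tibbles above)
print(rowSums(x_items))
```

One practical difference: `rowSums()` can skip missing values with `na.rm = TRUE`, whereas folding with `+` propagates any `NA` into the total.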
