R mutate() with rowSums()

Question

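I want to take a dataframe of participant IDs and the languages they speak, then create a new column that sums all the languages each participant speaks. The columns are the ID, one column per language coded 0 = "does not speak" and 1 = "speaks" (including an "Other" column), and a separate column, "Other.Lang", that specifies what the other language is. I want to subset only the columns with binary values and create the new column from the per-participant sum.

First, here is my dataframe.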


      Participant.Private.ID French Spanish Dutch Czech Russian Hebrew Chinese German Italian Japanese Korean Portuguese Other Other.Lang
    1                5133249      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    2                5136082      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    3                5140442      0       1     0     0       0      0       0      0       0        0      0          0     0          0
    4                5141991      0       1     0     0       0      0       0      0       1        0      0          0     0          0
    5                5143476      0       0     0     0       0      0       0      0       0        0      0          0     0          0
    6                5145250      0       0     0     0       0      0       0      0       0        0      0          0     1      Malay
    7                5146081      0       0     0     0       0      0       0      0       0        0      0          0     0          0

Here is the structure:


    str(part_langs)
    
    grouped_df [7 x 15] (S3: grouped_df/tbl_df/tbl/data.frame)
     $ Participant.Private.ID: num [1:7] 5133249 5136082 5140442 5141991 5143476 ...
     $ French                : num [1:7] 0 0 0 0 0 0 0
     $ Spanish               : num [1:7] 0 0 1 1 0 0 0
     $ Dutch                 : num [1:7] 0 0 0 0 0 0 0
     $ Czech                 : num [1:7] 0 0 0 0 0 0 0
     $ Russian               : num [1:7] 0 0 0 0 0 0 0
     $ Hebrew                : num [1:7] 0 0 0 0 0 0 0
     $ Chinese               : num [1:7] 0 0 0 0 0 0 0
     $ German                : num [1:7] 0 0 0 0 0 0 0
     $ Italian               : num [1:7] 0 0 0 1 0 0 0
     $ Japanese              : num [1:7] 0 0 0 0 0 0 0
     $ Korean                : num [1:7] 0 0 0 0 0 0 0
     $ Portuguese            : num [1:7] 0 0 0 0 0 0 0
     $ Other                 : num [1:7] 0 0 0 0 0 1 0
     $ Other.Lang            : chr [1:7] "0" "0" "0" "0" ...
     - attr(*, "groups")= tibble [7 x 2] (S3: tbl_df/tbl/data.frame)
      ..$ Participant.Private.ID: num [1:7] 5133249 5136082 5140442 5141991 5143476 ...

I thought this should work:


    num <- part_langs %>%
      mutate(num.langs = rowSums(part_langs[2:14]))
    num

However, I keep getting this error message:


    Error: Problem with `mutate()` input `num.langs`.
    x Input `num.langs` can't be recycled to size 1.
    i Input `num.langs` is `rowSums(part_langs[2:14])`.
    i Input `num.langs` must be size 1, not 7.
    i The error occurred in group 1: Participant.Private.ID = 5133249.

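What is really strange is that when I tried to build a simplified version of the problem as a reproducible example, it worked fine.

First I create a dataset.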


    test <- matrix(c(1, 1, 1, 0, 0, "",
                   2, 1, 0, 1, 0, "",
                   3, 0, 0, 0, 1, "Chinese"), ncol = 6, byrow=TRUE)
    
    test<-as.data.frame(test)
    
    colnames(test) <- c("ID", "English", "French", "Italian", "Other", "Other.Lang")
    
    str(test)

Convert the binary columns to numeric:


    test$ID <- as.numeric(test$ID)
    test$English <- as.numeric(test$English)
    test$French <- as.numeric(test$French)
    test$Italian <- as.numeric(test$Italian)
    test$Other <- as.numeric(test$Other)

This is the same code as above, but using the simplified dataset.


    num <- test %>%
      mutate(num.langs = rowSums(test[2:5]))
    num

Here is the output. It works exactly as I want:


    "ID","English","French","Italian","Other","Other.Lang","num.langs"
     1,     1,        1,       0,        0,        "",         2
     2,     1,        0,       1,        0,        "",         2
     3,     0,        0,       0,        1,     "Chinese",     1

So I know I have messed something up in my real data, but I can't see where. Can anyone advise?

Answer 1

Another approach that leans more on dplyr is to use `rowwise()` and `c_across()`:

    test %>%
      rowwise() %>%
      mutate(num.lang = sum(c_across(English:Other)))
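To make that snippet runnable on its own, here is a minimal sketch that rebuilds the question's `test` data directly as a data frame (the `data.frame()` construction is my shortcut, not the asker's original matrix code) and assumes dplyr 1.0.0 or later, where `c_across()` is available. The trailing `ungroup()` is an optional addition that drops the row-wise grouping so later verbs behave normally.

```r
library(dplyr)

# Simplified data from the question, built directly with numeric columns
test <- data.frame(
  ID         = c(1, 2, 3),
  English    = c(1, 1, 0),
  French     = c(1, 0, 0),
  Italian    = c(0, 1, 0),
  Other      = c(0, 0, 1),
  Other.Lang = c("", "", "Chinese")
)

num <- test %>%
  rowwise() %>%                                        # treat each row as its own group
  mutate(num.lang = sum(c_across(English:Other))) %>%  # sum the 0/1 columns within the row
  ungroup()                                            # drop the row-wise grouping again

num
```

Selecting the columns by name (`English:Other`) rather than by position also avoids accidentally including the ID or the `Other.Lang` text column if the column order changes.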

Answer 2

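The difference in results is probably because `part_langs` is a grouped dataframe, as the output of `str` shown in your post indicates:

    grouped_df [7 x 15] (S3: grouped_df/tbl_df/tbl/data.frame)

If that is the cause, `ungroup()` first and then rerun your code:

    library(dplyr)
    part_langs <- part_langs %>% ungroup()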
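The following sketch reproduces the behaviour on a small scale, under some assumptions: the three-row `part_langs` below is a hypothetical miniature of the real data (only a few of the language columns), and `rowSums(across(...))` is one possible way to write the sum after ungrouping, not the only one. Because each participant ID forms its own one-row group, `mutate()` expects a value of size 1 per group, while `rowSums(part_langs[2:4])` returns one value per row of the whole table.

```r
library(dplyr)

# Hypothetical miniature of part_langs, grouped by ID like the real data
part_langs <- tibble(
  Participant.Private.ID = c(5140442, 5141991, 5145250),
  Spanish    = c(1, 1, 0),
  Italian    = c(0, 1, 0),
  Other      = c(0, 0, 1),
  Other.Lang = c("0", "0", "Malay")
) %>%
  group_by(Participant.Private.ID)

# On the grouped data the original pattern errors: each group has 1 row,
# but rowSums() over the full table returns 3 values
failed <- inherits(
  try(mutate(part_langs, num.langs = rowSums(part_langs[2:4])), silent = TRUE),
  "try-error"
)

# After ungrouping, the row sums line up with the rows again
fixed <- part_langs %>%
  ungroup() %>%
  mutate(num.langs = rowSums(across(Spanish:Other)))

failed
fixed
```

Using `across(Spanish:Other)` inside `mutate()` refers to the data currently being processed instead of the outer `part_langs` object, which keeps the expression correct even if the pipeline filters or reorders rows beforehand.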


Answer 1
0 2021-11-25 16:29:14

Answer 2
0 Accepted 2021-11-25 16:35:09
