Count unique instances in one column, given the names of two other columns

Question

I have the following table, called train (the real one is much larger):

UNSPSC adaptor alert bact blood collection packet patient ultrasoft whit
 514415       1     0    1     0          0      0       0         1    0
 514415       0     0    0     1          1      0       0         1    0
 514415       0     0    1     0          0      0       0         1    0
 514415       0     0    0     0          0      0       0         1    0
 514415       1     0    1     0          0      0       0         1    0
 514415       0     0    0     0          0      0       0         1    0
 422018       1     0    1     0          0      0       0         1    0
 422018       0     0    0     0          0      0       0         1    0
 422018       0     0    0     1          0      0       0         1    0
 411011       0     0    0     0          0      0       0         1    0

The following table is called associations:

 lhd     rhs
blood   collection
adaptor bact
[...]

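For each record in the associations table, I want to count the number of unique UNSPSC values among the rows of train where both the lhs column and the rhs column equal 1. Like this: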

blood   collection 1
adaptor bact       2

This code only handles one column at a time:

apply(train[,-1], 2, function(x) length(unique(substr(train$UNSPSC,1,4)[x == 1])))
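To make the target concrete, the count for a single pair can be worked out by hand. The following is a minimal sketch, assuming a cut-down copy of train that keeps only the columns needed for the two example pairs; the variable names `n_blood_collection` and `n_adaptor_bact` are made up for illustration.

```r
# Cut-down copy of train: same 10 rows, only the columns used by the two pairs.
train <- data.frame(
  UNSPSC     = c(514415L, 514415L, 514415L, 514415L, 514415L,
                 514415L, 422018L, 422018L, 422018L, 411011L),
  adaptor    = c(1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L),
  bact       = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L),
  blood      = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L),
  collection = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Rows where both columns of the pair are 1, then distinct UNSPSC among them.
both <- train$blood == 1 & train$collection == 1
n_blood_collection <- length(unique(train$UNSPSC[both]))   # 1

both <- train$adaptor == 1 & train$bact == 1
n_adaptor_bact <- length(unique(train$UNSPSC[both]))       # 2
```

Only one row has both blood and collection set, so that pair yields 1; adaptor and bact co-occur under two different UNSPSC codes, so that pair yields 2.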

Answer 1

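Instead of iterating over train, you can iterate over the rows of associations and combine subset (keep the rows of train where the columns named in the first and second fields of row x both equal 1), unique, and length.
Use get to look up the columns named in row x.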

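# The associations data below stores the header text ("lhd", "rhs") as its
# first row, so add all-ones columns with those names for that row to resolve.
train$lhd <- 1
train$rhs <- 1
# For each row of associations, count the distinct UNSPSC values among the
# rows of train where both named columns equal 1.
apply(associations, 1, function(x)
    length(unique(subset(train, get(x[1]) == 1 & get(x[2]) == 1)$UNSPSC))
)
# [1] 3 1 2
# The leading 3 comes from the header row (every row of train matches).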
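The same idea can be written without get and without the placeholder lhd/rhs columns. The following is a sketch rather than the answerer's method: it assumes associations is built with the header as column names (called `lhs` and `rhs` here, which is an assumption), uses a cut-down copy of train, and introduces a hypothetical helper `count_unspsc`.

```r
# Cut-down copy of train: same 10 rows, only the columns used by the two pairs.
train <- data.frame(
  UNSPSC     = c(514415L, 514415L, 514415L, 514415L, 514415L,
                 514415L, 422018L, 422018L, 422018L, 411011L),
  adaptor    = c(1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L),
  bact       = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L),
  blood      = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L),
  collection = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Assumed layout: one pair of column names per row, header kept as names.
associations <- data.frame(
  lhs = c("blood", "adaptor"),
  rhs = c("collection", "bact"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distinct UNSPSC among rows where both columns are 1.
count_unspsc <- function(lhs, rhs) {
  keep <- train[[lhs]] == 1 & train[[rhs]] == 1
  length(unique(train$UNSPSC[keep]))
}

# Walk the two name columns in parallel, one pair per call.
counts <- mapply(count_unspsc, associations$lhs, associations$rhs,
                 USE.NAMES = FALSE)
counts
# [1] 1 2
```

Indexing with `[[` takes the column name as a plain string, so no lookup by name through get is needed, and with the header out of the data there is no extra leading element in the result.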

Data (train):

structure(list(UNSPSC = c(514415L, 514415L, 514415L, 514415L, 
514415L, 514415L, 422018L, 422018L, 422018L, 411011L), adaptor = c(1L, 
0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L), alert = c(0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L), bact = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 
0L, 0L, 0L), blood = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L
), collection = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), packet = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), patient = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), ultrasoft = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), whit = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), lhd = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), rhs = c(1, 1, 
1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("UNSPSC", "adaptor", "alert", 
"bact", "blood", "collection", "packet", "patient", "ultrasoft", 
"whit", "lhd", "rhs"), row.names = c(NA, -10L), class = "data.frame")

Data (associations):

structure(list(V1 = c("lhd", "blood", "adaptor"), V2 = c("rhs", 
"collection", "bact")), .Names = c("V1", "V2"), row.names = c(NA, 
-3L), class = "data.frame")

Answer 2

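A similar option with the tidyverse (using the data from @PoGibas) is to apply pmap to the associations data to iterate over its rows, filter train to the rows where both named columns equal 1, pull the UNSPSC column, and count its unique elements with n_distinct.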

library(tidyverse)
pmap_int(associations, ~ train %>% 
                           filter(!! rlang::sym(.x) == 1, !! rlang::sym(.y) == 1) %>% 
                           pull(UNSPSC) %>% 
                           n_distinct)
#[1] 3 1 2
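A variant of the same pipeline can use the `.data` pronoun instead of `!! rlang::sym(...)`. This is a sketch under assumptions, not the answerer's code: it uses a cut-down copy of train and assumes associations has its header as column names (`lhs` and `rhs` are assumed names), which pmap_int then matches to the arguments of the function.

```r
library(dplyr)
library(purrr)

# Cut-down copy of train: same 10 rows, only the columns used by the two pairs.
train <- data.frame(
  UNSPSC     = c(514415L, 514415L, 514415L, 514415L, 514415L,
                 514415L, 422018L, 422018L, 422018L, 411011L),
  adaptor    = c(1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L),
  bact       = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L),
  blood      = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L),
  collection = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# Assumed layout: one pair of column names per row, header kept as names.
associations <- data.frame(
  lhs = c("blood", "adaptor"),
  rhs = c("collection", "bact"),
  stringsAsFactors = FALSE
)

# pmap_int passes each row's values as the named arguments lhs and rhs;
# .data[[name]] selects a column of train by its string name.
counts <- pmap_int(associations, function(lhs, rhs) {
  train %>%
    filter(.data[[lhs]] == 1, .data[[rhs]] == 1) %>%
    pull(UNSPSC) %>%
    n_distinct()
})
counts
# [1] 1 2
```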

Answer 1: 3, accepted, 2018-02-08 04:55:55
Answer 2: 2, 2018-02-08 05:50:21
