R: count strings by two factors

Question

I need some help. I have the following table:

country_code=c(1,1,1,1,1,1,2,2,2,2,2,2)
target=c('V1','V1','V2','V2','V3','V3','V1','V1','V2','V2','V3','V3')
M1=c('X7','X7','X14','X14','X8','X8','X29','X22','X2','X22','X22','X22')
M2=c('X1','X1','X17','X11','X21','X21','X1','X29','X8','X18','X24','X24')
M3=c('NA','NA','NA','X1','NA','NA','NA','NA','NA','NA','NA','NA')
CountofRun=c(1,2,1,2,1,2,1,2,1,2,1,2)
df<-data.frame(country_code,target,M1,M2,M3,CountofRun)

and I would like to get a frequency table for each combination of country_code and target. So, for instance, if X7 appears in all three runs for country_code=1 and target=V1, X7 needs to be summed to 3. As you will see, I am only interested in counting the number of times each of X1 to X30 appears in those 3 runs for each of the 6 combinations of country_code and target. I cannot convert to numeric.

The final table will hopefully look like this:

Answer 1

Maybe:

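library(dplyr)
library(tidyr)

df %>%
  select(-CountofRun) %>%                          # keep only the grouping and M columns
  gather(key, value, -(country_code:target)) %>%   # stack M1, M2, M3 into one column
  select(-key) %>%
  xtabs(~ country_code + target + value, data = .) %>%   # 3-way count table
  ftable()                                         # flatten to country_code/target x value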

Which gives:

#                    value NA X1 X11 X14 X17 X18 X2 X21 X22 X24 X29 X7 X8
#country_code target                                                     
#1            V1            2  2   0   0   0   0  0   0   0   0   0  2  0
#             V2            1  1   1   2   1   0  0   0   0   0   0  0  0
#             V3            2  0   0   0   0   0  0   2   0   0   0  0  2
#2            V1            2  1   0   0   0   0  0   0   1   0   2  0  0
#             V2            2  0   0   0   0   1  1   0   1   0   0  0  1
#             V3            2  0   0   0   0   0  0   0   2   2   0  0  0
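A base-R sketch of the same occurrence count, under one assumption not in the answer: the codes of interest are exactly X1 to X30, the range the question names. Turning the stacked values into a factor with those levels makes the literal 'NA' strings become real NA (which `table()` drops by default), keeps zero columns for codes that never appear, and orders the columns X1, X2, ..., X30 instead of alphabetically. No numeric conversion is needed. The names `codes`, `value` and `counts` are illustrative.

```r
# Example data from the question (character columns, not factors)
country_code <- c(1,1,1,1,1,1,2,2,2,2,2,2)
target <- c('V1','V1','V2','V2','V3','V3','V1','V1','V2','V2','V3','V3')
M1 <- c('X7','X7','X14','X14','X8','X8','X29','X22','X2','X22','X22','X22')
M2 <- c('X1','X1','X17','X11','X21','X21','X1','X29','X8','X18','X24','X24')
M3 <- c('NA','NA','NA','X1','NA','NA','NA','NA','NA','NA','NA','NA')
CountofRun <- c(1,2,1,2,1,2,1,2,1,2,1,2)
df <- data.frame(country_code, target, M1, M2, M3, CountofRun,
                 stringsAsFactors = FALSE)

# Assumption: only X1..X30 are of interest
codes <- paste0("X", 1:30)

# Stack M1, M2, M3; any value outside `codes` (e.g. the string "NA") becomes NA
value <- factor(c(df$M1, df$M2, df$M3), levels = codes)

# Repeat the grouping columns three times so they line up with the stacked values
counts <- table(country_code = rep(df$country_code, 3),
                target       = rep(df$target, 3),
                value        = value)

# One row per country_code/target pair, one column per code X1..X30
ftable(counts)
```

Each appearance is counted once, so a code present in every run of a combination gets the number of runs, as in the output above.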

Answer 2

A data.table solution (similar structure to the dplyr + tidyr one, just with different syntax):

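library(data.table)

setDT(df)                                                  # convert df to a data.table
df[, .SD                                                   # start the chain with all columns
   ][, CountofRun := NULL                                  # drop the run counter
   ][, melt(.SD, id.vars=c('country_code', 'target'))      # stack M1, M2, M3 into `value`
   ][, .N, .(country_code, target, value)                  # count each code per group
   ][, dcast(.SD, country_code + target ~ value, value.var='N', fill=0)  # one column per code
   ]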

Answer 3

This will get you part of the way there; you have the counts, and now it is just formatting:

> library(data.table)
> 
> country_code=c(1,1,1,1,1,1,2,2,2,2,2,2)
> target=c('V1','V1','V2','V2','V3','V3','V1','V1','V2','V2','V3','V3')
> M1=c('X7','X7','X14','X14','X8','X8','X29','X22','X2','X22','X22','X22')
> M2=c('X1','X1','X17','X11','X21','X21','X1','X29','X8','X18','X24','X24')
> M3=c('NA','NA','NA','X1','NA','NA','NA','NA','NA','NA','NA','NA')
> CountofRun=c(1,2,1,2,1,2,1,2,1,2,1,2)
> df<-data.table(country_code,target,M1,M2,M3,CountofRun)
> 
> # melt the data for easier processing
> df_m <- melt(df, id.vars = c('country_code', 'target', 'CountofRun'))
> 
> # count
> df_count <- df_m[, 
+             .(count = sum(CountofRun)),
+             keyby = .(country_code, target, value)
+             ][value != "NA"]  # remove 'NA's
>             
> df_count
    country_code target value count
 1:            1     V1    X1     3
 2:            1     V1    X7     3
 3:            1     V2    X1     2
 4:            1     V2   X11     2
 5:            1     V2   X14     3
 6:            1     V2   X17     1
 7:            1     V3   X21     3
 8:            1     V3    X8     3
 9:            2     V1    X1     1
10:            2     V1   X22     2
11:            2     V1   X29     3
12:            2     V2   X18     2
13:            2     V2    X2     1
14:            2     V2   X22     2
15:            2     V2    X8     1
16:            2     V3   X22     3
17:            2     V3   X24     3
>
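The remaining formatting step can be sketched as spreading the long counts into one row per country_code/target pair and one column per code. This version is base R rather than data.table, an assumption made so it runs without extra packages: it rebuilds the same `df_count` (sum of CountofRun per combination, literal 'NA' strings dropped) with `aggregate()`, then `xtabs()` with `count` on the left-hand side places each sum in its cell and fills missing combinations with 0. The names `long` and `wide` are illustrative.

```r
# Example data from the question (character columns, not factors)
country_code <- c(1,1,1,1,1,1,2,2,2,2,2,2)
target <- c('V1','V1','V2','V2','V3','V3','V1','V1','V2','V2','V3','V3')
M1 <- c('X7','X7','X14','X14','X8','X8','X29','X22','X2','X22','X22','X22')
M2 <- c('X1','X1','X17','X11','X21','X21','X1','X29','X8','X18','X24','X24')
M3 <- c('NA','NA','NA','X1','NA','NA','NA','NA','NA','NA','NA','NA')
CountofRun <- c(1,2,1,2,1,2,1,2,1,2,1,2)
df <- data.frame(country_code, target, M1, M2, M3, CountofRun,
                 stringsAsFactors = FALSE)

# Long format: one row per (country_code, target, run, code), like melt()
long <- data.frame(
  country_code = rep(df$country_code, 3),
  target       = rep(df$target, 3),
  CountofRun   = rep(df$CountofRun, 3),
  value        = c(df$M1, df$M2, df$M3),
  stringsAsFactors = FALSE
)
long <- long[long$value != "NA", ]   # drop the literal "NA" strings

# Same counts as the answer: sum of CountofRun per combination and code
df_count <- aggregate(CountofRun ~ country_code + target + value,
                      data = long, FUN = sum)
names(df_count)[names(df_count) == "CountofRun"] <- "count"

# Formatting step: sum `count` into a country_code x target x value table
wide <- xtabs(count ~ country_code + target + value, data = df_count)
ftable(wide)   # rows: country_code/target, columns: codes
```

With data.table loaded, `dcast(df_count, country_code + target ~ value, value.var = "count", fill = 0)` should give an equivalent wide table as a data.table.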

3 answers

Answer 1: score 1, accepted, 2016-09-09 12:44:15
Answer 2: score 1, 2016-09-09 17:16:30
Answer 3: score -1, 2016-09-09 12:48:20
