Word frequency counter in R

Question

I would like to perform an operation that transforms the data into the format shown below:

Input:

Col_A                         Col_B
textA textB                     10
textB textC                     20
textC textD                     30
textD textE                     40
textE textF                     20

Operation:

ColA           ColB(Frequency)            ColC
textA                  1                    10
textB                  2                  10+20
textC                  2                  20+30
textD                  2                  30+40
textE                  2                  40+20
textF                  1                    20
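The Operation table amounts to splitting every Col_A string into words, pairing each word with the Col_B value of its row, and then counting and summing per word. A minimal base R sketch that reproduces that intermediate table, assuming the input is a data frame with a character Col_A (`input`, `long`, `counts`, `terms` and `operation` are illustrative names, not from the question):

```r
# Example input from the question; Col_A is kept as character for strsplit()
input <- data.frame(
  Col_A = c("textA textB", "textB textC", "textC textD", "textD textE", "textE textF"),
  Col_B = c(10L, 20L, 30L, 40L, 20L),
  stringsAsFactors = FALSE
)

# Long format: one row per word, carrying the Col_B value of the row it came from
words <- strsplit(input$Col_A, " ")
long <- data.frame(
  word  = unlist(words),
  Col_B = rep(input$Col_B, lengths(words)),
  stringsAsFactors = FALSE
)

# Per word: how often it occurs, and which Col_B values would be summed
counts <- table(long$word)
terms  <- tapply(long$Col_B, long$word, paste, collapse = "+")
operation <- data.frame(
  ColA = names(counts),
  Freq = as.vector(counts),
  ColC = as.vector(terms[names(counts)]),
  stringsAsFactors = FALSE
)
print(operation)
```

Swapping paste(collapse = "+") for sum in the tapply() call turns the "10+20" strings into the numeric ColC of the Output table.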

Output:

ColA           ColB(Frequency)            ColC
textA                  1                    10
textB                  2                    30
textC                  2                    50
textD                  2                    70
textE                  2                    60
textF                  1                    20

Currently I am using:

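library(quanteda)  # dfm() comes from the quanteda package
k <- dfm(A2$Query, ngrams = 1, concatenator = " ", verbose = FALSE)  # document-feature matrix
k <- colSums(k)    # total count of each word
k <- as.data.frame(k)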

This gives me the frequency column. How can I compute ColC?
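The dfm() call above builds a document-feature matrix (one row per text, one column per word), so colSums() gives the frequency. ColC is the same column sum with each row first multiplied by its Col_B. A minimal sketch of that idea, using a base R table() as the document-term matrix instead of quanteda so that no quanteda-specific API is assumed (`input`, `dtm`, `freq`, `colc` and `result` are illustrative names):

```r
# Example input from the question, with Col_A as character
input <- data.frame(
  Col_A = c("textA textB", "textB textC", "textC textD", "textD textE", "textE textF"),
  Col_B = c(10L, 20L, 30L, 40L, 20L),
  stringsAsFactors = FALSE
)

words <- strsplit(input$Col_A, " ")

# Document-term count matrix: rows = input rows, columns = distinct words
dtm <- table(doc = rep(seq_along(words), lengths(words)), word = unlist(words))

freq <- colSums(dtm)                # occurrences of each word (ColB)
colc <- colSums(dtm * input$Col_B)  # scale row i by Col_B[i], then sum (ColC)

result <- data.frame(
  ColA = names(freq),
  Freq = unname(freq),
  ColC = unname(colc),
  stringsAsFactors = FALSE
)
print(result)
```

The weighting step relies on R recycling the length-5 Col_B vector down each column of the 5-row matrix, so every entry in row i is multiplied by Col_B[i].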

Answer 1

We could use cSplit() from the splitstackshape package in combination with dplyr:

library(splitstackshape)
library(dplyr)
cSplit(df, "Col_A", sep = " ", direction = "long") %>% 
  group_by(Col_A) %>%
  summarise(Freq = n(), ColC = sum(Col_B))
#   Col_A  Freq  ColC
#  (fctr) (int) (int)
#1  textA     1    10
#2  textB     2    30
#3  textC     2    50
#4  textD     2    70
#5  textE     2    60
#6  textF     1    20

Data

df <- structure(list(Col_A = structure(1:5, .Label = c("textA textB", 
"textB textC", "textC textD", "textD textE", "textE textF"), class = "factor"), 
    Col_B = c(10L, 20L, 30L, 40L, 20L)), .Names = c("Col_A", 
"Col_B"), class = "data.frame", row.names = c(NA, -5L))
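The dput() output above stores Col_A as a factor. An equivalent and more readable construction that keeps Col_A as character (which strsplit() requires) might look like this; `df_chr` is an illustrative name:

```r
# Same values as the dput() output above, but Col_A is stored as character
df_chr <- data.frame(
  Col_A = c("textA textB", "textB textC", "textC textD", "textD textE", "textE textF"),
  Col_B = c(10L, 20L, 30L, 40L, 20L),
  stringsAsFactors = FALSE
)
str(df_chr)
```

An existing factor column can also be converted in place with df$Col_A <- as.character(df$Col_A).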

Answer 2

Here is another option using separate() and gather() from tidyr:

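library(dplyr)
library(tidyr)
df1 <- df  # the example data shown under Answer 1
separate(df1, Col_A, into = c("Col_A1", "Col_A2")) %>%  # one column per word
  gather(Var, ColA, -Col_B) %>%                          # stack the word columns into ColA
  group_by(ColA) %>%
  summarise(Freq = n(), Col_C = sum(Col_B))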
#   ColA  Freq Col_C
#  (chr) (int) (int)
#1 textA     1    10
#2 textB     2    30
#3 textC     2    50
#4 textD     2    70
#5 textE     2    60
#6 textF     1    20

Or, in base R: split 'Col_A' by space, replicate 'Col_B' by the lengths of the resulting list ('lst') to build a data.frame, and then use aggregate() to get the length and sum of 'Col_B' for each word.

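lst <- strsplit(as.character(df1$Col_A), " ")  # strsplit() needs character, not factor
d1 <- data.frame(Col_A = unlist(lst), Col_C = rep(df1$Col_B, lengths(lst)))  # one row per word
# aggregate() returns length and sum as a matrix column; do.call(data.frame, ...) flattens it
do.call(data.frame, aggregate(. ~ Col_A, d1, function(x) c(length(x), sum(x))))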
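A variant of the same base R idea that avoids aggregate()'s matrix column: rowsum() sums the repeated Col_B values per word and table() counts the words. This is a sketch that swaps in rowsum()/table() for aggregate(); it rebuilds df1 with a character Col_A so that it runs on its own (`values`, `sums`, `freq` and `res` are illustrative names):

```r
# Example data with Col_A as character
df1 <- data.frame(
  Col_A = c("textA textB", "textB textC", "textC textD", "textD textE", "textE textF"),
  Col_B = c(10L, 20L, 30L, 40L, 20L),
  stringsAsFactors = FALSE
)

lst    <- strsplit(df1$Col_A, " ")
words  <- unlist(lst)                      # every word, in order of appearance
values <- rep(df1$Col_B, lengths(lst))     # Col_B repeated once per word

sums <- rowsum(values, words)  # one-column matrix of sums, rows named (sorted) by word
freq <- table(words)           # number of occurrences of each word

res <- data.frame(
  Col_A = rownames(sums),
  Freq  = as.vector(freq[rownames(sums)]),  # align counts with the rowsum() order
  Col_C = as.vector(sums[, 1]),
  stringsAsFactors = FALSE
)
print(res)
```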

2 answers

Answer 1
4 votes, accepted, 2016-04-08 08:05:34

Answer 2
1 vote, 2016-04-08 09:00:10