# Lists of term-frequency pairs into a matrix in R

``````
aword:3 bword:2 cword:15 dword:2
bword:4 cword:20 fword:1
etc...
``````

``````
docs <- scan("data.txt", what="", sep="\n")
doclist <- strsplit(docs, "[[:space:]]+")
``````

``````
## strsplit() needs a character vector, so apply it to each document in the list
doclist2 <- lapply(doclist, strsplit, ":", fixed=TRUE)
``````

``````
        doc1 doc2 doc3 doc4 ...
aword   3    0    0    0
bword   2    4    0    0
cword   15   20   0    0
dword   2    0    0    0
fword   0    1    0    0
...
``````

## 1 Answer

### Answer #1 (votes: 0, accepted)

``````
## Your sample data
x <- c("aword:3 bword:2 cword:15 dword:2", "bword:4 cword:20 fword:1")
## Split on spaces and colons
B <- strsplit(x, "\\s+|:")
B <- setNames(B, paste0("document", seq_along(B)))
## Put everything together into a long matrix
out <- do.call(rbind, lapply(seq_along(B), function(i)
  cbind(document = names(B)[i],
        matrix(B[[i]], ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("word", "count"))))))

## Convert to a data.frame
out <- data.frame(out)
out
#    document  word count
# 1 document1 aword     3
# 2 document1 bword     2
# 3 document1 cword    15
# 4 document1 dword     2
# 5 document2 bword     4
# 6 document2 cword    20
# 7 document2 fword     1
## Make sure the counts column is a number
out$count <- as.numeric(as.character(out$count))

## Use xtabs to get the output you want
xtabs(count ~ word + document, out)
#        document
# word    document1 document2
#   aword         3         0
#   bword         2         4
#   cword        15        20
#   dword         2         0
#   fword         0         1
``````
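
If you would rather skip the intermediate data frame, here is a minimal sketch that fills a plain numeric matrix directly. It is only one possible variant, not part of the answer above. The names `docs`, `tokens`, `pairs`, `words` and `tdm` are made up for illustration, the input is the sample data typed inline rather than read from `data.txt`, and it assumes each word appears at most once per document (a repeated word would keep only its last count).

```r
## Hypothetical input, mirroring the sample data in the question
docs <- c("aword:3 bword:2 cword:15 dword:2", "bword:4 cword:20 fword:1")

## Split each document into "word:count" tokens,
## then turn each document into a two-column character matrix (word, count)
tokens <- strsplit(docs, "[[:space:]]+")
pairs  <- lapply(tokens, function(tok)
  do.call(rbind, strsplit(tok, ":", fixed = TRUE)))

## Allocate a zero matrix: one row per distinct word, one column per document
words <- sort(unique(unlist(lapply(pairs, function(p) p[, 1]))))
tdm <- matrix(0, nrow = length(words), ncol = length(docs),
              dimnames = list(words, paste0("doc", seq_along(docs))))

## Fill in the counts one document (column) at a time, indexing rows by word
for (j in seq_along(pairs)) {
  tdm[pairs[[j]][, 1], j] <- as.numeric(pairs[[j]][, 2])
}
tdm
```

Unlike `xtabs`, which returns a contingency table object, this leaves you with an ordinary numeric matrix whose row names are the words, so something like `tdm["cword", ]` gives the counts of one word across all documents.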
