How do I get the first 25 words of each corpus (in R)?

Question

I'm guessing that the technique for this is similar to taking the first N characters from any dataframe, regardless of whether it is a corpus or not.

My attempt:

create.greetings <- function(corpus, create_df = FALSE) {
  for(i in length(Charlotte.corpus.raw)) {
    Doc1<-Charlotte.corpus.raw[i]
    Word1<-Doc1[1:25]
    Greetings[i]<-Word1
  }
  return(VCorpus)
}

Here Greetings begins as a corpus with n=6. I couldn't figure out how to make a null corpus, or a corpus with enough slots to hold the results. I have a corpus of 200 documents here (Charlotte.corpus.raw). Unlike vectors (and by extension, dataframes), there doesn't seem to be an easy way to create null corpora.
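One way around the missing "null corpus" is to collect the results in a plain list rather than in a corpus. The sketch below uses base R only; `docs` is a hypothetical stand-in for the corpus (a list with one character vector per document), not the real Charlotte.corpus.raw. Note also that `for (i in length(x))` visits only the single value `length(x)`, whereas `seq_along(x)` visits every index.

```r
# Hypothetical stand-in for a corpus: one character vector of lines per document.
docs <- list(
  c("Dear Charlotte,", "I hope this letter finds you well."),
  c("My dear friend,", "thank you for your last note.")
)

# Preallocate an empty list with one slot per document.
greetings <- vector("list", length(docs))

# seq_along() yields 1, 2, ..., length(docs); length(docs) alone would yield only 2.
for (i in seq_along(docs)) {
  greetings[[i]] <- docs[[i]][1]  # store the first line of document i
}
```

A list can hold results of any length per document, so it does not need to be sized in characters beforehand.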

Part of the problem is that R doesn't seem to recognize the class of "document". It only recognizes corpus. That is, to R, a single document is a corpus of n=1.
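The "corpus of n=1" behaviour comes from single-bracket indexing. As a minimal sketch, assuming the 'tm' package is installed and using a small made-up corpus built with `VectorSource` (not the original data), the difference between `[` and `[[` looks like this:

```r
library(tm)

# Hypothetical two-document corpus, built in memory for illustration only.
corpus <- VCorpus(VectorSource(c("hello world", "good morning to you")))

one_doc_corpus <- corpus[1]    # single brackets: still a corpus, with one document
doc            <- corpus[[1]]  # double brackets: the document itself

class(one_doc_corpus)          # includes "VCorpus"
class(doc)                     # includes "PlainTextDocument"

text <- as.character(doc)      # the document's text as a character vector
```

So inside a loop, `corpus[[i]]` is likely the form to use when the text of document i is needed.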

Reproducible sample: you will need the 'tm', 'dplyr', and 'NLP' packages, as well as more common R packages.

library(tm)    # DirSource, VCorpus, tm_map, content_transformer
library(dplyr) # %>% pipe

read.corpus <- function(directory, pattern = "", to.lower = TRUE) {
  # Read files and create a `VCorpus` object
  corpus <- DirSource(directory = directory, pattern = pattern) %>%
    VCorpus
  # Lowercase text
  if (to.lower) corpus <- tm_map(corpus, content_transformer(tolower))
  return(corpus)
}

Then run the function on any directory that holds a few txt documents, and you have a corpus to work with. Then replace Charlotte.corpus.raw above with whatever you named your corpus.

Answer 1

Each row of greetings will contain the first 25 words of each document:

greetings <- c()
for(i in 1:length(corpus)) {
  row <- unlist(corpus[i])[1:25]
  greetings <- rbind(greetings, row)
}
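One caveat: when a document's content is stored line by line, `unlist()` returns lines (followed by metadata fields) rather than individual words. A hedged alternative is to split the text on whitespace first. The helper `first_words` below is a hypothetical name introduced for illustration, and splitting on whitespace is an assumed definition of "word"; the function itself uses base R only.

```r
# Return the first n whitespace-separated words of one document.
# `lines` is a character vector holding the document's text.
first_words <- function(lines, n = 25) {
  text  <- paste(lines, collapse = " ")         # join the lines into one string
  words <- strsplit(trimws(text), "\\s+")[[1]]  # split on runs of whitespace
  head(words, n)                                # fewer than n if the text is short
}

example <- first_words(c("Dear Charlotte,", "I hope you are well."), n = 3)
# example is c("Dear", "Charlotte,", "I")

# With a tm corpus (assuming 'tm' is loaded), one list element per document:
# greetings <- lapply(seq_len(length(corpus)),
#                     function(i) first_words(as.character(corpus[[i]])))
```

Returning a list avoids the problem of rows with unequal lengths when a document has fewer than 25 words.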

Answer 1, posted 2016-08-18 22:12:17
