How to take first 25 words of each corpus (in R)?

Question

I'm guessing that the technique for this is similar to taking the first N characters from any dataframe, regardless of whether it is a corpus or not.

My attempt:

create.greetings <- function(corpus, create_df = FALSE) {
  for(i in length(Charlotte.corpus.raw)) {
    Doc1<-Charlotte.corpus.raw[i]
    Word1<-Doc1[1:25]
    Greetings[i]<-Word1
  }
  return(VCorpus)
}

Where Greetings begins as a corpus with n=6. I couldn't figure out how to make a null corpus, or a corpus of large enough characters. I have a corpus of 200 documents here (Charlotte.corpus.raw). Unlike vectors (and by extension, dataframes), there doesn't seem to be an easy way to create null corpora.
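One way to sidestep the need for an empty corpus, offered as a sketch rather than the definitive approach: collect the results in a preallocated character vector and build the corpus once, after the loop. The two-document corpus below is a made-up stand-in for Charlotte.corpus.raw, and n = 2 is used only to keep the example short.

```r
library(tm)

# Hypothetical stand-in for Charlotte.corpus.raw
corpus <- VCorpus(VectorSource(c("one two three four", "alpha beta gamma")))

# Preallocate a plain character vector instead of an empty corpus
greetings_text <- character(length(corpus))

for (i in seq_along(corpus)) {
  # Join the document's lines and split on whitespace to get words
  words <- unlist(strsplit(paste(content(corpus[[i]]), collapse = " "), "\\s+"))
  words <- words[nzchar(words)]
  # Keep the first 2 words here; use 25 for the real data
  greetings_text[i] <- paste(head(words, 2), collapse = " ")
}

# Build the corpus once, from the finished vector
greetings <- VCorpus(VectorSource(greetings_text))
```

Because the vector is sized from length(corpus), the same code should work unchanged for 6 or 200 documents.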

Part of the problem is that R doesn't seem to recognize the class of "document". It only recognizes corpus. That is, to R, a single document is a corpus of n=1.
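The behaviour described above is what single-bracket indexing produces. As a small illustration, assuming the corpus is a VCorpus from the tm package, double brackets return the document itself rather than a one-element corpus. The in-memory corpus `docs` is invented for this example.

```r
library(tm)

# Hypothetical two-document corpus built in memory for illustration
docs <- VCorpus(VectorSource(c("hello there my friend",
                               "good morning to you all")))

one_doc_corpus <- docs[1]    # single brackets: still a corpus, of length 1
one_document   <- docs[[1]]  # double brackets: the document object itself

inherits(one_doc_corpus, "VCorpus")          # is it a corpus?
inherits(one_document, "PlainTextDocument")  # is it a document?

# The document's text as a character vector (one element per line)
doc_text <- content(one_document)
```

Once the text is extracted with content(), ordinary string functions such as strsplit() can be applied to it.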

Reproducible sample: you will need the 'tm', 'dplyr', and 'NLP' packages, as well as more common R packages.

library(tm)     # DirSource, VCorpus, tm_map, content_transformer
library(dplyr)  # provides the %>% pipe

read.corpus <- function(directory, pattern = "", to.lower = TRUE) {
  # Read files and create a `VCorpus` object
  corpus <- DirSource(directory = directory, pattern = pattern) %>%
    VCorpus()
  # Optionally lowercase the text
  if (to.lower) {
    corpus <- tm_map(corpus, content_transformer(tolower))
  }
  return(corpus)
}

Run the function on any directory containing a few .txt documents to get a corpus to work with. Then replace Charlotte.corpus.raw above with whatever you name your corpus.

Answer 1

Each row of greetings will contain the first 25 words of each document:

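greetings <- c()
for (i in seq_along(corpus)) {
  # Join the document's lines, then split on whitespace to get words
  words <- unlist(strsplit(paste(content(corpus[[i]]), collapse = " "), "\\s+"))
  words <- words[nzchar(words)]
  # Indexing 1:25 pads short documents with NA so every row has 25 columns
  row <- words[1:25]
  greetings <- rbind(greetings, row)
}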
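The loop above produces a matrix rather than a corpus. If the result should stay a corpus, as the question's attempt suggests, one possible variation is to truncate each document in place with tm_map(). This is a sketch, not the answerer's method: `keep_first_words` is a hypothetical helper, the sample corpus is invented, and n = 3 stands in for 25.

```r
library(tm)

# Hypothetical helper: keep only the first n whitespace-separated words.
# content_transformer() applies it to each document's content.
keep_first_words <- content_transformer(function(x, n = 25) {
  words <- unlist(strsplit(paste(x, collapse = " "), "\\s+"))
  words <- words[nzchar(words)]
  paste(head(words, n), collapse = " ")
})

# Made-up corpus for illustration
sample_corpus <- VCorpus(VectorSource(c("the quick brown fox jumps",
                                        "hi there")))

# Extra arguments to tm_map() are passed on to the transformer
greetings_corpus <- tm_map(sample_corpus, keep_first_words, n = 3)

first_greeting  <- content(greetings_corpus[[1]])
second_greeting <- content(greetings_corpus[[2]])
```

Documents shorter than n words are kept whole, so no NA padding is involved in this version.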

solution1
0 2016-08-18 22:12:17
