
R text mining package (tm): updating the corpus by modifying or deleting existing documents

I would like to modify an existing document in a corpus by doing something simple like this:

myCorpus[[10]] = "hey I am the new content of this document"

Is this valid?

It is not clear what you want to do with your corpus: append to it, or modify the 10th element?

Syntactically your assignment is valid, but semantically it is wrong.

Conceptually, a corpus is metadata plus a list of TextDocument objects, so you can access that list like any R list, with '[[' or '$'.

So if you do this (it is better to use <- than =, even though they are equivalent here):

myCorpus[[10]] <- "hey I am the new content of this document" 

This will create or change the 10th element, but the new element has class character, not TextDocument, so you can no longer use TextDocument methods on it.

To update the content of the 10th text document instead:
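The class change can be seen without tm at all. The following is a minimal base R sketch, assuming a stand-in "corpus" that is just a list of classed objects; `make_doc`, `toyCorpus`, and the `ToyDocument` class are hypothetical names for illustration and are not part of tm.

```r
# Stand-in for a corpus: a plain list whose elements carry a class.
# make_doc and ToyDocument are hypothetical; they only mimic the idea
# of a list of TextDocument objects.
make_doc <- function(text) structure(list(content = text), class = "ToyDocument")
toyCorpus <- list(make_doc("one"), make_doc("two"), make_doc("three"))

class_before <- class(toyCorpus[[2]])

# Assigning a bare string replaces the whole element, class included.
toyCorpus[[2]] <- "hey I am the new content of this document"
class_after <- class(toyCorpus[[2]])

# Updating the field inside the element keeps the element's class.
toyCorpus[[3]]$content <- "hey I am the new content of this document"
class_kept <- class(toyCorpus[[3]])

print(c(class_before, class_after, class_kept))
```

Replacing the element swaps the object itself, while assigning into the element only changes what it holds.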

Content(myCorpus[[10]]) <- "hey I am the new content of this document" 
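As a self-contained illustration, here is a sketch that builds a small corpus and updates one document in place. It assumes a recent tm release, where, as far as I know, the accessor is spelled lowercase `content()` rather than `Content()`; the three sample strings are made up for the example.

```r
library(tm)

# A tiny in-memory corpus; the sample texts are placeholders.
docs <- c("first document", "second document", "third document")
myCorpus <- VCorpus(VectorSource(docs))

# Replace only the text of the 2nd document.
# Assumption: lowercase content<- (recent tm versions); older versions
# used Content<- as written in the answer.
content(myCorpus[[2]]) <- "hey I am the new content of this document"

new_text <- content(myCorpus[[2]])
still_doc <- inherits(myCorpus[[2]], "TextDocument")
print(new_text)
print(still_doc)
```

The element remains a text document, so the usual tm methods should keep working on it.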

To add new documents, use:

tmUpdate(ovid, DirSource(txt))

The source is checked for new files that do not already exist in the document collection. Any new files found are parsed and added to the existing document collection.
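If tmUpdate is not available in your installed version of tm, one possible alternative is to build a second corpus from the new texts and combine the two with `c()`. This is a sketch under that assumption, not a drop-in replacement: unlike tmUpdate it does not check whether a document is already present. The sample strings are made up.

```r
library(tm)

# Existing corpus (placeholder texts).
myCorpus <- VCorpus(VectorSource(c("first document", "second document")))

# New documents arrive as a second corpus of the same type.
newDocs <- VCorpus(VectorSource("a newly added document"))

# c() concatenates corpora of the same class; no duplicate check is done.
combined <- c(myCorpus, newDocs)

n_docs <- length(combined)
last_text <- content(combined[[n_docs]])
print(n_docs)
print(last_text)
```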

