Reading a corpus's text files from a directory in numerical order in R

Question

docs <- Corpus(DirSource(cname))

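I have a directory cname containing text files (1.txt, 2.txt, ..., 10.txt, 11.txt, ...), and I want to create a corpus from them in numerical order (1, 2, 3, ..., 10, 11, ...). However, the corpus is read in lexicographic order instead: 1, 10, 11, ..., 19, 2. How can I make sure the corpus reads the files in the directory in the order I need?

Thanks,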

Answer 1

Here is something worth trying.

# simulate your file structure - you have this already
txt <- c("This is some text.", "This is some more text.","This is additional text.","Yet more additional text.")
num <- c(1,2,10,20)
td  <- tempdir()     # temporary directory
# creates 4 files in temp dir: 1.txt, 2.txt, 10.txt, and 20.txt
mapply(function(x,y) writeLines(x,paste0(td,"/",y,".txt")),txt,num)

# you start here...
library(tm)
src <- DirSource(directory=td, pattern="\\.txt$")   # pattern is a regex: escape the dot, anchor at the end
names(Corpus(src))
# [1] "1.txt"  "10.txt" "2.txt"  "20.txt"
src$filelist <- src$filelist[order(as.integer(gsub("^.*/([0-9]+)\\.txt$","\\1",src$filelist)))]
names(Corpus(src))
# [1] "1.txt"  "2.txt"  "10.txt" "20.txt"

# clean up: just for this example
unlink(paste(td,"*.*",sep="/"))   # remove sample files...

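So DirSource(...) returns an object of class DirSource, which has an element $filelist. This is a vector of file names (in the order you don't want). The code above (should) extract the number that precedes ".txt" in each file name, convert it to an integer, and reorder $filelist by those integer values.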
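
To see the reordering step in isolation, here is a minimal base-R sketch that does not need tm. The paths in `files` are hypothetical stand-ins for what `src$filelist` might contain, and the variable names `nums` and `sorted_files` are only for illustration.

```r
# Hypothetical file paths, in the lexicographic order a directory listing gives
files <- c("/tmp/docs/1.txt", "/tmp/docs/10.txt",
           "/tmp/docs/2.txt", "/tmp/docs/20.txt")

# Keep only the digits immediately before ".txt" and convert them to integers
nums <- as.integer(gsub("^.*/([0-9]+)\\.txt$", "\\1", files))

# order() returns the permutation that sorts those integers;
# indexing with it rearranges the paths numerically
sorted_files <- files[order(nums)]

print(basename(sorted_files))
# [1] "1.txt"  "2.txt"  "10.txt" "20.txt"
```

The regular expression assumes that every file name consists only of digits followed by ".txt" and that paths use "/" as the separator. A name that does not match passes through gsub() unchanged, so as.integer() returns NA for it (with a warning) and order() places it last.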
