
Importing multiple .txt files into R

I need to import multiple .txt files into R. Each file contains several sentences (e.g., "On Monday, I went to the park."). I would like to import all the files at once and then add them to a tibble so that I can do text analysis on them.

So far, I have tried:

#to create vector of txt files
files <- list.files(pattern = "txt$")

# Read all the files and create a FileName column to store filenames
files_list <- files %>%
  set_names(.) %>%
  map_df(read_table2, .id = "FileName")
my_data <- read.delim(file(files))

But I don't know how to actually load the text from each .txt file into the data. When I run the code above, it only reads in the text from one of the files, not all of them.

I also tried:

sapply(files, read.delim)
mainlist = list()
for (i in seq_along(files)) {
  mainlist[[i]] = read.delim(files[i], header = TRUE, sep = "\t")
}

And while it prints out all the info in each .txt file, when I try to put it in a tibble using

mainlist_tib <- tibble(mainlist)

the tibble is empty.

Any assistance would be greatly appreciated!

Edit: Regarding the tibble, I would like it to have one column for the .txt file name and another column for the text from the file, and then to be able to use the unnest_tokens() function to get a tibble where each row contains only one word. Sort of like the example in the text mining textbook by Silge and Robinson: https://www.tidytextmining.com/tidytext.html

You could try it like this:

library(dplyr)
library(purrr)

# `files` is the character vector of file names built with list.files() in the question
files %>%
  set_names(.) %>%
  map_dfr(~ readr::read_table(., col_names = FALSE), .id = "FileName")
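
If the goal is the layout described in the edit to the question (one column for the file name, one for the text, then one word per row via unnest_tokens()), a minimal sketch could look like the following. It swaps read_table() for base readLines(), since read_table() splits each line on whitespace into separate columns. The two sample files and the names `dir`, `docs`, and `words` are assumptions for illustration only, not part of the original post.

```r
library(tibble)
library(purrr)
library(tidytext)

# Two throwaway sample files so the sketch runs anywhere (assumed content)
dir <- file.path(tempdir(), "txt_demo")
dir.create(dir, showWarnings = FALSE)
writeLines("On Monday, I went to the park.", file.path(dir, "a.txt"))
writeLines(c("It rained.", "I went home."), file.path(dir, "b.txt"))

# Full paths to every .txt file in the directory
files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# One row per file: the file name and the whole text as a single string
docs <- tibble(
  FileName = basename(files),
  text = map_chr(files, ~ paste(readLines(.x), collapse = " "))
)

# One row per word, keeping FileName alongside each token
words <- unnest_tokens(docs, word, text)
```

With your own data, point list.files() at the folder that holds your .txt files and drop the sample-file lines. Collapsing each file to one string keeps one row per document, which is the shape unnest_tokens() expects as input.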
