
Use of wildcards with readtext()

A basic question: I have a bunch of transcripts (.docx files) that I want to read into a corpus. I can read single files with readtext() without any problem:

dat <- readtext("~/ownCloud/NLP/interview_1.docx")

As soon as I put "*.docx" in the readtext() call, it throws an error:

dat <- readtext("~/ownCloud/NLP/*.docx")

Error: '/var/folders/bl/61g7ngh55vs79cfhfhnstd4c0000gn/T//RtmpWD6KSx/readtext-aa71916b691c0cf3cabc73a2e04a45f7/word/document.xml' does not exist.
In addition: Warning message:
In utils::unzip(file, exdir = path) : error 1 in extracting from zip file

Why the reference to a zip file? I have only .docx files in the directory.

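I was able to reproduce the same problem. The issue was that there are some hidden/temporary .docx files in that folder; if you delete them and then run the code again, it works. The zip reference comes from the file format: a .docx file is a ZIP archive, and readtext() unzips each one and reads word/document.xml from it. A temporary file that merely has a .docx name but is not a valid archive makes that step fail, which produces the error above.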

To see the hidden files, go to the folder you are reading the .docx files from and show hidden files in whatever way your OS provides. On my Mac (in Finder) I used

CMD + SHIFT + .
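If you would rather check from R than change your file browser settings, a sketch like the one below lists every .docx entry in the folder, including hidden ones, and flags names that look like temporary files. The patterns are assumptions about what such files look like: a leading "~$" (the prefix Word uses for its owner/lock files) or a leading "." (hidden files). The path is the one from the question.

```r
# Path from the question; adjust as needed.
docx_dir <- "~/ownCloud/NLP"

# all.files = TRUE also returns dot-prefixed (hidden) names.
all_docx <- list.files(docx_dir, pattern = "\\.docx$", all.files = TRUE)

# Assumed markers of temporary/hidden files: Word "~$" lock files and dotfiles.
suspect <- grepl("^(~\\$|\\.)", all_docx)
print(all_docx[suspect])
```

Anything printed here is a candidate for deletion (after checking it is not a document you need).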

Once you delete them, run the code again and it should work:

library(readtext)
dat <- readtext("~/ownCloud/NLP/*.docx")
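If you cannot delete the temporary files (for example, because a document is still open in Word), one alternative is to build the file list yourself and pass it to readtext(), whose file argument accepts one or more file names. This is a minimal sketch, not the approach from the answer above: the helper real_docx() is hypothetical, and it assumes the offending files start with "~$" or ".".

```r
# Hypothetical helper: keep only "real" .docx paths, dropping names that
# start with "~$" (assumed: Word owner/lock files) or "." (hidden files).
real_docx <- function(paths) {
  is_docx <- grepl("\\.docx$", paths, ignore.case = TRUE)
  is_temp <- grepl("^(~\\$|\\.)", basename(paths))
  paths[is_docx & !is_temp]
}

# Path from the question; adjust as needed.
docx_dir <- "~/ownCloud/NLP"
docx_files <- real_docx(list.files(docx_dir, full.names = TRUE))

# Read the filtered files only if readtext is installed and files were found.
if (requireNamespace("readtext", quietly = TRUE) && length(docx_files) > 0) {
  dat <- readtext::readtext(docx_files)
}
```

For example, real_docx(c("a/interview_1.docx", "a/~$terview_1.docx", "a/notes.txt")) keeps only "a/interview_1.docx".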
