Google Colab having problems with Drive folders containing lots of files

I have imported several folders from Drive into Google Colab. Listing the smaller folders works fine, but when I try to list the larger folders, Colab gives me an error.

I am aware that there are other ways of listing directories, but the same issue causes problems further down the line when I try to access the files for training.

I am using this to mount Drive:

from google.colab import drive
drive.mount('/content/drive')

I then define the folder paths as follows:

TRAIN = '../content/drive/My Drive/train/'
TEST = '../content/drive/My Drive/test/'

When I try to do the following:

import os

print(os.listdir(TEST))
print(os.listdir(TRAIN))

TEST prints fine. It has circa 8,000 files (all images).

TRAIN prints sometimes and fails at other times. It has circa 32,000 files (also all images). When it fails, I get this:

OSError: [Errno 5] Input/output error: '../content/drive/My Drive/train/'

Does anyone know how to fix this in Google Colab?

I've found that if I wait a while after mounting and then run the prints, they succeed. This suggests that Colab takes some time to process the files from Drive even after the mount cell has finished running.
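
Since waiting before listing seems to help, one stopgap is to retry the listing with a pause between attempts. This is only a sketch: `listdir_with_retry` and its parameters are hypothetical names, not part of Colab, and it does not address the underlying timeout.

```python
import os
import time


def listdir_with_retry(path, attempts=5, delay=1.0, lister=os.listdir):
    """Call lister(path), retrying on OSError with a growing pause."""
    for attempt in range(1, attempts + 1):
        try:
            return lister(path)
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay * attempt)  # wait longer after each failure
```

For example, `listdir_with_retry(TRAIN, attempts=5, delay=10)` would pause 10, 20, 30 and 40 seconds between its five attempts before giving up.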

Drive FUSE operations can time out when the number of files in a directory becomes large.

The cost of I/O operations on a Drive directory is proportional to the number of files in it. Since the FUSE client has a fixed timeout, operations in the directory will fail once the number of files becomes large enough.

A workaround is to organize your files into subdirectories so that the number of files or folders in any single directory doesn't become too large.
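
A minimal sketch of that reorganization, assuming the files sit flat in one folder. The function name `shard_directory`, the `part_NNNN` naming, and the default of 1000 files per folder are illustrative choices, not anything Drive or Colab prescribes. Note that running this against the mounted Drive folder still requires one listing of the large directory, which may hit the same error; it may be easier to run it on a local copy before uploading.

```python
import os
import shutil


def shard_directory(src, files_per_dir=1000):
    """Move the files directly under src into numbered subdirectories."""
    # Sort for a deterministic assignment of files to shards.
    names = sorted(
        name for name in os.listdir(src)
        if os.path.isfile(os.path.join(src, name))
    )
    for index, name in enumerate(names):
        shard = os.path.join(src, f"part_{index // files_per_dir:04d}")
        os.makedirs(shard, exist_ok=True)
        shutil.move(os.path.join(src, name), os.path.join(shard, name))
    return len(names)  # number of files moved
```

Training code then has to walk the subdirectories (for instance with `os.walk`) instead of listing a single folder.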

OSError: [Errno 5] Input/output error: '../content/drive/My Drive/train/'

The reason is that Google Colab cannot read the file tree at '../content/drive/My Drive/train/', so change it to 'content/drive/My Drive/train/' (or to whatever matches your full path and current working directory).

In my case, it was the relative path that caused the error. I changed it to the full path and that resolved it, i.e., change

../drive/MyDrive/  

to

/content/drive/MyDrive
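
To see why a relative path can be fragile, here is a small sketch of how it resolves against different working directories. The helper `resolve` is a hypothetical name, `/content` is assumed to be the notebook's starting directory, and `/content/sample_data` just stands in for wherever an earlier `os.chdir` or `%cd` might have left you.

```python
import posixpath


def resolve(cwd, path):
    """Return the absolute path that `path` refers to when run from `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, path))


relative = '../content/drive/My Drive/train/'
print(resolve('/content', relative))              # /content/drive/My Drive/train
print(resolve('/content/sample_data', relative))  # /content/content/drive/My Drive/train
```

An absolute path such as `/content/drive/My Drive/train/` points at the same place regardless of the working directory.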
