Google Colab having problems with Drive folders containing lots of files
I have imported several folders from Drive into Google Colab. Listing directories works fine for the smaller folders, but when I try to list the contents of the larger folders, Colab gives me an error.
I am aware that there are other ways of listing directories, but this same issue causes problems further down the line when I try to access the files for training.
I am using this to import the files:
from google.colab import drive
drive.mount('/content/drive')
And then defining the folder paths as follows:
TRAIN = '../content/drive/My Drive/train/'
TEST = '../content/drive/My Drive/test/'
When I try to do the following:
print(os.listdir(TEST))
print(os.listdir(TRAIN))
TEST prints fine. It has circa 8,000 files (all images).
TRAIN prints sometimes and fails other times. It has circa 32,000 files (also all images). When it fails, this is what I get:
OSError: [Errno 5] Input/output error: '../content/drive/My Drive/train/'
Does anyone know how to fix this in Google Colab?
I've found that if I wait for a while after mounting and then run the prints, they work, which suggests that Colab takes a while to process the files from Drive even after the mounting cell has finished running.
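Since waiting and trying again sometimes succeeds, one stopgap is to retry the listing with a pause between attempts. This is only a minimal sketch: the helper name `listdir_with_retry`, its parameters, and the default delay are hypothetical choices, not part of Colab or the Drive client, and retrying does not remove the underlying timeout.

```python
import errno
import os
import time


def listdir_with_retry(path, attempts=5, delay=30.0, lister=os.listdir):
    """List `path`, retrying when the call fails with EIO (Errno 5).

    `attempts`, `delay` and `lister` are illustrative parameters;
    `lister` exists only so the function can be exercised without Drive.
    """
    for attempt in range(1, attempts + 1):
        try:
            return lister(path)
        except OSError as exc:
            # Re-raise anything that is not an I/O error, and give up
            # after the final attempt.
            if exc.errno != errno.EIO or attempt == attempts:
                raise
            time.sleep(delay)
```

With the paths from the question this would be called as `listdir_with_retry(TRAIN)`.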
Drive FUSE operations can time out when the number of files in a directory becomes large.
The cost of I/O operations on a Drive directory is proportional to the number of files in it. Since there is a fixed timeout in the FUSE client, operations on the directory start to fail once the number of files becomes large enough.
A workaround is to organize your files into subdirectories so that no single directory contains too many files or folders.
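The subdirectory workaround could be scripted roughly as follows. This is a sketch under assumptions: the function name `shard_directory`, the `part_0000` naming scheme, and the limit of 1000 files per folder are all illustrative, and it assumes the large directory can be listed successfully at least once. It may be more reliable to do the reorganization in the Drive web interface or on a local copy before uploading.

```python
import os
import shutil


def shard_directory(src, files_per_dir=1000):
    """Move the files in `src` into numbered subdirectories of `src`.

    Returns the number of subdirectories used. The chunk size and the
    `part_NNNN` names are arbitrary choices for illustration.
    """
    # Collect the file names first so the directory is not modified
    # while it is being scanned.
    names = sorted(e.name for e in os.scandir(src) if e.is_file())
    for i, name in enumerate(names):
        sub = os.path.join(src, f"part_{i // files_per_dir:04d}")
        os.makedirs(sub, exist_ok=True)
        shutil.move(os.path.join(src, name), os.path.join(sub, name))
    return (len(names) + files_per_dir - 1) // files_per_dir
```

After sharding, training code has to walk the subdirectories (for example with `os.walk`) instead of listing one flat folder.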
OSError: [Errno 5] Input/output error: '../content/drive/My Drive/train/'
The cause is that Google Colab cannot read the file tree at '../content/drive/My Drive/train/'.
So change it to 'content/drive/My Drive/train/' (or adjust it to match your full path and current working directory).
In my case, it was the relative path that caused the error. I changed it to the full path and that resolved it, i.e., change
../drive/MyDrive/
to
/content/drive/MyDrive
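One way to avoid depending on the current working directory is to build every path from a single absolute root. The sketch below is illustrative only: `DRIVE_ROOT` is assumed to be the mount location shown in the answer above (older mounts expose the folder as `My Drive` instead), and `drive_path` is a hypothetical helper, not a Colab API.

```python
import posixpath

# Assumed mount location, taken from the answer above; adjust it if
# your Drive folder appears as "My Drive".
DRIVE_ROOT = "/content/drive/MyDrive"


def drive_path(*parts):
    """Join `parts` onto the absolute Drive root (hypothetical helper)."""
    return posixpath.join(DRIVE_ROOT, *parts)


TRAIN = drive_path("train")
TEST = drive_path("test")

# A relative path resolves differently depending on where the notebook
# happens to be running, which is why it can break unexpectedly.
relative = "../drive/MyDrive/train"
from_content = posixpath.normpath(posixpath.join("/content", relative))
from_subdir = posixpath.normpath(posixpath.join("/content/project", relative))
```

Here `from_content` and `from_subdir` come out as different directories, while `TRAIN` and `TEST` stay the same regardless of the working directory.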