简体   繁体   English

如何在Python中提高磁盘读取速度

[英]How to increase read from disk speed in Python

I use Python for Image analysis. 我使用Python进行图像分析。 The first step in my code is to load the images from disk to a big 20GB uint8 array. 我的代码的第一步是将图像从磁盘加载到20GB的大uint8阵列。 This step is taking a very long time, loading about 10MB/s, and the cpu is idling during the task. 此步骤耗时很长,加载速度约为10MB / s,任务期间CPU处于空闲状态。

This seems extremely slow. 这似乎非常慢。 Am I making an obvious mistake? 我犯了一个明显的错误吗? How can I improve performance? 如何改善效能? Is it a problem with the numpy array type? numpy数组类型有问题吗?

# find all image files in working folder
FileNames = []      # FileNames is a list of image names
workingFolder = 'C:/folder'
for (dirpath, dirnames, filenames) in os.walk(workingFolder):
     FileNames.extend(filenames)
FileNames.sort() # Sorted by image number 
 imNumber = len(FileNames) # Number of Images

 # AllImages initialize
 img = Image.open(workingFolder+'/'+FileNames[0])
 AllImages = np.zeros((img.size[0],img.size[1], imNumber),dtype=np.uint8)

 for ii in range(imNumber):
     img = Image.open(workingFolder+'/'+FileNames[ii])
     AllImages[:,:,ii] = img

Thanks a lot for your help. 非常感谢你的帮助。

Since the CPU is idling it sounds that it's the disk that is the bottle neck. 由于CPU处于空闲状态,因此听起来是磁盘是瓶颈。 10 Mb/s is somewhat slow, but not that slow that it reminds me of stone age hard disks. 10 Mb / s的速度有些慢,但并不慢,它让我想起了石器时代的硬盘。 If it were numpy I'd expect the CPU to be busy running numpy code rather than being idle. 如果是numpy我希望CPU忙于运行numpy代码,而不是闲置。

Note that there maybe two ways the CPU will be waiting for the disk. 请注意,CPU可能有两种方式等待磁盘。 First of course you will need to read the data from disk, but also since the data is 20GB the data may be big enough to require it to be swapped to disk. 当然,首先您需要从磁盘读取数据,而且由于数据为20GB,因此数据可能足够大以至于需要将其交换到磁盘上。 The normal solution to this type of situation is to memory map the file (which will avoid moving data from disk to swap). 对于这种情况,通常的解决方案是对文件进行内存映射(这将避免将数据从磁盘移动到交换区)。

Try to check if you can read the files faster by other means. 尝试检查您是否可以通过其他方式更快地读取文件。 For example on linux you could use dd if=/path/to/image of=/tmp/output bs=8k count=10k; rm -f /tmp/output 例如在Linux上,您可以使用dd if=/path/to/image of=/tmp/output bs=8k count=10k; rm -f /tmp/output dd if=/path/to/image of=/tmp/output bs=8k count=10k; rm -f /tmp/output to check the speed of read to ram. dd if=/path/to/image of=/tmp/output bs=8k count=10k; rm -f /tmp/output检查对ram的读取速度。 See this question for more information on checking disk performance. 有关检查磁盘性能的更多信息,请参见此问题

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM