
How can I force Python code to read input files again without rebooting my computer

I am scanning through a large number of files looking for some markers. I am becoming quite confident that once I have run through the code one time, Python is not rereading the actual files from disk. I find this behavior strange because I was told that one reason I needed to structure my file access the way I have is so that the handle and file content are flushed. But that can't be.

There are 9,568 file paths in the list I am reading from. If I shut down Python and reboot my computer, it takes roughly 6 minutes to read the files and determine whether the regular expression returns anything.

However, if I run the code a second time it takes about 36 seconds. Just for grins, the average document has 53,000 words.

Therefore I am concluding that Python still has access to the files it read in the first iteration.

I also want to note that the first time I do this I can hear the disk spin (E:\ - Python is on C:). E is just a spinning disk with 126 MB cache - I don't think the cache is big enough to hold the contents of these files. When I run it again later I do not hear the disk spin.

Here is the code:

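import re

# path_list is defined elsewhere: the list of 9,568 file paths to scan.
test_7A_re = re.compile(r'\n\s*ITEM\s*7\(*a\)*[.]*\s*-*\s*QUANT.*\n', re.IGNORECASE)
no7a = []  # paths of files in which the pattern was not found
for path in path_list:
    path = path.strip()
    # The with block closes the file handle as soon as the read is done.
    with open(path, 'r') as fh:
        string = fh.read()
    items = list(test_7A_re.finditer(string))
    if not items:
        no7a.append(path)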

I care about this for a number of reasons. One is that I was thinking about using multiprocessing, but if the bottleneck is reading in the files I don't see that I will gain much. I also think this is a problem because I would be worried about a file being modified and my not having the most recent version of it available.

I am tagging this 2.7 because I have no idea whether this behavior is the same across versions.

To confirm this behavior I modified my code to run as a .py file and added some timing code. I then rebooted my computer - the first run took 5.6 minutes and the second run (without rebooting) took 36 seconds. Output is the same in both cases.
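The question does not show the timing code itself, so the following is only a minimal sketch of how such a measurement might look. It is written for Python 3 (`time.perf_counter` is not available in 2.7), and the function name `time_pass` is hypothetical. It times the file reads and the regex search separately, which also speaks to the multiprocessing question: if almost all of the time goes to reading, extra processes are unlikely to help much.

```python
import re
import time

# Same pattern as in the question.
TEST_7A_RE = re.compile(r'\n\s*ITEM\s*7\(*a\)*[.]*\s*-*\s*QUANT.*\n', re.IGNORECASE)


def time_pass(paths):
    """Scan every path once.

    Returns (read_seconds, regex_seconds, unmatched_paths), where the two
    timings are accumulated separately over all files.
    """
    read_seconds = 0.0
    regex_seconds = 0.0
    unmatched = []
    for path in paths:
        path = path.strip()

        # Time only the file read.
        start = time.perf_counter()
        with open(path, 'r') as fh:
            text = fh.read()
        read_seconds += time.perf_counter() - start

        # Time only the regex search.
        start = time.perf_counter()
        found = TEST_7A_RE.search(text) is not None
        regex_seconds += time.perf_counter() - start

        if not found:
            unmatched.append(path)
    return read_seconds, regex_seconds, unmatched
```

Calling `time_pass(path_list)` twice in a row and comparing the two `read_seconds` values would show how much of the speed-up comes from the reads alone.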

The really interesting thing is that even if I shut down IDLE (but do not reboot my computer) it still takes 36 seconds to run the code.

All of this suggests to me that the files are not read from disk after the first time - this is amazing behavior to me, but it seems dangerous.

To be clear, the results are the same. Given the timing tests I have run and the fact that I do not hear the disk spinning, I believe that somehow the files are still accessible to Python.

This is caused by file caching in Windows. It is not related to Python.

To stop Windows from caching your reads, you can:

  1. Disable the paging file in Windows and fill the RAM up to 90%.

  2. Use a tool that disables file caching in Windows.

  3. Run your code in a Linux VM with limited RAM on your Windows machine. In Linux you can control the caching much better.

  4. Make the files much bigger, so that they won't fit in the cache.
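As an illustration of the extra control available on Linux (option 3), here is a minimal sketch, assuming Python 3.3+ on Linux, that asks the kernel to drop a single file's pages from the page cache with `os.posix_fadvise`. The function name `drop_file_from_page_cache` is hypothetical. The call is only a hint that the kernel may ignore, and it is not available on Windows, so the sketch returns False there.

```python
import os


def drop_file_from_page_cache(path):
    """Hint to the kernel that the cached pages of one file are not needed.

    Returns True if the hint was issued, False if posix_fadvise is not
    available on this platform or the call failed.
    """
    if not hasattr(os, 'posix_fadvise'):
        return False  # e.g. Windows or macOS
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, len=0 covers the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    except OSError:
        return False
    finally:
        os.close(fd)
    return True
```

Calling this for every path before a timing run would make that run behave more like a cold start, without a reboot.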

I fail to see why this is a problem. I'm not 100% certain how Windows handles file cache invalidation, but unless the "Last modified time" changes, you and I and Windows would all assume that the file still holds the same content. If the file holds the same content, I don't see why reading from the cache would be a problem.

I'm pretty sure that if you change the last modified date, say by opening the file for write access and then closing it right away, Windows will have sufficient doubt about the file content to invalidate the cache.
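The worry about stale data can be checked directly. The sketch below (Python 3; the helper names `read_text` and `reread_sees_change` are hypothetical) writes a temporary file, reads it so that the operating system has a chance to cache it, overwrites it, and reads it again. On a local disk, writes made through the normal file APIs go through the same cache as reads, so the second read should return the new content. Network file systems may behave differently.

```python
import os
import tempfile


def read_text(path):
    """Read a whole file the same way the question's loop does."""
    with open(path, 'r') as fh:
        return fh.read()


def reread_sees_change():
    """Write a file, read it, overwrite it, and read it again.

    Returns the two strings that were read back.
    """
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w') as fh:
            fh.write('version 1')
        first = read_text(path)   # this read may populate the OS cache
        with open(path, 'w') as fh:
            fh.write('version 2')
        second = read_text(path)  # read again after the modification
        return first, second
    finally:
        os.remove(path)
```

If `reread_sees_change()` returns the old text twice, the cache is serving stale data. If it returns the old text and then the new text, the speed-up on repeated runs does not come at the cost of correctness.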


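Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.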
