How do I start reading a file from the top again in Python?

Question

I am trying to add dependencies from a list to a requirements.txt file, depending on the platform the software is going to run on. So I wrote the following code:

if platform.system() == 'Windows':
    # Add windows only requirements
    platform_specific_req = [req1, req2]
elif platform.system() == 'Linux':
    # Add linux only requirements
    platform_specific_req = [req3]

with open('requirements.txt', 'a+') as file_handler:
    for requirement in platform_specific_req:
        already_in_file = False
        # Make sure the requirement is not already in the file
        for line in file_handler.readlines():
            line = line.rstrip()  # remove '\n' at end of line
            if line == requirement:
                already_in_file = True
                break
        if not already_in_file:
            file_handler.write('{0}\n'.format(requirement))
    file_handler.close()

What happens with this code is that when the second requirement is searched for among the requirements already in the file, for line in file_handler.readlines(): seems to be pointing at the last element of the list in the file, so the new requirement is only compared against that last element and is added if it differs. This obviously causes several elements to be duplicated in the file, since only the first requirement is compared against all the lines. How can I tell Python to start comparing from the top of the file again?

Solution: I received many great responses and learned a lot; thanks, guys. I ended up combining two solutions, the one from Antti Haapala and the one from Matthew Franglen. I am showing the final code here for reference:

# Append the extra requirements to the requirements.txt file
with open('requirements.txt', 'r') as file_in:
    reqs_in_file = set([line.rstrip() for line in file_in])
    missing_reqs = set(platform_specific_reqs).difference(reqs_in_file)

with open('requirements.txt', 'a') as file_out:
    for req in missing_reqs:
        file_out.write('{0}\n'.format(req))

Answer 1

You open the file handle before iterating over the existing requirement list. You then read the entire file handle for each requirement.

The file handle is exhausted after the first requirement because you never reopen or rewind it. Reopening the file on each iteration would be very wasteful; read the file into a list once and then use that inside the loops. Or do a set comparison!

file_content = set([line.rstrip() for line in file_handler])
only_in_platform = set(platform_specific_req).difference(file_content)
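The exhaustion described above can be reproduced in isolation. This is a minimal sketch, assuming io.StringIO as an in-memory stand-in for requirements.txt so that nothing on disk is touched; the package names are placeholders.

```python
import io

# In-memory stand-in for requirements.txt (hypothetical contents).
file_handler = io.StringIO("requests\nflask\n")

first_pass = file_handler.readlines()   # reads everything; position is now at the end
second_pass = file_handler.readlines()  # nothing left to read

print(first_pass)   # ['requests\n', 'flask\n']
print(second_pass)  # []
```

Every scan after the first sees an empty list, which is why later requirements are never found and get appended again.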

Answer 2

Do not try to read the file again for each requirement. While appending does work for this particular use case, for modifications in general it is easier to:

1. Read the content from the file into a list (preferably skipping empty lines).
2. Modify the list.
3. Open the file again for writing and save the modified data.

For example:

with open('requirements.txt', 'r') as fin:
    requirements = [ i for i in (line.strip() for line in fin) if i ]

for req in platform_specific_req:
    if req not in requirements:
        requirements.append(req)

with open('requirements.txt', 'w') as fout:
    for req in requirements:
        fout.write('{0}\n'.format(req))
        # or print(req, file=fout)

Answer 3

The answer to your explicit question: file_handler.seek(0) will move the file position back to the beginning of the file.
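As a rough sketch of what that looks like in the asker's loop, the snippet below rewinds before every scan. The scratch file, the helper name append_missing, and the package names are all assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical scratch file standing in for requirements.txt.
path = os.path.join(tempfile.mkdtemp(), 'requirements.txt')
with open(path, 'w') as f:
    f.write('requests\nflask\n')

def append_missing(requirements, file_path):
    """Append each requirement that is not already a line in the file."""
    with open(file_path, 'a+') as file_handler:
        for requirement in requirements:
            file_handler.seek(0)  # rewind before every scan
            existing = [line.rstrip() for line in file_handler]
            if requirement not in existing:
                # in append mode, writes always go to the end of the file
                file_handler.write('{0}\n'.format(requirement))

append_missing(['flask', 'pywin32', 'requests'], path)
with open(path) as f:
    result = f.read().splitlines()
print(result)  # ['requests', 'flask', 'pywin32']
```

This answers the literal question, but it still rereads the whole file once per requirement.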

Some neat improvements:

You can use the file handle itself as an iterator instead of calling the readlines() method.

If your file is too large to read entirely into memory, iterating over its lines directly is fine, but you should change how you do it. As is, you iterate over the entire file for each requirement, and I/O is costly. You should probably iterate over the lines once and, for each line, check whether it is one of the requirements. Like so:

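with open('requirements.txt', 'a+') as file_handler:
    file_handler.seek(0)  # 'a+' may start positioned at the end of the file, so rewind first
    for line in file_handler:
        line = line.rstrip()
        if line in platform_specific_req:
            platform_specific_req.remove(line)
    # whatever is left was not found in the file; append mode writes at the end
    for req in platform_specific_req:
        file_handler.write('{0}\n'.format(req))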

Answer 4

I know I'm answering a little late, but I would suggest doing it this way: open the file once, then read and append in the same go. Note that this should work on every platform, regardless of your system:

def ensure_in_file(lines, file_path):
    '''
    idempotent function to append lines to a file if they're not already there
    '''
    # 'r+' opens an existing file for reading and writing; text mode already
    # translates newlines ('U' cannot be combined with '+' in Python 3)
    with open(file_path, 'r+') as f:
        # set of all lines in the file, less newlines, and trailing spaces too.
        file_lines = set(l.rstrip() for l in f)
        # reading left the position at the end of the file, so this appends;
        # write '\n' rather than os.linesep, since text mode translates it
        f.writelines(l + '\n' for l in set(lines).difference(file_lines))

You can test this:

a_file = '/temp/temp/foo/bar' # insert your own file path here.

# with open(a_file, 'w') as f:  # ensure a blank file
    # pass
ensure_in_file(['this', 'that'], a_file)
with open(a_file, 'r') as f:
    print(f.read())

ensure_in_file(['this', 'that'], a_file)
with open(a_file, 'r') as f:
    print(f.read())

Each print should show that the file contains each line exactly once.
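For a check that does not depend on a particular path existing, here is a sketch that uses a temporary file. It restates ensure_in_file in a form that runs on Python 3 (plain 'r+' mode and '\n' line endings are assumptions of this sketch). The sorted() call is added only so the append order is predictable.

```python
import os
import tempfile

def ensure_in_file(lines, file_path):
    """Append each of `lines` to the file unless it is already present."""
    with open(file_path, 'r+') as f:
        file_lines = set(l.rstrip() for l in f)
        # sorted() only makes the append order deterministic for this demo
        f.writelines(l + '\n' for l in sorted(set(lines).difference(file_lines)))

# Create an empty scratch file (hypothetical stand-in for your real file).
fd, a_file = tempfile.mkstemp()
os.close(fd)

ensure_in_file(['this', 'that'], a_file)
ensure_in_file(['this', 'that'], a_file)  # second call should add nothing

with open(a_file) as f:
    contents = f.read().splitlines()
os.remove(a_file)

print(contents)  # ['that', 'this']
```

Because set iteration order is arbitrary, the order of appended lines is not guaranteed without the sorted() call.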

4 answers

Answer 1
1 (accepted) 2014-06-19 19:46:02

Answer 2
1 2014-06-19 19:46:19

Answer 3
1 2014-06-19 19:54:58

Answer 4
0 2014-06-19 20:56:35
