How to start reading a file from the top again in Python?

Question

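I am trying to add dependencies from a list to a requirements.txt file, depending on the platform the software is going to run on. So I wrote the following code: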

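import platform

if platform.system() == 'Windows':
    # Add Windows-only requirements
    platform_specific_req = ['req1', 'req2']  # placeholder requirement names
elif platform.system() == 'Linux':
    # Add Linux-only requirements
    platform_specific_req = ['req3']
else:
    platform_specific_req = []  # other platforms (e.g. macOS): nothing extra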

with open('requirements.txt', 'a+') as file_handler:
    for requirement in platform_specific_req:
        already_in_file = False
        # Make sure the requirement is not already in the file
        for line in file_handler.readlines():
            line = line.rstrip()  # remove '\n' at end of line
            if line == requirement:
                already_in_file = True
                break
        if not already_in_file:
            file_handler.write('{0}\n'.format(requirement))
    file_handler.close()

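But what happens with this code is that, when the second requirement is searched for among the requirements already in the file, the for line in file_handler.readlines(): loop seems to point at the last element of the file, so the new requirement is effectively compared only with the last element, and it is added if they differ. Obviously this causes several elements to be duplicated, because only the first requirement gets compared against all elements of the file. How do I tell Python to start the comparison from the top of the file again?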
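The behaviour described above can be reproduced without a real file. A file object is an iterator: once readlines() (or a for loop) has consumed it, further reads return nothing until the position is moved back. The sketch below uses io.StringIO as a stand-in for the file, an assumption made only so the example runs anywhere. (In Python 3, a file opened in 'a+' mode starts positioned at the end, so there even the first scan may find nothing.)

```python
import io

# io.StringIO behaves like a text file; its position starts at the beginning
file_handler = io.StringIO('req1\nreq2\n')

first_pass = file_handler.readlines()   # reads everything; position is now at the end
second_pass = file_handler.readlines()  # nothing left to read from the current position

print(first_pass)   # ['req1\n', 'req2\n']
print(second_pass)  # []
```

Because the inner loop in the question finds nothing on later passes, already_in_file stays False and the requirement is written again.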

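Solution: Thanks, everyone. I got a lot of good answers and learned a lot. I ended up combining two of the solutions: the one from Antti Haapala and the one from Matthew Franglen. Here is the final code for reference: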

# Append the extra requirements to the requirements.txt file
with open('requirements.txt', 'r') as file_in:
    reqs_in_file = {line.rstrip() for line in file_in}
missing_reqs = set(platform_specific_req).difference(reqs_in_file)

with open('requirements.txt', 'a') as file_out:
    for req in sorted(missing_reqs):  # sorted so the output order is deterministic
        file_out.write('{0}\n'.format(req))

Answer 1

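You open the file handle before iterating over the list of requirements, and then read the entire file handle for each requirement.

The file handle is exhausted after the first requirement, because you haven't reopened it. Reopening the file on every iteration would be very wasteful; instead, read the file into a list and use that in the loop. Or do a set comparison!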

with open('requirements.txt') as file_handler:
    file_content = {line.rstrip() for line in file_handler}
only_in_platform = set(platform_specific_req).difference(file_content)
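The set difference above does not preserve the order of platform_specific_req, so missing requirements may come out in arbitrary order. If order matters, testing each wanted item against a set of the file's lines gives the same result while keeping the original order. A minimal sketch; the function name missing_requirements and the package names in the usage line are illustrative assumptions:

```python
def missing_requirements(file_lines, wanted):
    """Return the items of `wanted` not found among `file_lines`,
    keeping the order of `wanted` and dropping duplicates."""
    # Build the set once; membership tests against a set are O(1) on average
    present = {line.rstrip() for line in file_lines}
    missing = []
    for req in wanted:
        if req not in present:
            missing.append(req)
            present.add(req)  # avoid listing the same requirement twice
    return missing


# Any iterable of lines works, including an open file object
print(missing_requirements(['flask\n', 'requests\n'], ['pywin32', 'requests', 'colorama']))
# ['pywin32', 'colorama']
```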

Answer 2

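Don't try to read the file again for every requirement. Although appending does work for this use case, in general it is easier to modify the file like this:

1. Read the contents of the file into a list (preferably skipping blank lines).
2. Modify the list.
3. Open the file again for writing and save the modified data.

So, for example: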

with open('requirements.txt', 'r') as fin:
    requirements = [ i for i in (line.strip() for line in fin) if i ]

for req in platform_specific_req:
    if req not in requirements:
        requirements.append(req)

with open('requirements.txt', 'w') as fout:
    for req in requirements:
        fout.write('{0}\n'.format(req))
        # or print(req, file=fout)

Answer 3

The answer to your explicit question: file_handler.seek(0) moves the position back to the beginning of the file.
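Applied to the code in the question, the minimal change is to rewind before each scan. A sketch, assuming the same 'a+' mode; it uses a temporary file and placeholder requirement names (both assumptions so the example is self-contained). Rescanning once per requirement still costs one full read each time, which is why the other answers read the file only once.

```python
import os
import tempfile

platform_specific_req = ['req1', 'req2']  # placeholder requirement names

# Create a throwaway requirements file that already contains req1
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('req1\n')

with open(path, 'a+') as file_handler:
    for requirement in platform_specific_req:
        file_handler.seek(0)  # go back to the top before every scan
        already_in_file = any(line.rstrip() == requirement for line in file_handler)
        if not already_in_file:
            # in 'a+' mode, writes always go to the end of the file
            file_handler.write('{0}\n'.format(requirement))

with open(path) as f:
    result = f.read()
os.remove(path)
print(result)
```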

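A few neat improvements:

You can use the file handle itself as an iterator instead of calling the readlines() method.

If your file is too large to read fully into memory, iterating directly over its lines is fine, but you should change the way you do it. As it stands, you iterate over the whole file once per requirement, and IO is expensive. You should probably iterate over the lines once and, for each line, check whether it is one of the requirements. Like this: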

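with open('requirements.txt', 'a+') as file_handler:
    file_handler.seek(0)  # 'a+' starts at the end of the file in Python 3; rewind first
    missing = list(platform_specific_req)  # copy, so the caller's list is not mutated
    for line in file_handler:
        line = line.rstrip()
        if line in missing:
            missing.remove(line)
    for req in missing:
        file_handler.write('{0}\n'.format(req))  # 'a+' writes always append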
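To check the single-pass approach without touching a real requirements.txt, it can be wrapped in a function and run twice against a temporary file; the second run should add nothing. A sketch; the function name append_missing and the temporary file are assumptions for illustration. Note the seek(0): in Python 3 a file opened with 'a+' starts at the end, so without it the loop would read no lines.

```python
import os
import tempfile


def append_missing(path, platform_specific_req):
    """Append the requirements not yet in `path`, reading the file only once."""
    missing = list(platform_specific_req)  # work on a copy, not the caller's list
    with open(path, 'a+') as file_handler:
        file_handler.seek(0)  # rewind: 'a+' opens positioned at the end
        for line in file_handler:
            line = line.rstrip()
            if line in missing:
                missing.remove(line)
        for req in missing:
            file_handler.write('{0}\n'.format(req))
    return missing


fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)  # start from an empty file
first = append_missing(path, ['req1', 'req2'])
second = append_missing(path, ['req1', 'req2'])  # nothing new to add
with open(path) as f:
    contents = f.read()
os.remove(path)
print(first, second, repr(contents))
```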

Answer 4

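I know I'm answering late, but I'd suggest this: open the file once, read it, and append. Note that this approach works on every platform, whatever system you are using: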

def ensure_in_file(lines, file_path):
    '''
    Idempotent function to append lines to a file if they're not already there.
    The file must already exist: 'r+' does not create it.
    '''
    # 'r+' opens for reading and writing. Python 3 text mode already uses
    # universal newlines, so the old 'U' flag is unnecessary (removed in 3.11).
    with open(file_path, 'r+') as f:
        # set of all lines in the file, minus newlines and trailing spaces
        file_lines = set(l.rstrip() for l in f)
        # after reading, the position is at EOF, so writing appends;
        # text mode translates '\n' into the platform's line separator
        f.writelines(l + '\n' for l in set(lines).difference(file_lines))

You can test it:

a_file = '/temp/temp/foo/bar'  # insert your own file path here.

with open(a_file, 'w') as f:  # ensure a blank file exists ('r+' needs it)
    pass
ensure_in_file(['this', 'that'], a_file)
with open(a_file) as f:
    print(f.read())

ensure_in_file(['this', 'that'], a_file)
with open(a_file) as f:
    print(f.read())

Each print should show every line in the file exactly once.
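A self-contained variant of that test, using a temporary file instead of a hard-coded path and checking the result instead of eyeballing the output. It restates ensure_in_file in Python 3 form (plain 'r+' mode, writing '\n'), which is an adaptation rather than the answer's exact code.

```python
import os
import tempfile


def ensure_in_file(lines, file_path):
    """Idempotently append `lines` to an existing file if they are not already there."""
    with open(file_path, 'r+') as f:
        file_lines = set(l.rstrip() for l in f)
        # position is now at EOF; text mode turns '\n' into the platform separator
        f.writelines(l + '\n' for l in set(lines).difference(file_lines))


fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)  # mkstemp creates the file, so 'r+' can open it

ensure_in_file(['this', 'that'], path)
ensure_in_file(['this', 'that'], path)  # second call should change nothing

with open(path) as f:
    result = f.read().splitlines()
os.remove(path)

print(sorted(result))  # ['that', 'this']
```

The set difference means the order of newly appended lines is not guaranteed, which is why the check sorts them.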

Answers (4):
Answer 1: score 1, accepted, 2014-06-19 19:46:02
Answer 2: score 1, 2014-06-19 19:46:19
Answer 3: score 1, 2014-06-19 19:54:58
Answer 4: score 0, 2014-06-19 20:56:35

如何开始从python的顶部读取文件？

问题描述

4 个解决方案

解决方案1 1 已采纳 2014-06-19 19:46:02

解决方案2 1 2014-06-19 19:46:19

解决方案3 1 2014-06-19 19:54:58

解决方案4 0 2014-06-19 20:56:35

解决方案1
1 已采纳 2014-06-19 19:46:02

解决方案2
1 2014-06-19 19:46:19

解决方案3
1 2014-06-19 19:54:58

解决方案4
0 2014-06-19 20:56:35