
Python: How do I go through multiple lines in a file at once?

I have created a class that loops through a file and, after checking whether a line is valid, writes that line to another file. Checking each line is a lengthy process, which makes the whole run very slow. I need to add either threading or multiprocessing to the process_file function, but I do not know which library is best suited for speeding it up or how to implement it.

class FileProcessor:
    def process_file(self):
        with open('file.txt', 'r') as f:
            with open('outfile.txt', 'w') as output:
                for line in f:
                    # There's some string manipulation code here...
                    validate = self.do_stuff(line)
                    # If the check passes, write the line to outfile.txt
                    if validate:
                        output.write(line)

    def do_stuff(self, line):
        # Does stuff...
        pass

Extra information: the code goes through a proxy list, checking whether each proxy is online. This is a lengthy and time-consuming process.

Thank you for any insight or help!

The code goes through a proxy list checking whether it is online

It sounds like the slow part is connecting to the internet, which means your task is I/O-bound, so threads can help speed it up. Multiple processes would also work but can be harder to use.
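As a rough illustration of the thread-based approach, here is a minimal sketch using concurrent.futures.ThreadPoolExecutor from the standard library. The names is_online and filter_lines are hypothetical, and the check is a placeholder; a real check would attempt a connection with a timeout.

```python
from concurrent.futures import ThreadPoolExecutor


def is_online(proxy):
    # Hypothetical placeholder for the slow network check.
    # Here any non-blank line counts as "online".
    return bool(proxy.strip())


def filter_lines(lines, max_workers=20):
    """Return the lines that pass is_online(), keeping input order."""
    lines = list(lines)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map runs the checks concurrently and yields the
        # results in the same order as the inputs.
        flags = list(executor.map(is_online, lines))
    return [line for line, ok in zip(lines, flags) if ok]


if __name__ == "__main__":
    print(filter_lines(["1.2.3.4:80\n", "\n", "5.6.7.8:8080\n"]))
```

Because the threads spend most of their time waiting on the network, the worker count can usually be set well above the number of CPU cores; the value 20 is only an assumption to tune.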

This seems like a job for multiprocessing.Pool.map (or its lazy variants, imap and imap_unordered).

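import multiprocessing

def do_stuff(item):
    # Placeholder for the slow per-line work; returns the text to write.
    return "I did something with %s\n" % item

def process_file(filename):
    # Leaving the with-block shuts the pool down after all results are consumed.
    with multiprocessing.Pool(4) as pool, open(filename) as src:
        with open("output.txt", "w") as dst:
            # imap_unordered yields results as they finish, not in input order.
            for result in pool.imap_unordered(do_stuff, src):
                dst.write(result)

if __name__ == "__main__":
    # The guard is needed where worker processes are spawned (e.g. Windows).
    process_file(__file__)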

You can also use multiprocessing.dummy.Pool instead if you want to use threads (which might be preferable in this case, since you are I/O-bound).

Essentially, you pass an iterable to imap_unordered (or imap, if order matters) and farm out portions of it to other processes (or threads, if using dummy). You can tune the chunksize of the map to help with efficiency.
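A minimal sketch of the thread-backed variant with an explicit chunksize might look like this. The check function and the chosen worker and chunk counts are assumptions for illustration only.

```python
from multiprocessing.dummy import Pool  # thread-backed, same API as multiprocessing.Pool


def check(line):
    # Hypothetical stand-in for the slow check; returns (line, is_valid).
    return line, line.strip() != ""


def valid_lines(lines, workers=8, chunksize=4):
    """Return the valid lines in input order."""
    with Pool(workers) as pool:
        # imap preserves input order; chunksize controls how many items
        # are handed to a worker at a time.
        results = pool.imap(check, lines, chunksize=chunksize)
        return [line for line, ok in results if ok]


if __name__ == "__main__":
    print(valid_lines(["a\n", "\n", "b\n"]))
```

Swapping imap for imap_unordered returns results as soon as they are ready, at the cost of losing the original line order.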

If you want to encapsulate this into a class, the simplest option is multiprocessing.dummy, because threads never need to pickle anything. With real processes, the instance method has to be pickled: Python 2 cannot do that, and Python 3 can only do it when the instance itself is picklable.
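One possible way to wrap the thread pool in the question's class is sketched below. The body of do_stuff, the _check helper, and the path parameters are assumptions made for the example, not the asker's actual code.

```python
from multiprocessing.dummy import Pool  # thread pool, so no pickling is involved


class FileProcessor:
    def __init__(self, workers=8):
        self.workers = workers

    def do_stuff(self, line):
        # Hypothetical check; the real one would test the proxy.
        return line.strip() != ""

    def _check(self, line):
        # Pair each line with its result so the caller can filter.
        return line, self.do_stuff(line)

    def process_file(self, in_path, out_path):
        with Pool(self.workers) as pool, \
                open(in_path) as src, open(out_path, "w") as dst:
            # Only the main thread writes, so no lock is needed here.
            for line, ok in pool.imap(self._check, src):
                if ok:
                    dst.write(line)
```

Usage would be along the lines of FileProcessor().process_file("file.txt", "outfile.txt"), with the file names taken from the question.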

With map, you have to wait until the whole map finishes before you can process the results, whereas imap and imap_unordered yield results as they become available. Alternatively, you could write the results from inside do_stuff. If you do, be sure to open the file in append mode, and you will likely want to lock the file.
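For the threaded case, writing from inside the workers could be sketched as follows, with a threading.Lock guarding a single shared file handle. The function name process_lines and the blank-line check are hypothetical, and this lock only protects threads in one process; separate processes would need an OS-level file lock instead.

```python
import threading
from multiprocessing.dummy import Pool  # thread pool


def process_lines(lines, out_path, workers=8):
    lock = threading.Lock()
    # Append mode, so existing content in the output file is kept.
    with open(out_path, "a") as out:

        def worker(line):
            if line.strip():  # hypothetical stand-in for the slow check
                with lock:  # one writer at a time
                    out.write(line)

        with Pool(workers) as pool:
            # map blocks until every line has been handled.
            pool.map(worker, lines)
```

The output order is not guaranteed with this approach, since each worker writes as soon as its own check completes.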

