Program only writes to file on runs after the first one

The program checks whether a URL returns a 404, and if it does, it writes the username to a file. I tried to add multiprocessing so that the program would run faster, because my input text files sometimes have thousands of lines and a run takes quite a while. However, the first time I run this program (when the output text file is empty), it doesn't write anything to the output text file. It only begins to write to the output file on the 2nd, 3rd, 4th ... run.

#program checks twitch accounts in a file.
#writes accounts which aren't taken to another file.
import requests
from multiprocessing import Pool
x = "0"

accounts = open('accounts.txt', 'r')
valid_accounts = open('valid accounts.txt', 'a')

base_url = "https://www.twitch.tv/"

def check(x):
    for line in accounts:
        url = base_url + line
        twitch_r = requests.get(url)
        if twitch_r.status_code == 404:
            valid_accounts.write(line + "\n")



def Main():
    p = Pool(processes=25)
    p.imap(check, x)
    accounts.close()
    valid_accounts.close()



if __name__ == "__main__":
    Main()

You should call p.close() and then p.join() at the end of Main().
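
As a minimal sketch of that suggestion: the example below uses a hypothetical stand-in function, work, in place of the real HTTP check, and a small range of numbers in place of the account names. Both are assumptions for illustration only. It shows where close() and join() would go relative to the imap call.

```python
from multiprocessing import Pool


def work(n):
    # Hypothetical stand-in for check(); the real function would do the request.
    return n * n


def main():
    p = Pool(processes=2)
    results = p.imap(work, range(5))  # returns immediately with an iterator
    p.close()  # no further tasks will be submitted to the pool
    p.join()   # block until the workers have finished the submitted tasks
    return list(results)


if __name__ == "__main__":
    print(main())
```

Without the join() call, main() could return, and the script could exit, while the workers are still busy.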

You are not passing the accounts to the pool's map call:

p.imap(check, accounts)

Your main problem is that you use imap instead of map. imap is non-blocking, which means that your main process quits before the worker processes have run through. I'm a bit surprised that it worked sometimes, as I think it should never have worked.
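
To illustrate the difference, here is a small sketch under stated assumptions: fake_check is a hypothetical stand-in that sleeps instead of making a request, and the names are made up. map hands back a finished list, while imap hands back an iterator that has to be consumed before the results are all there.

```python
import time
from multiprocessing import Pool


def fake_check(name):
    # Hypothetical stand-in for the HTTP request: wait briefly, return a value.
    time.sleep(0.05)
    return name.upper()


def run_with_map(pool, names):
    # map blocks until every task has finished and returns a list.
    return pool.map(fake_check, names)


def run_with_imap(pool, names):
    # imap returns a lazy iterator right away; consuming it (here with
    # list()) is what waits for the workers to deliver their results.
    return list(pool.imap(fake_check, names))


if __name__ == "__main__":
    names = ["alice", "bob", "carol"]
    with Pool(processes=2) as pool:
        print(run_with_map(pool, names))
        print(run_with_imap(pool, names))
```

If the iterator returned by imap is never consumed and the pool is never joined, nothing forces the main process to wait for the workers.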

That said, there are a few problems with your program:

  • The check function, running in different processes, shares one file handle and iterates over the lines. This only works by chance (it will not work on Windows, for instance) and is bad practice (to put it mildly). You should read the file first and then distribute the lines to the processes.
  • The same applies to writing to the file. Although appending to a file is also safe to do across processes, a better design would be to do it at the end, in the parent process.
  • map and imap are meant to be applied to a list of arguments and to return the results (the mapped values).
  • Leave out the processes argument so Python can pick the number of processes based on how many cores your computer has.

Based on these points, this is the code I propose:

# program checks twitch accounts in a file.
# writes accounts which aren't taken to another file.
import requests
from multiprocessing import Pool

base_url = "https://www.twitch.tv/"

def check(line):
    # Strip the trailing newline so it does not end up in the URL.
    name = line.strip()
    if not name:
        return None
    twitch_r = requests.get(base_url + name)
    if twitch_r.status_code == 404:
        return name
    return None

def Main():
    # Read all lines up front so each worker receives a plain string.
    with open('accounts.txt', 'r') as accounts:
        lines = accounts.readlines()

    # map blocks until every worker has finished.
    with Pool() as p:
        results = p.map(check, lines)

    results = [r for r in results if r is not None]
    with open('valid accounts.txt', 'a') as valid_accounts:
        for result in results:
            valid_accounts.write(result + "\n")

if __name__ == "__main__":
    Main()

The only thing to note is that you need to strip the None values out of results, because check(line) returns None for every URL that does not result in a 404.
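
A tiny sketch of that filtering step, without any network access: here check takes the status code as a second argument, which is an assumption made only so the example can run offline, and the account names and codes are made up.

```python
def check(name, status_code):
    # Mirrors the idea of the proposed check(): return the name on a 404,
    # otherwise fall through to None. The status_code parameter is
    # hypothetical; the real function gets it from the HTTP response.
    if status_code == 404:
        return name
    return None


# Made-up sample data: (account name, status code the site would return).
samples = [("free_name", 404), ("taken_name", 200), ("other_name", 404)]

results = [check(name, code) for name, code in samples]
available = [r for r in results if r is not None]
print(available)
```

Every account that is taken contributes a None to results, so only the names that came back with a 404 remain in available.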

Updates:

After using John's solution, the program is working as intended

I doubt it does. Since you are on Windows, every process has its own file handle pointing to accounts.txt and will cycle through all the lines. So you end up checking every URL once per worker process, and the multiprocessing didn't help you.

I used imap because I read that imap doesn't return a list (?)

No. In this situation, the only difference between map and imap is that map waits until all processes are done (thus, you don't need to call join).

For a more thorough discussion about map vs imap see here
