Program only writes to file on runs after the first one
The program checks whether a URL results in a 404, and if it does, it writes a username to a file. I tried to add multiprocessing so that the program would run faster, because I sometimes have input text files with thousands of lines and it takes quite a while.

However, the first time I run this program (when the output text file is empty), it doesn't write anything to the output text file. It only begins to write to the output file on the 2nd, 3rd, 4th ... run.
```python
#program checks twitch accounts in a file.
#writes accounts which aren't taken to another file.
import requests
from multiprocessing import Pool

x = "0"
accounts = open('accounts.txt', 'r')
valid_accounts = open('valid accounts.txt', 'a')
base_url = "https://www.twitch.tv/"

def check(x):
    for line in accounts:
        url = base_url + line
        twitch_r = requests.get(url)
        if twitch_r.status_code == 404:
            valid_accounts.write(line + "\n")

def Main():
    p = Pool(processes=25)
    p.imap(check, x)
    accounts.close()
    valid_accounts.close()

if __name__ == "__main__":
    Main()
```
You should call `p.close()` and then `p.join()` at the end of `Main()`.

You are also not passing the accounts to the pool's map:

```python
p.imap(check, accounts)
```
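A minimal sketch of these two fixes, passing the actual lines to the pool and calling `close()`/`join()` before using the results. It assumes `multiprocessing.pool.ThreadPool` (same `imap`/`close`/`join` interface as `Pool`) so it runs without pickling, and a hypothetical `fake_status()` in place of `requests.get(url).status_code`, so no network access is needed.

```python
from multiprocessing.pool import ThreadPool


def fake_status(name):
    # Hypothetical stand-in for requests.get(base_url + name).status_code:
    # pretend names starting with "free" don't exist (404), all others do (200).
    return 404 if name.startswith("free") else 200


def check(line):
    name = line.strip()  # drop the trailing newline read from the file
    return name if fake_status(name) == 404 else None


def main(lines):
    p = ThreadPool(4)
    results = p.imap(check, lines)  # pass the actual lines, not a dummy "0"
    p.close()  # no more tasks will be submitted
    p.join()   # block until every worker has finished
    return [name for name in results if name is not None]


print(main(["free_one\n", "taken\n", "free_two\n"]))  # ['free_one', 'free_two']
```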
Your main problem is that you use `imap` instead of `map`. `imap` is non-blocking, which means that your main process quits before the worker processes have run through. I'm a bit surprised that it sometimes worked, as I think it never should have.
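The non-blocking behaviour can be seen in a small sketch. It assumes `multiprocessing.pool.ThreadPool` (same `imap`/`map` semantics as `Pool`, but no pickling needed) and a hypothetical `slow_square` task that waits on a `threading.Event`, so that no task can finish until the main thread allows it. `imap` returns immediately with nothing done; only consuming its iterator (or `map`, which blocks) waits for the work. In the question's `Main()`, the files are closed and the script ends right after `imap` returns.

```python
import threading
from multiprocessing.pool import ThreadPool

release = threading.Event()
finished = []  # inputs of the tasks that have completed


def slow_square(n):
    release.wait()  # every task blocks until the main thread releases it
    finished.append(n)
    return n * n


with ThreadPool(2) as p:
    it = p.imap(slow_square, [1, 2, 3])
    # imap has already returned, yet no task could have finished
    done_when_imap_returned = len(finished)
    print(done_when_imap_returned)  # 0
    release.set()  # let the workers run
    imap_results = list(it)  # consuming the iterator waits for the results
    map_results = p.map(slow_square, [4, 5])  # map blocks and returns a list

print(imap_results)  # [1, 4, 9]
print(map_results)   # [16, 25]
```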
That said, there are a few problems with your program:

- `map` and `imap` are meant to run a function over a list of arguments and then return the results (the mapped values), so `check` should handle the single line it is given and return a value, rather than looping over a global file.
- Leave out `processes=20` so Python can choose the number of processes based on how many cores your computer has (`Pool()` without arguments uses `os.cpu_count()`).

Based on these points, this is the code I propose:
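```python
# program checks twitch accounts in a file.
# writes accounts which aren't taken to another file.
import requests
from multiprocessing import Pool

base_url = "https://www.twitch.tv/"


def check(line):
    # Runs in a worker: check one account name and return it if the page is a 404.
    twitch_r = requests.get(base_url + line)
    if twitch_r.status_code == 404:
        return line
    # Otherwise the function falls through and returns None.


def Main():
    with open('accounts.txt', 'r') as accounts:
        # Strip the trailing newline so it doesn't end up in the URL; skip blank lines.
        lines = [line.strip() for line in accounts if line.strip()]
    # map blocks until every line has been checked; the with block then shuts the pool down.
    with Pool() as p:
        results = p.map(check, lines)
    results = [r for r in results if r is not None]
    with open('valid accounts.txt', 'a') as valid_accounts:
        for result in results:
            valid_accounts.write(result + "\n")


if __name__ == "__main__":
    Main()
```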
The only thing to note is that you need to strip out the `None` values in `results`, because `check(line)` returns `None` for every URL whose response is not a 404.
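A short sketch of why those `None` values appear: `map` returns exactly one result per input, in input order, and a function that falls off its end returns `None`. It assumes `multiprocessing.pool.ThreadPool` (same `map` semantics as `Pool`) and a hypothetical `check` that treats names starting with "free" as 404s instead of making a request.

```python
from multiprocessing.pool import ThreadPool


def check(name):
    # Hypothetical: treat names starting with "free" as 404s (not taken).
    if name.startswith("free"):
        return name
    # Falling off the end of the function returns None.


with ThreadPool(4) as p:
    results = p.map(check, ["free_a", "taken", "free_b"])

print(results)  # ['free_a', None, 'free_b']  (one result per input, in order)
valid = [r for r in results if r is not None]
print(valid)    # ['free_a', 'free_b']
```

Using `r is not None` rather than a plain truthiness test keeps any legitimate falsy result, such as an empty string.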
Updates:

> After using John's solution, the program is working as intended
I doubt it does. Since you are on Windows, every worker process re-imports your script and therefore opens its own file handle pointing to `accounts.txt`; `check` ignores its argument and cycles through all the lines of that handle. So you end up checking every URL 20 times, and the multiprocessing doesn't help you.
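A single-process simulation of that effect, under stated assumptions: a hypothetical `accounts.txt` with 100 names, 20 workers, tasks handed out round-robin so every worker gets at least one (a real `Pool` uses a shared task queue instead), and `io.StringIO` standing in for each worker's own `open('accounts.txt')`. The function `simulated_check` is hypothetical and mirrors the question's `check(x)`: it ignores the line it receives, so the first task in each worker reads the whole file.

```python
import io

# Hypothetical accounts.txt with 100 names, and 20 worker processes.
ACCOUNTS = "".join(f"user{i}\n" for i in range(100))
N_WORKERS = 20

checked = []  # every URL check that would be made
handles = {}  # worker id -> that worker's own file handle


def simulated_check(worker_id, x):
    # x (the line the pool passes in) is ignored, like in the question's check().
    if worker_id not in handles:
        # Stand-in for the open('accounts.txt') each spawned worker runs on import.
        handles[worker_id] = io.StringIO(ACCOUNTS)
    # The first task in each worker reads its whole handle; later tasks find it exhausted.
    for line in handles[worker_id]:
        checked.append(line.strip())  # stand-in for requests.get(base_url + line)


# Assumption: tasks are handed out round-robin, so every worker gets at least one.
for task_no, line in enumerate(io.StringIO(ACCOUNTS)):
    simulated_check(task_no % N_WORKERS, line)

print(len(checked))            # 2000 (100 names x 20 workers)
print(checked.count("user0"))  # 20
```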
> I used imap because I read that imap doesn't return a list (?)
That is true (`imap` returns a lazy iterator), but it is not the relevant difference here. In this situation, what matters is that `map` waits until all tasks are done (so you don't need to call `join` before using the results).
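If you do want `imap`'s iterator, for example to write each result as soon as it is ready instead of building the whole list first, the key is to consume it before the pool is shut down. A minimal sketch, assuming `multiprocessing.pool.ThreadPool` (same `imap` semantics as `Pool`), a hypothetical `check` that treats names starting with "free" as 404s, and an `io.StringIO` standing in for `valid accounts.txt`:

```python
import io
from multiprocessing.pool import ThreadPool


def check(name):
    # Hypothetical stand-in for the 404 test: names starting with "free" are 404s.
    return name if name.startswith("free") else None


names = ["free_a", "taken", "free_b", "also_taken"]
out = io.StringIO()  # stand-in for open('valid accounts.txt', 'a')

with ThreadPool(4) as p:
    # Iterating inside the with block: each result is written as soon as it
    # (and every result before it, since imap keeps input order) is ready.
    for result in p.imap(check, names):
        if result is not None:
            out.write(result + "\n")

print(out.getvalue())
```

Leaving the `with` block before the loop finishes would terminate the pool, which is the same trap as returning from `Main()` right after calling `imap`.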
For a more thorough discussion of `map` vs. `imap`, see here.
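Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.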