
Writing a web crawler using Python Twisted

I'm using Twisted to write a web crawler driven with Selenium. The idea is that I spawn Twisted threads for a Twisted client and a Twisted server that will proxy HTTP requests to the server. Something that looks like this:

    +--------+       +--------+
    |        +------>+        |
 -->| Client |       | Server |---> WWW
    |        +<------+        |
    +--------+       +--------+

All this is running in the same process, though. The question is whether Twisted allows this kind of application, or whether it is only meant to run the client and server as separate processes (as this is the typical case I've seen everywhere).

You can't use Twisted in a thread. You can, however, make a single Twisted thread which can happily make multiple clients and servers.

You may need to describe your problem in a bit more detail for a better answer than that.

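Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.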
