
What are the Tornado and Mongodb blocking and asynchronous considerations?

I am running the Tornado web server in conjunction with Mongodb (using the pymongo driver). I am trying to make architectural decisions to maximize performance.

I have several subquestions regarding the blocking/non-blocking and asynchronous aspects of the resulting application when using Tornado and pymongo together:

Question 1: Connection Pools

It appears that the pymongo.mongo_client.MongoClient object automatically implements a pool of connections. Is the intended purpose of a "connection pool" so that I can access mongodb simultaneously from different threads? Is it true that if run with a single MongoClient instance from a single thread that there is really no "pool" since there would only be one connection open at any time?

Question 2: Multi-threaded Mongo Calls

The following FAQ:

http://api.mongodb.org/python/current/faq.html#does-pymongo-support-asynchronous-frameworks-like-gevent-tornado-or-twisted

states:

Currently there is no great way to use PyMongo in conjunction with Tornado or Twisted. PyMongo provides built-in connection pooling, so some of the benefits of those frameworks can be achieved just by writing multi-threaded code that shares a MongoClient.

So I assume that I just pass a single MongoClient reference to each thread? Or is there more to it than that? What is the best way to trigger a callback when each thread produces a result? Should I have one thread running whose job is to watch a queue (Python's Queue.Queue) and handle each result by calling finish() on the RequestHandler object that Tornado has left open? (Of course, the tornado.web.asynchronous decorator would be needed.)

Question 3: Multiple Instances

Finally, is it possible that I am just creating work? Should I just shortcut things by running a single threaded instance of Tornado and then start 3-4 instances per core? (The above FAQ reference seems to suggest this)

After all doesn't the GIL in python result in effectively different processes anyway? Or are there additional performance considerations (plus or minus) by the "non-blocking" aspects of Tornado? (I know that this is non-blocking in terms of I/O as pointed out here: Is Tornado really non-blocking? )

(Additional Note: I am aware of asyncmongo at: https://github.com/bitly/asyncmongo but want to use pymongo directly and not introduce this additional dependency.)

As I understand it, there are two web server models:

  1. Thread Based (apache)
  2. Event Driven (tornado)

Python has the GIL, which limits what threads can gain, whereas the event-driven model uses only one thread, so go with event-driven.

PyMongo will block Tornado, so here are some suggestions:

  1. Using PyMongo: use it, and make your database calls faster by creating indexes. Be aware, though, that an index helps much less when a query has to scan a large range of values, for example a broad $gte condition.
  2. Using AsyncMongo: it seems to have been updated, but it still does not cover all MongoDB features.
  3. Using Mongotor: this is something like an update of AsyncMongo. It has an ODM (Object Document Mapper) and most of what you need from MongoDB (aggregation, replica sets, ...); the only feature you will really miss is GridFS.
  4. Using Motor: this is the complete solution for use with Tornado. It has GridFS support and is the official MongoDB asynchronous driver for Tornado. It relies on a hack using Greenlet, so the only downside is that it cannot be used with PyPy.
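
To illustrate why a blocking driver call is a problem in an event-driven server, here is a minimal sketch using only the standard library. It assumes asyncio as a stand-in for Tornado's event loop, and `blocking_query` is a hypothetical placeholder for a synchronous call such as a PyMongo query; neither name comes from the answer above. A background `ticker` coroutine should run every 10 ms, and the sketch measures the longest pause it suffers while the handler is running.

```python
import asyncio
import time


def blocking_query():
    # Hypothetical stand-in for a synchronous database call.
    time.sleep(0.3)
    return {"_id": 1}


async def ticker(ticks, stop):
    # Records a timestamp every 10 ms whenever the loop is free to run it.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def handler_blocking():
    # Runs on the event loop thread, so nothing else is scheduled meanwhile.
    return blocking_query()


async def measure_max_gap(handler):
    ticks = []
    stop = asyncio.Event()
    task = asyncio.create_task(ticker(ticks, stop))
    await asyncio.sleep(0.05)  # let the ticker run a few times first
    await handler()
    await asyncio.sleep(0.05)
    stop.set()
    await task
    # Longest interval between two consecutive ticks.
    return max(b - a for a, b in zip(ticks, ticks[1:]))


max_gap = asyncio.run(measure_max_gap(handler_blocking))
print(f"longest pause seen by other coroutines: {max_gap:.2f}s")
```

The pause is roughly as long as the blocking call itself, which is what every other request on the same single-threaded server would experience while one query is in flight.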

If you decide on something other than Tornado, such as Gevent, then you can use PyMongo, because the documentation says:

The only async framework that PyMongo fully supports is Gevent.

NB: sorry if this is off topic, but the sentence:

Currently there is no great way to use PyMongo in conjunction with Tornado

should be dropped from the documentation; Mongotor and Motor work very well (Motor in particular).

Are you also aware of Motor? http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/ It is written by Jesse Davis, who is a coauthor of PyMongo.

While the question is old, I felt the answers given don't completely address all the queries asked by the user.

Is it true that if run with a single MongoClient instance from a single thread that there is really no "pool" since there would only be one connection open at any time?

This is correct if your script does not use threading. However, if your script is multi-threaded, there can be multiple connections open at a given time.
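
A toy model may make the difference concrete. The sketch below is not PyMongo's pool implementation; `ToyPool`, `sequential_calls`, and `concurrent_calls` are hypothetical names, and the "connections" are just strings. It only assumes the general behaviour described above: a connection is reused when one is idle and a new one is opened when all are in use.

```python
import threading


class ToyPool:
    """Hypothetical pool: reuse an idle connection or create a new one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._idle = []
        self.created = 0

    def checkout(self):
        with self._lock:
            if self._idle:
                return self._idle.pop()
            self.created += 1
            return f"conn-{self.created}"

    def checkin(self, conn):
        with self._lock:
            self._idle.append(conn)


def sequential_calls(pool, n):
    # One thread, one operation at a time: the same connection is reused.
    for _ in range(n):
        conn = pool.checkout()
        pool.checkin(conn)


def concurrent_calls(pool, n):
    barrier = threading.Barrier(n)

    def worker():
        conn = pool.checkout()
        barrier.wait()  # hold the connection until every thread has one
        pool.checkin(conn)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


single = ToyPool()
sequential_calls(single, 5)

multi = ToyPool()
concurrent_calls(multi, 4)

print(single.created, multi.created)  # prints "1 4"
```

Five sequential operations from one thread need a single connection, while four threads that overlap in time force the pool to open four.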

Finally, is it possible that I am just creating work? Should I just shortcut things by running a single threaded instance of Tornado and then start 3-4 instances per core?

No, you are not! Creating multiple threads is less resource-intensive than creating multiple forks.

After all doesn't the GIL in python result in effectively different processes anyway?

The GIL only prevents multiple threads from executing Python bytecode in the interpreter at the same time. It does not prevent multiple threads from carrying out I/O simultaneously. In fact, that is exactly how Motor with asyncio achieves asynchronicity.

It uses a thread pool executor to run each query in a worker thread and returns the result when that thread completes.
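
The thread pool approach can be sketched with the standard library alone. This is not Motor's actual code; `blocking_query`, `fetch_all`, and `run_demo` are hypothetical names, and `time.sleep` stands in for a blocking driver call because, like socket I/O, it releases the GIL while waiting.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(doc_id):
    # Hypothetical stand-in for a synchronous driver call.
    # time.sleep releases the GIL, as waiting on a socket would.
    time.sleep(0.2)
    return {"_id": doc_id}


async def fetch_all(doc_ids, executor):
    loop = asyncio.get_running_loop()
    # Each call runs in a worker thread; the event loop stays free.
    futures = [
        loop.run_in_executor(executor, blocking_query, doc_id)
        for doc_id in doc_ids
    ]
    return await asyncio.gather(*futures)


def run_demo():
    with ThreadPoolExecutor(max_workers=4) as executor:
        start = time.monotonic()
        results = asyncio.run(fetch_all(range(4), executor))
        elapsed = time.monotonic() - start
    return results, elapsed


results, elapsed = run_demo()
print(results, f"{elapsed:.2f}s")
```

Four calls of 0.2 s each finish in roughly the time of one rather than 0.8 s in total, because the threads wait on I/O in parallel even though only one of them can execute Python code at any instant.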


 