
Python twisted web server caching and executing outdated code

Background: Working on a web application that allows users to upload python scripts to a server (Twisted web server). The UI provides full CRUD functionality on these python scripts. After uploading a script the user can then select the script and run it on the server and get results back on the UI. Everything works fine...

Problem: ...except when the user edits the Python code inline (via the UI) or updates a script by uploading a new one that overwrites an existing one. It seems that Twisted caches the code (both old and new) and sometimes runs the new code and sometimes the old.

Example: I upload a script hello.py to the server. It has a function called run() which does: print 'hello world'. Someone else comes along and uploads another script named hello.py whose run() does: print 'goodbye world'. Then I go back and execute the run() function on the script 10 times. Half of the time it prints 'hello world' and half of the time it prints 'goodbye world'.

Tried so far: Several different ways to reload the script into memory before executing it, including:

  • Python's builtin reload():

     module = __import__('hello')
     reload(module)
     module.run()

  • imp module reload():

     import imp
     module = __import__('hello')
     imp.reload(module)
     module.run()

  • twisted.python.rebuild()

     from twisted.python.rebuild import rebuild
     module = __import__('hello')
     rebuild(module)
     module.run()

  • figured that perhaps if we force python to not write bytecode, that would solve the issue: sys.dont_write_bytecode = True

  • restart twisted server

  • a number of other things which I can't remember

So far, the only way to make sure that the most up-to-date Python code executes is to restart the Twisted server manually. I have been researching this for quite some time and have not found a better way that works 100% of the time. This leads me to believe that bouncing Twisted is the only way.

Question: Is there a better way to accomplish this (i.e., always execute the most recent code) without having to bounce Twisted? Perhaps by preventing Twisted from caching scripts in memory, or by clearing Twisted's cache before importing/reloading modules.

I'm fairly new to the Twisted web server, so it's possible that I have overlooked an obvious way to resolve this issue, or that I'm approaching this in completely the wrong way. Some insight into solving this issue would be greatly appreciated.

Thanks

T

Twisted doesn't cache Python code in memory. Python's module system works by evaluating source files once and then placing a module object into sys.modules. Future imports of the module do not re-evaluate the source files - they just pull the module object from sys.modules.
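A minimal sketch of that behaviour, written for Python 3 and involving no Twisted at all. The module name hello_cache_demo and the temporary directory are made up for the demonstration, and run() returns a string rather than printing so the result is easy to inspect. sys.dont_write_bytecode is set only to keep .pyc files out of the picture.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name, used only for this demonstration.
MODULE_NAME = "hello_cache_demo"

workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
sys.dont_write_bytecode = True  # avoid .pyc files so only the source matters
path = os.path.join(workdir, MODULE_NAME + ".py")

with open(path, "w") as f:
    f.write("def run():\n    return 'hello world'\n")

importlib.invalidate_caches()  # make sure the new file is visible to the finder
first = importlib.import_module(MODULE_NAME)

# Overwrite the source on disk, as a second upload of hello.py would.
with open(path, "w") as f:
    f.write("def run():\n    return 'goodbye world'\n")

# A second import does not read the file again; it returns the object
# already stored in sys.modules.
second = importlib.import_module(MODULE_NAME)

same_object = second is first
cached_result = second.run()
```

Here same_object is True and cached_result is still 'hello world', even though the file on disk now says otherwise.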

What Twisted does do is keep references to the objects it is using. That is simply how Python programs work: without a reference to an object, you can't use it. The Twisted Web server can't call the run function unless it holds a reference to the module that defines that function.

The trouble with reload is that it re-evaluates the source file defining the module but it can't track down and replace all of the references to the old version of the objects that module defined - for example, your run function. The imp.reload function is essentially the same.

twisted.python.rebuild tries to address this problem but using it correctly takes some care (and more likely than not there are edge cases that it still doesn't handle properly).

Whether any of these code reloading tools will work in your application or not is extremely sensitive to the minute, seemingly irrelevant details of how your application is written.

For example,

import somemodule
reload(somemodule)
somemodule.foo()

can be expected to run the newest version of somemodule.foo. But...

from somemodule import foo
import somemodule
reload(somemodule)
foo()

can be expected not to run the newest version of somemodule.foo, because the name foo is still bound to the function object that existed before the reload. There are even more subtle rules for using twisted.python.rebuild successfully.
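The difference between the two cases can be checked with a small self-contained sketch. It is written for Python 3, where importlib.reload takes the place of the builtin reload. The module name somemodule_reload_demo, the write_source helper and the returned strings are all invented for the illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name, used only for this demonstration.
MODULE_NAME = "somemodule_reload_demo"

workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
sys.dont_write_bytecode = True  # avoid .pyc files so only the source matters
path = os.path.join(workdir, MODULE_NAME + ".py")


def write_source(message):
    """Write a module whose foo() returns the given message."""
    with open(path, "w") as f:
        f.write("def foo():\n    return %r\n" % message)


write_source("old version")
importlib.invalidate_caches()
somemodule = importlib.import_module(MODULE_NAME)

# Same effect as "from somemodule import foo": a second name now points
# at the original function object.
foo = somemodule.foo

write_source("new version, edited")
importlib.reload(somemodule)

via_module = somemodule.foo()  # attribute lookup finds the new function
via_old_name = foo()           # the old function object is still alive
```

After the reload, via_module is 'new version, edited' while via_old_name is still 'old version'. Any object elsewhere in the application that stored the old function behaves like the name foo here.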

Since your question doesn't include any of the actual code from your application, there's no way to know which of these cases you've run into (resulting in the inability to reliably update your objects to reflect the latest version of their source code).

There aren't any great solutions here. The solution that works the most reliably is to restart the process. This certainly clears out any old code/objects and lets things run with the newest version (though not 100% of the time - for example, timestamp problems on .py and .pyc files can result in an older .pyc file being used instead of a new .py file - but this is pretty rare).

Another approach is to use execfile (or exec) instead of import. This bypasses the entire module system (and therefore its layer of "caching"). It puts the entire burden of managing the lifetime of the objects defined by the source you're loading onto you. It's more work but it also means there are few surprises coming from other levels of the runtime.
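As a rough sketch of what that could look like: execfile exists only in Python 2, so this version uses exec(compile(...)), which works on Python 3. The helper names load_script and run_script, the "user_script" module name and the assumption that each uploaded script defines a run() function returning a value are illustrative, not taken from the application in the question.

```python
import os
import tempfile


def load_script(path):
    """Execute the file at `path` in a fresh namespace and return it.

    Nothing is stored in sys.modules, so every call reads the file again.
    """
    with open(path, "r") as f:
        source = f.read()
    namespace = {"__name__": "user_script", "__file__": path}
    exec(compile(source, path, "exec"), namespace)
    return namespace


def run_script(path):
    """Load the script from disk and call its run() function."""
    return load_script(path)["run"]()


# Demonstration with a throwaway file standing in for an uploaded script.
script_path = os.path.join(tempfile.mkdtemp(), "hello.py")

with open(script_path, "w") as f:
    f.write("def run():\n    return 'hello world'\n")
first_result = run_script(script_path)

with open(script_path, "w") as f:
    f.write("def run():\n    return 'goodbye world'\n")
second_result = run_script(script_path)
```

Each call builds a new namespace from whatever is on disk at that moment, so first_result is 'hello world' and second_result is 'goodbye world'. The cost is that anything still holding an earlier namespace keeps the earlier objects, which is the lifetime management mentioned above.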

And of course it is possible to do this with reload or twisted.python.rebuild if you're willing to go through all of your code for interacting with user modules and carefully audit it for left-over references to old objects. Oh, and any library code you're using that might have been able to get a reference to those objects, too.
