I'm building a Flask web application that has to fetch an entire page from an external website, and I was wondering whether I could cache it to speed things up. The website does not change very often, but I still want the cache to refresh about once a day.
Anyway, I looked around and found Flask-Cache, which seemed to do what I wanted, so I added the following:
from flask.ext.cache import Cache
[...]
cache = Cache()
[...]
cache.init_app(app)
[...]
@cache.cached(timeout=86400, key_prefix='content')
def get_content():
    return lxml.html.fromstring(urllib2.urlopen('http://WEBSITE.com').read())
and then I call it from the functions that need the content, like so:
content = get_content()
Now I'd expect it to reuse the cached lxml.html object every time a call is made, but that's not what I'm seeing: the id of the object changes on every call and there's no speed-up at all. So have I misunderstood what Flask-Cache does, or am I doing something wrong here? I've tried using the memoize decorator instead, and I've tried decreasing the timeout or removing it altogether, but nothing seems to make any difference.
Thanks.
The default CACHE_TYPE is null, which gives you a NullCache: no caching at all, which is exactly what you observe. The documentation does not make this explicit, but this line in the source of Cache.init_app does:
self.config.setdefault('CACHE_TYPE', 'null')
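To see why a null backend produces exactly the symptom in the question (a new object on every call and no speed-up), here is a stdlib-only sketch. NullBackend, DictBackend and cached are hypothetical stand-ins made up for illustration, not Flask-Cache's real classes: the null backend never stores anything, so the decorated function runs on every call, while a storing backend runs it once.

```python
import functools


class NullBackend:
    """Hypothetical stand-in for a null cache: it stores nothing."""

    def get(self, key):
        return None  # every lookup is a miss

    def set(self, key, value):
        pass  # the value is silently discarded


class DictBackend:
    """Hypothetical stand-in for a simple in-memory cache."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached(backend, key):
    """Decorator: return the stored value for `key`, computing it on a miss."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            value = backend.get(key)
            if value is None:  # None doubles as "not found" in this sketch
                value = func()
                backend.set(key, value)
            return value
        return wrapper
    return decorator


calls = {"null": 0, "dict": 0}


@cached(NullBackend(), "content")
def get_content_null():
    calls["null"] += 1
    return object()  # a fresh object every time the body actually runs


@cached(DictBackend(), "content")
def get_content_dict():
    calls["dict"] += 1
    return object()


a, b = get_content_null(), get_content_null()
c, d = get_content_dict(), get_content_dict()
print(calls)           # {'null': 2, 'dict': 1}
print(a is b, c is d)  # False True
```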
To actually get some caching, initialise your Cache instance with a real cache backend:
cache = Cache(config={'CACHE_TYPE': 'simple'})
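Once a real backend is configured, the timeout=86400 from the question is what gives the "refresh about once a day" behaviour: an entry older than the timeout counts as a miss, so the next call downloads the page again. Below is a rough stdlib sketch of that expiry logic with a fake clock, so it can be checked without waiting a day. TTLCache and the fake clock are hypothetical names for illustration, not part of Flask-Cache.

```python
class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `timeout` seconds."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._data = {}      # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # stale: drop it and report a miss
            return None
        return value

    def set(self, key, value, timeout):
        self._data[key] = (self._clock() + timeout, value)


now = [0.0]  # mutable fake clock, advanced by hand below
cache = TTLCache(clock=lambda: now[0])
downloads = []


def get_content():
    content = cache.get("content")
    if content is None:
        downloads.append(now[0])  # stands in for the urlopen(...).read() call
        content = "<html>version %d</html>" % len(downloads)
        cache.set("content", content, timeout=86400)
    return content


first = get_content()   # miss: downloads
now[0] = 3600           # one hour later: still cached
second = get_content()
now[0] = 86400 + 1      # just over a day later: expired, downloads again
third = get_content()
```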
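Aside: SimpleCache is fine for development, testing, and this example, but you shouldn't use it in production: it is a plain in-memory store inside a single process, so each worker process gets its own copy and everything is lost on restart. Something like MemcachedCache or RedisCache would be much better.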
Now, with an actual cache in place, you will run into the next problem. On the second call, the cached lxml.html object is retrieved from the cache, but it is broken, because these objects cannot be cached safely. The traceback looks like this:
Traceback (most recent call last):
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1701, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1689, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1687, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1360, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1358, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/day/.virtualenvs/so-flask/lib/python2.7/site-packages/flask/app.py", line 1344, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/day/q12030403.py", line 20, in index
    return "get_content returned: {0!r}\n".format(get_content())
  File "lxml.etree.pyx", line 1034, in lxml.etree._Element.__repr__ (src/lxml/lxml.etree.c:41389)
  File "lxml.etree.pyx", line 881, in lxml.etree._Element.tag.__get__ (src/lxml/lxml.etree.c:39979)
  File "apihelpers.pxi", line 15, in lxml.etree._assertValidNode (src/lxml/lxml.etree.c:12306)
AssertionError: invalid Element proxy at 3056741852
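As far as I can tell, the SimpleCache behind CACHE_TYPE 'simple' pickles each value when it is stored and unpickles it on retrieval (memcached and Redis backends necessarily serialise too), so what comes back is a reconstructed copy, not the original object. A plain string survives that round trip intact; an object that wraps C-level state does not. lxml can't be assumed to be installed here, so this stdlib sketch uses a threading.Lock as a stand-in for such an object (pickle rejects the lock outright rather than producing a broken proxy the way lxml did). PicklingCache is a hypothetical class for illustration.

```python
import pickle
import threading


class PicklingCache:
    """Hypothetical cache that stores values as pickled bytes."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Serialise on the way in; raises TypeError if the value can't be pickled.
        self._data[key] = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)

    def get(self, key):
        raw = self._data.get(key)
        # Deserialise on the way out, building a new object on every get.
        return None if raw is None else pickle.loads(raw)


store = PicklingCache()

html = "<html><body>wishlist</body></html>"
store.set("content", html)
restored = store.get("content")
print(restored == html)  # True: the string survives the round trip

try:
    store.set("lock", threading.Lock())
    lock_cached = True
except TypeError:
    lock_cached = False  # pickle refuses to serialise a lock object
print(lock_cached)  # False
```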
So instead of caching the lxml.html object, cache the plain string (the content of the website that you downloaded) and reparse it to get a fresh lxml.html object every time. The cache still helps, because you don't hit the other website on every call. Here is a complete, working program that demonstrates this solution:
from flask import Flask
from flask.ext.cache import Cache
import time
import lxml.html
import urllib2

app = Flask(__name__)
cache = Cache(config={'CACHE_TYPE': 'simple'})
cache.init_app(app)


@cache.cached(timeout=86400, key_prefix='content')
def get_content():
    app.logger.debug("get_content called")
    # Caching the parsed tree breaks on the next call, so don't do this:
    # return lxml.html.fromstring(urllib2.urlopen('http://daybarr.com/wishlist').read())
    # Cache the downloaded HTML string instead; parse it where it is used.
    return urllib2.urlopen('http://daybarr.com/wishlist').read()


@app.route("/")
def index():
    app.logger.debug("index called")
    return "get_content returned: {0!r}\n".format(get_content())


if __name__ == "__main__":
    app.run(debug=True)
When I run the program and make two requests to http://127.0.0.1:5000/, I get this output. Note that get_content is not called the second time, because the content is served from the cache.
* Running on http://127.0.0.1:5000/
* Restarting with reloader
--------------------------------------------------------------------------------
DEBUG in q12030403 [q12030403.py:20]:
index called
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
DEBUG in q12030403 [q12030403.py:14]:
get_content called
--------------------------------------------------------------------------------
127.0.0.1 - - [21/Dec/2012 00:03:28] "GET / HTTP/1.1" 200 -
--------------------------------------------------------------------------------
DEBUG in q12030403 [q12030403.py:20]:
index called
--------------------------------------------------------------------------------
127.0.0.1 - - [21/Dec/2012 00:03:33] "GET / HTTP/1.1" 200 -
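The same idea as the program above (cache the raw text, parse a fresh tree on every call) can be sketched without Flask or network access. The substitutions are assumptions for illustration only: fetch_page is a hypothetical fake downloader standing in for urllib2.urlopen(...).read(), functools.lru_cache stands in for @cache.cached (it has no timeout), and xml.etree.ElementTree.fromstring stands in for lxml.html.fromstring, so the sample markup has to be well-formed XML.

```python
import functools
import xml.etree.ElementTree as ET

fetch_count = 0


def fetch_page():
    """Hypothetical stand-in for urllib2.urlopen(URL).read()."""
    global fetch_count
    fetch_count += 1
    return "<html><body><ul><li>book</li><li>lamp</li></ul></body></html>"


@functools.lru_cache(maxsize=1)
def get_raw_content():
    # Only the plain string is cached; strings are safe to store.
    return fetch_page()


def get_tree():
    # Parse a brand-new tree from the cached string on every call.
    return ET.fromstring(get_raw_content())


tree1 = get_tree()
tree2 = get_tree()
items = [li.text for li in tree1.iter("li")]
print(fetch_count, tree1 is tree2, items)  # 1 False ['book', 'lamp']
```

The parse still runs on every call; what the cache saves is the round trip to the remote site.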