
GAE Python LXML - Exceeded soft private memory limit

I am fetching a gzipped XML file, parsing it with lxml, and trying to write Product entries to a database model. Previously I had local memory issues, which were resolved with help on SO in an earlier question. Now I have everything working and deployed, but on the server I get the following error:

Exceeded soft private memory limit with 158.164 MB after servicing 0 requests total

I have tried everything I know to reduce memory usage and am currently using the code below. The gzipped file is about 7 MB; unzipped it is about 80 MB. Locally the code works fine. I tried running it both as an HTTP request and as a cron job, but it made no difference. Is there any way to make it more efficient?

Some similar questions on SO refer to frontend and backend instances, which I am not familiar with. I am running the free version of GAE, and this task would have to run once a week. Any suggestions on the best way to move forward would be very much appreciated.

from google.appengine.api.urlfetch import fetch
import gzip, base64, StringIO, datetime, webapp2
from lxml import etree
from google.appengine.ext import db

class GetProductCatalog(webapp2.RequestHandler):
  def get(self):
    user = XXX
    password = YYY
    url = 'URL'

    # fetch the gzipped file
    catalogResponse = fetch(url, headers={
        "Authorization": "Basic %s" % base64.b64encode(user + ':' + password)
    }, deadline=10000000)

    # the response content is in catalogResponse.content
    # decompress the file
    f = StringIO.StringIO(catalogResponse.content)
    c = gzip.GzipFile(fileobj=f)
    content = c.read()

    # create something readable by lxml
    xml = StringIO.StringIO(content)

    # delete unnecessary variables
    del f
    del c
    del content

    # parse the file
    tree = etree.iterparse(xml, tag='product')

    for event, element in tree:
        if element.findtext('manufacturer') == 'New York':
            sku = element.findtext('sku')
            last_updated = datetime.datetime.strptime(element.findtext('lastupdated'), "%d/%m/%Y")

            # look the product up once; create it if it does not exist yet
            coupon = Product.get_by_key_name(sku)
            is_new = coupon is None
            if is_new:
                coupon = Product(key_name=sku)

            # write only new products or ones whose update date has changed
            if is_new or coupon.last_update_prov != last_updated:
                coupon.restaurant_name = element.findtext('name')
                coupon.restaurant_id = ''
                coupon.address_street = element.findtext('keywords').split(',')[0]
                coupon.address_city = element.findtext('manufacturer')
                coupon.address_state = element.findtext('publisher')
                coupon.address_zip = element.findtext('manufacturerid')
                coupon.value = '$' + element.findtext('price') + ' for $' + element.findtext('retailprice')
                coupon.restrictions = element.findtext('warranty')
                coupon.url = element.findtext('buyurl')
                coupon.active = element.findtext('instock') == 'YES'
                coupon.last_update_prov = last_updated
                coupon.put()

        # empty the element that was just processed
        element.clear()

UPDATE

Following Paul's suggestion, I implemented a backend. After some trouble it worked like a charm; the configuration I used is below.

My backends.yaml looks as follows:

backends:
- name: mybackend
  instances: 10
  start: mybackend.app
  options: dynamic

And my app.yaml as follows:

handlers:
- url: /update/mybackend
  script: mybackend.app
  login: admin

Backends are like frontend instances, but they don't scale, and you have to start and stop them yourself as you need them (or set them to be dynamic, which is probably your best bet here).

You can have up to 1024MB of memory in the backend so it will probably work fine for your task.

https://developers.google.com/appengine/docs/python/backends/overview

App Engine Backends are instances of your application that are exempt from request deadlines and have access to more memory (up to 1GB) and CPU (up to 4.8GHz) than normal instances. They are designed for applications that need faster performance, large amounts of addressable memory, and continuous or long-running background processes. Backends come in several sizes and configurations, and are billed for uptime rather than CPU usage.

A backend may be configured as either resident or dynamic. Resident backends run continuously, allowing you to rely on the state of their memory over time and perform complex initialization. Dynamic backends come into existence when they receive a request, and are turned down when idle; they are ideal for work that is intermittent or driven by user activity. For more information about the differences between resident and dynamic backends, see Types of Backends and also the discussion of Startup and Shutdown.

It sounds like just what you need. The free usage level will also be OK for your task.

Regarding the backend: judging by the example you provided, it seems your request is simply being handled by a frontend instance.

To have the backend handle it, try calling the task at a URL like this instead: http://mybackend.my_app_app_id.appspot.com/update/mybackend
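
Since the import only has to run once a week, another way to send it to the backend might be a cron entry with a target. Below is a minimal sketch of a cron.yaml, assuming the backend name mybackend and the /update/mybackend handler from the question; the description and schedule are placeholders, and the use of target for backends is based on my reading of the legacy backends documentation, so please verify it against your SDK version.

```yaml
cron:
- description: weekly product catalog import   # placeholder text
  url: /update/mybackend                       # handler from app.yaml above
  schedule: every monday 09:00                 # placeholder schedule
  target: mybackend                            # assumed: routes the job to the backend
```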

Also, I think you can remove the start: mybackend.app line from your backends.yaml.
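
Separately from moving to a backend, memory use in the handler itself could probably be reduced. The code in the question holds the unzipped document in memory twice (once as content, once inside the StringIO copy), which is roughly in line with the reported 158 MB. A minimal sketch of an alternative follows: hand the GzipFile object straight to the incremental parser so the data is decompressed in small chunks. It uses the standard library's ElementTree so that it runs without lxml; lxml's etree.iterparse also accepts a file-like object and can filter with tag='product' as in the question. The function name iter_products is hypothetical, and the sketch assumes the product elements are direct children of the root element.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_products(gzipped_bytes, tag='product'):
    """Yield one parsed element at a time from gzipped XML bytes."""
    # GzipFile is file-like, so the parser pulls decompressed data in chunks
    # instead of receiving one large unzipped string.
    stream = gzip.GzipFile(fileobj=io.BytesIO(gzipped_bytes))
    context = ET.iterparse(stream, events=('start', 'end'))

    # The first event is the start of the root element; keep a reference
    # to it so finished children can be detached later.
    _, root = next(context)

    for event, element in context:
        if event == 'end' and element.tag == tag:
            yield element
            # Empty the processed element, then drop it from the root so the
            # partial tree does not grow (assumes it is a child of the root).
            element.clear()
            root.clear()
```

In the handler this would replace the StringIO and del lines, for example: for element in iter_products(catalogResponse.content): followed by the same findtext calls as before. Each element must be fully used before the loop advances, because it is cleared afterwards.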
