
Python pickle instance has no attribute

I'm trying to crawl a website. I store the persons I have already crawled in person_set, and the queue of persons to crawl next in parse_queue. At the start of each person's crawl, I need to write these two data structures to a file, so that if crawling is interrupted by an exception or a bad connection I can continue later.

I have three Python files: a main file, a spider, and a person model. main instantiates the spider; the spider starts parsing and calls the write and read methods when necessary. The person file contains the class Person, which is the model for storing person data.

I'm having problems reading back the data I wrote. I checked many questions about this error and it seems to be an import problem. But even though I imported the Person class into both main and spiders, it still gives me the error. It seems like the emergency_read method is not affected by my top-level import.

main.py

from spiders import Spider
from person import Person
import pickle

def main():
    # ... (setup of client elided in the original)
    spider = Spider("seed_input")
    spider.parse(client)

if __name__ == "__main__":
    main()

spiders.py

import os
import pickle
from sets import Set  # assumed source of Set (Python 2 only; deprecated in favour of built-in set)
from person import Person

class Spider:
    # file_to_seed, seed_to_parse_queue, delete_emergency_files and
    # self.parse_queue are defined in parts of the file not shown here.

    def __init__(self, filename):
        self.person_set = Set()

        self.file_to_seed(filename)
        for seed_url in self.seed_list:
            self.seed_to_parse_queue(seed_url)

    def parse(self, client):
        # Leftover checkpoint files mean the previous run was interrupted
        if os.path.exists('tmp.person_set'):
            print "Program wasnt ended properly, continuing from where it left"
            self.emergency_read()

        # ... starts parsing

    def emergency_write(self):
        # Remove stale checkpoint files before writing fresh ones
        if os.path.exists('tmp.person_set'):
            self.delete_emergency_files()

        with open('tmp.person_set', 'wb') as f:
            pickle.dump(self.person_set, f)

        with open('tmp.parse_queue', 'wb') as f:
            pickle.dump(self.parse_queue, f)

    def emergency_read(self):
        with open('tmp.person_set', 'rb') as f:
            self.person_set = pickle.load(f)

        with open('tmp.parse_queue', 'rb') as f:
            self.parse_queue = pickle.load(f)
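A side note on emergency_write/emergency_read: each pickle.dump call has its own memo, so a Person reachable from both person_set and parse_queue (for example through a friend_set) is written twice and comes back as two separate objects. A minimal Python 3 sketch, with hypothetical helper names save_checkpoint and load_checkpoint, that stores both structures in one dump and moves the finished file into place with os.replace (Python 3.3+). It does not by itself fix the AttributeError discussed below.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, person_set, parse_queue):
    # Hypothetical helper. One dump call shares one pickle memo, so an object
    # reachable from both structures is stored once and restored as one object.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump((person_set, parse_queue), f, pickle.HIGHEST_PROTOCOL)
        # Move the finished file over the old checkpoint; on POSIX the rename
        # is atomic, so a dump interrupted by an exception never replaces a
        # good checkpoint with a truncated one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    # Hypothetical helper: returns (person_set, parse_queue).
    with open(path, "rb") as f:
        person_set, parse_queue = pickle.load(f)
    return person_set, parse_queue
```

Inside Spider this would replace the two files with one, e.g. save_checkpoint('tmp.checkpoint', self.person_set, self.parse_queue).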

person.py

from sets import Set  # assumed source of Set (Python 2 only)

class Person:

    def __init__(self, name):
        self.name = name
        self.friend_set = Set()
        self.profile_url = ""
        self.id = 0
        self.color = "Grey"
        self.parent = None
        self.depth = 0

    def add_friend(self, friend):
        self.friend_set.add(friend)

    def __repr__(self):
        return "Person(%s, %s)" % (self.profile_url, self.name)

    def __eq__(self, other):
        if isinstance(other, Person):
            return ((self.profile_url == other.profile_url) and (self.name == other.name))
        else:
            return False

    def __ne__(self, other):
        return (not self.__eq__(other))

    def __hash__(self):
        return hash(self.__repr__())

Stacktrace

python main.py 
Program wasnt ended properly, continuing from where it left
Traceback (most recent call last):
  File "main.py", line 47, in <module>
    main()
  File "main.py", line 34, in main
    spider.parse(client)
  File "/home/ynscn/py-workspace/lll/spiders.py", line 39, in parse
    self.emergency_read()
  File "/home/ynscn/py-workspace/lll/spiders.py", line 262, in emergency_read
    self.person_set = pickle.load(f)
  File "/usr/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1198, in load_setitem
    dict[key] = value
  File "/home/ynscn/py-workspace/lll/person.py", line 30, in __hash__
    return hash(self.__repr__())
  File "/home/ynscn/py-workspace/lll/person.py", line 18, in __repr__
    return "Person(%s, %s)" % (self.profile_url, self.name)
AttributeError: Person instance has no attribute 'profile_url'

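Pickle restores a class instance in two steps: it first creates an empty instance, and only fills in its attributes once those attributes have been unpickled themselves. This error happens during the load, after a Person has been created but before its profile_url attribute has been restored. Notice that it fails during load_setitem, which means it is probably loading the friend_set attribute, which is a set (a sets.Set keeps its members as keys of an internal dict). Adding a Person to that set means hashing it, and when friendships form a cycle (A is in B's friend_set and B is in A's), pickle reaches A again while A is still empty.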

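Your custom __repr__() relies on an instance attribute (profile_url), and your custom __hash__(), which runs whenever pickle puts a Person back into a set, relies on __repr__(). So hashing a Person that has not been fully restored raises the AttributeError.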
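A minimal sketch that reproduces this, assuming Python 3, the built-in set in place of sets.Set, and a trimmed-down Person with made-up data. Only the mutual friendship fails; the one-way link round-trips fine, which points at the reference cycle rather than at imports. With the built-in set the error surfaces while the set is being refilled rather than in load_setitem, but the chain is the same: __hash__ calls __repr__, which reads the missing profile_url.

```python
import pickle


class Person(object):
    # Trimmed-down copy of the question's model, using the built-in set.
    def __init__(self, name):
        self.name = name
        self.friend_set = set()
        self.profile_url = ""

    def add_friend(self, friend):
        self.friend_set.add(friend)

    def __repr__(self):
        return "Person(%s, %s)" % (self.profile_url, self.name)

    def __eq__(self, other):
        if isinstance(other, Person):
            return self.profile_url == other.profile_url and self.name == other.name
        return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Reads profile_url via __repr__, which an empty instance lacks
        return hash(self.__repr__())


def round_trip(mutual):
    # Hypothetical data: alice lists bob as a friend; optionally bob lists alice.
    alice = Person("alice")
    alice.profile_url = "/alice"
    bob = Person("bob")
    bob.profile_url = "/bob"
    alice.add_friend(bob)
    if mutual:
        bob.add_friend(alice)  # closes the cycle alice -> bob -> alice
    data = pickle.dumps({alice, bob})
    try:
        pickle.loads(data)
    except AttributeError as exc:
        return "AttributeError: %s" % exc
    return "ok"
```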

My recommendation is to use Python's default __hash__ method. Would that work?
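A sketch of that suggestion, assuming Python 3 and the same made-up data. In Python 3, a class that defines __eq__ without __hash__ gets __hash__ set to None (unhashable), so both overrides are dropped here and object's identity-based hash is used. That hash reads no attributes, so a half-restored Person can be put into a set safely.

```python
import pickle


class Person(object):
    # Same fields as the trimmed model, but no __eq__/__hash__ overrides:
    # hashing falls back to object identity and never touches attributes.
    def __init__(self, name):
        self.name = name
        self.friend_set = set()
        self.profile_url = ""

    def add_friend(self, friend):
        self.friend_set.add(friend)

    def __repr__(self):
        return "Person(%s, %s)" % (self.profile_url, self.name)


def round_trip_mutual_friends():
    # Hypothetical data: two people who list each other as friends (a cycle).
    alice = Person("alice")
    alice.profile_url = "/alice"
    bob = Person("bob")
    bob.profile_url = "/bob"
    alice.add_friend(bob)
    bob.add_friend(alice)
    return pickle.loads(pickle.dumps({alice, bob}))


restored = round_trip_mutual_friends()
```

The trade-off is that person_set no longer merges two separate Person objects that describe the same profile, since equality becomes identity.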

Your code might serialize as is if you use dill instead of pickle. dill can pickle class objects, instances, methods, and attributes, and almost everything in Python. dill can also store dynamically modified state for classes and class instances. I agree that it seems to be a pickle load error, as @nofinator points out. However, dill might let you get around it.

Probably better still: if you want to control the order in which state is saved and restored, you could try adding __getstate__ and __setstate__ methods.
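A sketch of controlling reconstruction, assuming Python 3. It swaps in __reduce__, a related pickle hook, because __setstate__ on its own runs only after that instance's state (friends included) has been unpickled, so in a cycle a friend could still hash it half-built. Here unpickling first calls the constructor with name and profile_url, so the value-based __hash__ works from the start. The extra profile_url constructor argument is an assumption, not part of the original model.

```python
import pickle


class Person(object):
    # Assumption: the constructor also accepts profile_url, so both fields
    # read by __hash__ can be supplied when the object is created.
    def __init__(self, name, profile_url=""):
        self.name = name
        self.profile_url = profile_url
        self.friend_set = set()

    def add_friend(self, friend):
        self.friend_set.add(friend)

    def __repr__(self):
        return "Person(%s, %s)" % (self.profile_url, self.name)

    def __eq__(self, other):
        if isinstance(other, Person):
            return self.profile_url == other.profile_url and self.name == other.name
        return False

    def __hash__(self):
        return hash(self.__repr__())

    def __reduce__(self):
        # Unpickling calls Person(name, profile_url) first, so the fields that
        # __hash__ reads exist before any friend_set containing this Person is
        # rebuilt. The remaining attributes are then restored from the dict.
        return (Person, (self.name, self.profile_url), self.__dict__)
```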
