
Python pickle instance has no attribute

I'm trying to crawl a website. I store the crawled persons in person_set, and the queue of persons to crawl next in parse_queue. At the start of each person crawl, I need to write these two data structures to a file, so that I can continue later if crawling is interrupted by an exception or a bad connection.

I have three Python files: a main file, a spider, and a person model. Main instantiates the spider; the spider starts parsing and calls write and read when necessary. The person file has the class Person, which is the model for storing person data.

I'm having problems reading back the data I wrote. I checked many questions about this error, and it seems like an import problem. But even though I imported the Person class into both main and spiders, it still gives me an error. It seems like the emergency_read method is not affected by my top-level import.

main.py

from spiders import Spider
from person import Person
import pickle

def main():
    ....
    spider = Spider("seed_input")
    spider.parse(client)

spiders.py

import os
import pickle
from sets import Set  # assumed: not shown in the post; Python 2 `sets` module

from person import Person

class Spider:

    def __init__(self, filename):
        self.person_set = Set()

        self.file_to_seed(filename)
        for seed_url in self.seed_list:
            self.seed_to_parse_queue(seed_url)

    def parse(self, client):
        if os.path.exists('tmp.person_set'):
            print "Program wasnt ended properly, continuing from where it left"
            self.emergency_read()

        # ... starts parsing

    def emergency_write(self):

        if os.path.exists('tmp.person_set'):
            self.delete_emergency_files()

        with open('tmp.person_set', 'wb') as f:
            pickle.dump(self.person_set, f)

        with open('tmp.parse_queue', 'wb') as f:
            pickle.dump(self.parse_queue, f)

    def emergency_read(self):
        with open('tmp.person_set', 'rb') as f:
            self.person_set = pickle.load(f)

        with open('tmp.parse_queue', 'rb') as f:
            self.parse_queue = pickle.load(f)

person.py

from sets import Set  # assumed: not shown in the post; Python 2 `sets` module


class Person:

    def __init__(self, name):
        self.name = name
        self.friend_set = Set()
        self.profile_url = ""
        self.id = 0
        self.color = "Grey"
        self.parent = None
        self.depth = 0

    def add_friend(self, friend):
        self.friend_set.add(friend)

    def __repr__(self):
        return "Person(%s, %s)" % (self.profile_url, self.name)

    def __eq__(self, other):
        if isinstance(other, Person):
            return ((self.profile_url == other.profile_url) and (self.name == other.name))
        else:
            return False

    def __ne__(self, other):
        return (not self.__eq__(other))

    def __hash__(self):
        return hash(self.__repr__())

Stacktrace

python main.py
Program wasnt ended properly, continuing from where it left
Traceback (most recent call last):
  File "main.py", line 47, in <module>
    main()
  File "main.py", line 34, in main
    spider.parse(client)
  File "/home/ynscn/py-workspace/lll/spiders.py", line 39, in parse
    self.emergency_read()
  File "/home/ynscn/py-workspace/lll/spiders.py", line 262, in emergency_read
    self.person_set = pickle.load(f)
  File "/usr/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1198, in load_setitem
    dict[key] = value
  File "/home/ynscn/py-workspace/lll/person.py", line 30, in __hash__
    return hash(self.__repr__())
  File "/home/ynscn/py-workspace/lll/person.py", line 18, in __repr__
    return "Person(%s, %s)" % (self.profile_url, self.name)
AttributeError: Person instance has no attribute 'profile_url'

Pickle does not guarantee that an instance's attributes are restored before that instance is used elsewhere in the object graph. This error happens during the load, before pickle has deserialized the Person.profile_url attribute. Notice that it fails in load_setitem, which means it is probably rebuilding the contents of a set such as friend_set: inserting a Person there calls __hash__ on an instance whose attributes have not been restored yet, which can happen when Person objects refer to each other.

Your custom __repr__() relies on an instance attribute, and your custom __hash__() (which pickle needs when it rebuilds the set) relies on __repr__().
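The failure can be reproduced in a few lines. This is only a sketch under assumptions: it is written for Python 3 with the built-in set (the question uses Python 2.7), the Person class is cut down to the relevant attributes, and the names alice, bob, and load_error are made up for illustration. It assumes the crawled persons reference each other, which the question does not show explicitly.

```python
import pickle


class Person(object):
    """Cut-down stand-in for the Person class in the question."""

    def __init__(self, name, profile_url):
        self.name = name
        self.friend_set = set()
        self.profile_url = profile_url

    def __repr__(self):
        return "Person(%s, %s)" % (self.profile_url, self.name)

    def __eq__(self, other):
        return isinstance(other, Person) and repr(self) == repr(other)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Needs profile_url and name to exist on the instance.
        return hash(repr(self))


def load_error():
    """Pickle two mutually linked persons; return the error raised on load."""
    alice = Person("alice", "/u/alice")
    bob = Person("bob", "/u/bob")
    alice.friend_set.add(bob)
    bob.friend_set.add(alice)  # circular reference (hypothetical data)

    data = pickle.dumps(alice)  # dumping works: nothing is hashed here
    try:
        pickle.loads(data)
    except AttributeError as exc:
        return exc
    return None


print(repr(load_error()))
```

While alice is being loaded, pickle has to rebuild bob's friend_set, which already points back at the still-empty alice instance, so hashing it fails.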

My recommendation is to use Python's default __hash__ method. Would that work?
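A minimal sketch of that suggestion, assuming Python 3 and a cut-down Person (the names alice, bob, and roundtrip are hypothetical). Note that in Python 3 a class that defines __eq__ without __hash__ becomes unhashable, so this sketch drops both and falls back to identity-based hashing, which does not touch any instance attribute.

```python
import pickle


class Person(object):
    """Stand-in without custom __eq__/__hash__: hashing is identity-based."""

    def __init__(self, name, profile_url):
        self.name = name
        self.friend_set = set()
        self.profile_url = profile_url

    def __repr__(self):
        return "Person(%s, %s)" % (self.profile_url, self.name)


def roundtrip():
    """Pickle and unpickle two mutually linked persons."""
    alice = Person("alice", "/u/alice")
    bob = Person("bob", "/u/bob")
    alice.friend_set.add(bob)
    bob.friend_set.add(alice)  # circular reference (hypothetical data)
    return pickle.loads(pickle.dumps(alice))


restored = roundtrip()
friend = next(iter(restored.friend_set))
print(friend.name)
```

The trade-off is that two Person objects with the same profile_url now count as different set members, so duplicates would have to be detected some other way, for example with a dict keyed by profile_url.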

Your code might serialize as is if you use dill instead of pickle. dill can pickle class objects, instances, methods, and attributes, and most everything else in Python. dill can also store dynamically modified state for classes and class instances. I agree that it seems to be a pickle load error, as @nofinator points out. However, dill might let you get around it.

Probably even better: if you want to force an order for load and unload, you could try adding __getstate__ and __setstate__ methods.
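One caveat: __setstate__ only runs after the whole state of an instance has been unpickled, so on its own it may not help when persons reference each other. The sketch below swaps in a related hook, __reduce__, which tells pickle to recreate the instance by calling Person(name, profile_url) before the rest of the state is loaded. It assumes Python 3, a cut-down Person whose constructor accepts profile_url (the original only takes name), and hypothetical names alice, bob, and roundtrip.

```python
import pickle


class Person(object):
    """Stand-in that keeps the custom hash but restores its inputs first."""

    def __init__(self, name, profile_url=""):
        self.name = name
        self.profile_url = profile_url
        self.friend_set = set()

    def __repr__(self):
        return "Person(%s, %s)" % (self.profile_url, self.name)

    def __eq__(self, other):
        return isinstance(other, Person) and repr(self) == repr(other)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(repr(self))

    def __reduce__(self):
        # (callable, args, state): pickle calls Person(*args) first, so the
        # attributes used by __hash__ exist before friend_set is rebuilt.
        return (Person, (self.name, self.profile_url), self.__dict__.copy())


def roundtrip():
    """Pickle and unpickle two mutually linked persons."""
    alice = Person("alice", "/u/alice")
    bob = Person("bob", "/u/bob")
    alice.friend_set.add(bob)
    bob.friend_set.add(alice)  # circular reference (hypothetical data)
    return pickle.loads(pickle.dumps(alice))


restored = roundtrip()
friend = next(iter(restored.friend_set))
print(friend.name)
```

This keeps value-based equality, at the cost of requiring that name and profile_url are not changed after a Person has been added to a set.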
