
How to store data crawled in Scrapy in a specific order?

I have to crawl data from a web page and store it in a CSV file in a specific order, namely the order in which I declared the fields in my item class. The problem is that the data is not stored in that order: the fields are written to the CSV file in whatever order they happen to come out, but I want them stored in the order declared in my item class. I am a newbie in Python. Can you tell me how to do this?

For example, my item class is:

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()

Now, when the data is stored in the CSV file, desc comes first, then link, and then title:

{"desc": [], "link": ["/Computers/Programming/"], "title": ["Programming"]}

The order of the data in the CSV file is not the order you declared because the item behaves like a dict. In older Python versions a dict does not remember the order in which its keys were added, so the fields come out in an order you cannot rely on; in your example it happens to be alphabetical. The logic that exports items to the CSV file is implemented in

scrapy\contrib\exporter\__init__.py

You can override the _get_serialized_fields method of BaseItemExporter so that it yields key-value pairs in the order of your declaration. Here is some example code:

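# Body of an overridden _get_serialized_fields; `item` and `default_value`
# are that method's arguments.
field_iter = ['title', 'link', 'desc']  # columns in the order they were declared
for field_name in field_iter:
    if field_name in item:
        # Serialize the value using the field's own metadata.
        field = item.fields[field_name]
        value = self.serialize_field(field, field_name, item[field_name])
    else:
        # Field was not scraped for this item; fall back to the default.
        value = default_value
    yield field_name, value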

But do remember that this is not a universal solution: the field list is hard-coded for one particular item class.
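The ordering idea in the snippet above can be tried outside Scrapy. Below is a minimal sketch that uses only the standard library's csv.DictWriter; FIELD_ORDER, export_rows, and the sample items are hypothetical names made up for illustration, and this is not Scrapy's exporter code.

```python
import csv
import io

# Column order we want in the CSV, mirroring the order of the item declaration.
FIELD_ORDER = ["title", "link", "desc"]


def export_rows(items, field_order=FIELD_ORDER, default_value=""):
    """Return CSV text for dict-like items, with columns in field_order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=field_order,
        restval=default_value,  # value written for fields missing from an item
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        # Pick only the known fields; missing ones fall back to restval.
        writer.writerow({name: item[name] for name in field_order if name in item})
    return buffer.getvalue()


# Sample items whose keys are deliberately not in the desired order.
items = [
    {"desc": "Resources", "link": "/Computers/Programming/", "title": "Programming"},
    {"link": "/Computers/Python/", "title": "Python"},  # no "desc" scraped
]
csv_text = export_rows(items)
print(csv_text)
```

Whatever order the keys have inside each item, the columns follow FIELD_ORDER, and an item without a desc value simply gets the default in that column.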

To do this, we have to create a list for fields_to_export in the BaseItemExporter class, for example field_iter = ['Offer', 'SKU', 'VendorCode'], and then pass this list in as the fields to export.
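In more recent Scrapy releases, the same effect can usually be had without editing the exporter class, by listing the columns in the FEED_EXPORT_FIELDS setting. A minimal sketch of a settings.py fragment, assuming the DmozItem fields from the question and a Scrapy version that supports this setting:

```python
# settings.py (fragment): assumes a Scrapy version that supports FEED_EXPORT_FIELDS.
# Feed exports, for example `scrapy crawl <spider> -o items.csv`, should then
# write the columns in the order listed here.
FEED_EXPORT_FIELDS = ["title", "link", "desc"]
```

Because the order lives in the project settings rather than in library code, it should survive a Scrapy upgrade, though it applies to every feed export in the project.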
