
Scrapy ModuleNotFoundError: No module named 'MySQLdb'

Just started out with Scrapy and I am trying to write to a MySQL database rather than outputting to a csv.

I found the code here: https://gist.github.com/tzermias/6982723, which I am using to try to make this work, but unfortunately I am getting an error that I can't get my head around.

This is my pipelines.py:

    import MySQLdb.cursors
    from twisted.enterprise import adbapi

    from scrapy.xlib.pydispatch import dispatcher
    from scrapy import signals
    from scrapy.utils.project import get_project_settings
    from scrapy import log

    SETTINGS = get_project_settings()

    class WebsitePipeline(object):
        def process_item(self, item, spider):
            return item

    class MySQLPipeline(object):

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler.stats)

        def __init__(self, stats):
            # Instantiate DB connection pool
            self.dbpool = adbapi.ConnectionPool('MySQLdb',
                host=SETTINGS['DB_HOST'],
                user=SETTINGS['DB_USER'],
                passwd=SETTINGS['DB_PASSWD'],
                port=SETTINGS['DB_PORT'],
                db=SETTINGS['DB_DB'],
                charset='utf8',
                use_unicode=True,
                cursorclass=MySQLdb.cursors.DictCursor
            )
            self.stats = stats
            dispatcher.connect(self.spider_closed, signals.spider_closed)

        def spider_closed(self, spider):
            """ Cleanup function, called after crawling has finished to close
                open objects.
                Close ConnectionPool. """
            self.dbpool.close()

        def process_item(self, item, spider):
            query = self.dbpool.runInteraction(self._insert_record, item)
            query.addErrback(self._handle_error)
            return item

        def _insert_record(self, tx, item):
            result = tx.execute(
                """ INSERT INTO table VALUES (1,2,3) """
            )
            if result > 0:
                self.stats.inc_value('database/items_added')

        def _handle_error(self, e):
            log.err(e)

This is what is in my settings.py:

# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'Website.pipelines.MySQLPipeline': 300,
}

#Database settings
DB_HOST = 'localhost'
DB_PORT = 3306
DB_USER = 'username'
DB_PASSWD = 'password'
DB_DB = 'scrape'

This is the spider.py:

    # -*- coding: utf-8 -*-
    import scrapy
    from scrapy.spiders import SitemapSpider

    class WebsitesitemapSpider(SitemapSpider):
        name = 'Websitesitemap'
        allowed_domains = ['Website.com']
        sitemap_urls = ['https://www.Website.com/robots.txt']

        def parse(self, response):
            yield {'url': response.url}

I have been unable to find a working example of what I am trying to do, so thank you to anyone who looks at this or might be able to help.

Do you have these packages installed: MySQLdb, scrapy, twisted?

If not, try installing them with pip and then run the script again.
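A quick way to check which of these are actually importable in the environment Scrapy runs under is a sketch like the following, using the standard library's `importlib` (the module names are the ones from the traceback and this question):

```python
import importlib.util

# Probe each dependency without importing it;
# find_spec returns None when the module cannot be found.
for name in ("MySQLdb", "scrapy", "twisted"):
    spec = importlib.util.find_spec(name)
    print(name, "OK" if spec is not None else "MISSING")
```

If `MySQLdb` comes up MISSING here, the ModuleNotFoundError is an installation problem rather than anything in pipelines.py.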

You will need MySQL-python installed in your Python environment, along with the MySQL client development library installed on the operating system.

On Ubuntu this would be achieved in the following manner:

    sudo apt-get install libmysqlclient-dev
    pip install MySQL-python
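Note that MySQL-python only supports Python 2. If this project runs on Python 3 (an assumption, since the question doesn't say), the mysqlclient fork provides the same module under the same `MySQLdb` import name, so the pipeline code needs no changes:

```shell
# mysqlclient is a maintained fork of MySQL-python for Python 3;
# it installs the same importable module, MySQLdb.
sudo apt-get install libmysqlclient-dev   # headers needed to build the C extension
pip install mysqlclient

# Verify the import works in this environment
python -c "import MySQLdb; print(MySQLdb.__name__)"
```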
