I am trying to spin up and connect two containers (mongo and a scrapy spider) using docker-compose. Being new to Docker, I've had a hard time troubleshooting networking ports (inside and outside the container). To respect your time I'll keep it short.
The problem:
I can't connect the spider to the mongo db container and get a timeout error. I think the IP address I am connecting to from inside the container is incorrect. However, the spider works locally (the non-dockerized version) and can pass data to a running mongo container.
Small edit: removed my name and email from the code.
error:
pymongo.errors.ServerSelectionTimeoutError: 127.0.0.1:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 5feb8bdcf912ec8797c25497, topology_type: Single
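One way to test the IP-address suspicion is to probe the MongoDB port directly from inside the spider container (for example from a Python shell started with docker exec). The sketch below uses only the standard library; can_reach is a hypothetical helper name, not part of Scrapy or PyMongo.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper, not part of Scrapy or PyMongo.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111), timeouts and DNS lookup failures.
        return False
```

Run inside the scraper container, can_reach('127.0.0.1', 27017) would be expected to return False (the same Errno 111 as above), since 127.0.0.1 there is the scraper container itself, while can_reach('db', 27017) should return True once MongoDB is up.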
pipeline code:
from scrapy.exceptions import DropItem
# scrapy.log is deprecated; Scrapy now logs through the standard logging module
#from scrapy.utils import log
import logging
import scrapy
from itemadapter import ItemAdapter
import pymongo


class xkcdMongoDBStorage:
    """
    Item pipeline that connects to MongoDB and stores scraped items
    in the randallMunroe.webComic collection.
    """

    def __init__(self):
        #* connecting to the MongoDB server (host, port)
        self.conn = pymongo.MongoClient(
            '127.0.0.1', 27017)  # works with the spider run locally and the mongo container running
        # '0.0.0.0', 27017)
        # list_database_names() is the first call that actually contacts the server
        dbnames = self.conn.list_database_names()
        if 'randallMunroe' not in dbnames:
            # creating the database (MongoDB creates it lazily on first insert)
            self.db = self.conn['randallMunroe']
        # if the database already exists we just access it
        else:
            self.db = self.conn.randallMunroe
        #* connecting to the collection (table)
        dbCollectionNames = self.db.list_collection_names()
        if 'webComic' not in dbCollectionNames:
            self.collection = self.db['webComic']
        else:
            # the collection already exists so we access it
            self.collection = self.db.webComic

    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        # drop the item if any field is empty
        for field, value in adapter.items():
            if not value:
                raise DropItem(f"Missing {field}!")
        # insert() was removed in PyMongo 4; insert_one() replaces it
        self.collection.insert_one(adapter.asdict())
        logging.info("Question added to MongoDB database!")
        return item
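To avoid hard-coding 127.0.0.1, the pipeline could read its connection details from environment variables and fall back to the local setup that already works. This is a minimal sketch; the variable names MONGO_HOST, MONGO_PORT, MONGO_USER and MONGO_PASSWORD and the function mongo_uri_from_env are hypothetical, and the variables would have to be set under the scraper service's environment: key in the compose file.

```python
import os
from urllib.parse import quote_plus


def mongo_uri_from_env(env=None):
    """Build a MongoDB URI from hypothetical MONGO_* environment variables.

    Defaults to 127.0.0.1:27017 so the spider still works outside Docker.
    """
    env = os.environ if env is None else env
    host = env.get("MONGO_HOST", "127.0.0.1")  # e.g. "db" inside docker-compose
    port = int(env.get("MONGO_PORT", "27017"))
    user = env.get("MONGO_USER")
    password = env.get("MONGO_PASSWORD")
    if user and password:
        # Credentials must be percent-escaped inside a URI. With no database in
        # the path, PyMongo authenticates against "admin", where the image's
        # root user is created.
        return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
    return f"mongodb://{host}:{port}/"
```

In __init__, the client would then be created with self.conn = pymongo.MongoClient(mongo_uri_from_env()), so the same code runs both locally and in the container.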
Dockerfile for the spider:
# base image
FROM python:3
# metadata info
LABEL maintainer="first last name" email="something@gmail.com"
# expose Scrapy's default telnet console port
EXPOSE 6023
# set the working directory so that paths can be relative
WORKDIR /usr/src/app
# copy requirements first to take advantage of layer caching
COPY requirements.txt ./
# install dependencies
RUN pip3 install --no-cache-dir -r requirements.txt
# copy the code itself from the local directory into the image
COPY . .
CMD scrapy crawl xkcdDocker
docker-compose file:
version: '3'
services:
  db:
    image: mongo:latest
    container_name: NoSQLDB
    restart: always
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: password
    volumes:
      - ./data/bin:/data/db
    ports:
      - 27017:27017
    expose:
      - 27017
  xkcd-scraper:
    build: ./scraperDocker
    container_name: xkcd-scraper-container
    volumes:
      - ./scraperDocker:/usr/src/app/scraper
    ports:
      - 5000:6023
    expose:
      - 6023
    depends_on:
      - db
Thanks for the help
Try:
self.conn = pymongo.MongoClient('NoSQLDB',27017)
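Within docker-compose, containers on the shared network reach each other by hostname: the service name (here "db") or the container_name ("NoSQLDB"). Inside the spider container, 127.0.0.1 refers to the spider container itself, where nothing listens on 27017, which is why the connection is refused.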
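Two further details may matter once the hostname is fixed. The compose file sets MONGO_INITDB_ROOT_USERNAME/PASSWORD, which makes the mongo image start with authentication enabled, so list_database_names() will likely fail with an authorization error unless credentials are passed. Also, depends_on only orders container startup; it does not wait for MongoDB to accept connections. The following is a minimal sketch handling both, assuming the root/password credentials from the compose file; connect_with_retry and connect_to_compose_mongo are hypothetical helper names.

```python
import time


def connect_with_retry(connect, attempts=5, delay=2.0, sleep=time.sleep):
    """Call `connect()` until it succeeds or `attempts` runs out (hypothetical helper).

    `connect` is any zero-argument callable that raises on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:  # e.g. pymongo.errors.ServerSelectionTimeoutError
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # give the db container time to finish starting
    raise last_error


def connect_to_compose_mongo():
    # Imported here so connect_with_retry can be used without pymongo installed.
    import pymongo

    # "db" is the compose service name; "NoSQLDB" (container_name) also resolves.
    # Credentials assume the root/password values from the compose file; the
    # root user lives in the "admin" database.
    client = pymongo.MongoClient(
        "db",
        27017,
        username="root",
        password="password",
        authSource="admin",
        serverSelectionTimeoutMS=5000,
    )
    client.admin.command("ping")  # force a round trip so failures surface here
    return client
```

In the pipeline's __init__ this would become self.conn = connect_with_retry(connect_to_compose_mongo). Passing the connect function in (rather than hard-coding it) keeps the retry logic independent of PyMongo and easy to test.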