How can I make my Scrapy spider read a file in the same directory?
The target file urls.txt contains all the URLs to be downloaded.
├─spiders
│ │ stockInfo.py
│ │ urls.txt
│ │ __init__.py
stockInfo.py is my Scrapy spider file.
import scrapy
import os
import re

class QuotesSpider(scrapy.Spider):
    name = "stockInfo"
    projectFile = r"d:/toturial/toturial/spiders/urls.txt"
    with open(projectFile, "r") as f:
        urls = f.readlines()
    start_urls = [url.strip() for url in urls]

    def parse(self, response):
        pass
I have tested that the above stockInfo.py runs successfully on my local PC with the command:
scrapy crawl stockInfo
Now I deploy the project to the remote Scrapinghub with:
pip install shub
shub login
API key: xxxxxxxxxxxxxxxxx
shub deploy 380020
It runs into trouble:
IOError: [Errno 2] No such file or directory: 'd:/toturial/toturial/spiders/urls.txt'
How can I fix this when deploying my Scrapy project to the hub?
It works to rewrite
projectFile = r"d:/toturial/toturial/spiders/urls.txt"
as
projectFile = "./urls.txt"
when running on my local PC.
Strangely, it does not help to rewrite
projectFile = r"d:/toturial/toturial/spiders/urls.txt"
as
projectFile = "./urls.txt"
when running on the remote Scrapinghub.
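The relative path fails remotely because "./urls.txt" is resolved against the process's current working directory, which on Scrapy Cloud is not the spiders directory. A minimal sketch of one common workaround is to build the path from the spider module's own location instead (this assumes urls.txt sits next to the spider file, and it can still fail if the project is deployed as a zipped egg, which is why the pkgutil approach in the answer is more robust):

```python
import os

# Sketch: resolve the path against this module's own location rather than
# the current working directory (assumes urls.txt sits next to this file).
HERE = os.path.dirname(os.path.abspath(__file__))
projectFile = os.path.join(HERE, "urls.txt")
```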
1. Add a new directory resources and move urls.txt into it.
My new directory tree is as below.
tutorial
├─tutorial
│ ├─resources
│ │ urls.txt
│ ├─spiders
│ │ stockInfo.py
2. Rewrite setup.py as below.
from setuptools import setup, find_packages

setup(
    name='tutorial',
    version='1.0',
    packages=find_packages(),
    package_data={
        'tutorial': ['resources/*.txt']
    },
    entry_points={
        'scrapy': ['settings = tutorial.settings']
    },
    zip_safe=False,
)
3. Rewrite stockInfo.py as below.
import scrapy
import os
import re
import pkgutil

class QuotesSpider(scrapy.Spider):
    name = "stockInfo"
    # Read urls.txt from the installed package data instead of the file system.
    data = pkgutil.get_data("tutorial", "resources/urls.txt")
    data = data.decode()
    start_urls = data.split("\r\n")

    def parse(self, response):
        pass
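One caveat with the snippet above: data.split("\r\n") assumes every line in urls.txt ends with Windows-style \r\n. A more forgiving sketch uses splitlines(), which handles \n, \r\n, and a trailing newline uniformly (the sample URLs below are made up for illustration):

```python
# Hypothetical file contents with mixed line endings and a trailing newline.
data = "http://example.com/a\r\nhttp://example.com/b\nhttp://example.com/c\r\n"

# splitlines() copes with \n and \r\n alike; the filter drops empty lines.
start_urls = [u for u in data.splitlines() if u]
```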