How can I make my Scrapy spider read a file in the same directory?

The target file urls.txt contains all the URLs to be downloaded.

├─spiders
│  │  stockInfo.py
│  │  urls.txt
│  │  __init__.py

stockInfo.py is my Scrapy spider file.

import scrapy
import os
import re

class QuotesSpider(scrapy.Spider):
    name = "stockInfo"
    projectFile = r"d:/toturial/toturial/spiders/urls.txt"
    with open(projectFile,"r") as f:
        urls = f.readlines()
    start_urls = [url.strip() for url in urls]

    def parse(self, response):
        pass

I have tested that the above stockInfo.py runs successfully on my local PC with the command:

scrapy crawl  stockInfo

Now I deploy the project to the remote Scrapinghub with:

pip install shub
shub login
API key: xxxxxxxxxxxxxxxxx
shub deploy 380020

It runs into trouble:

IOError: [Errno 2] No such file or directory: 'd:/toturial/toturial/spiders/urls.txt'

How can I fix this when deploying my spider to the hub?

When running on my local PC, it helps to rewrite

projectFile = r"d:/toturial/toturial/spiders/urls.txt"

as

projectFile = "./urls.txt"

Strangely, the same rewrite, from

projectFile = r"d:/toturial/toturial/spiders/urls.txt"

to

projectFile = "./urls.txt"

has no effect when the spider runs on the remote Scrapinghub.
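
For context, a relative path such as "./urls.txt" is resolved against the current working directory of the process, not against the directory that holds stockInfo.py, so whether it works depends on where the process was started. A minimal sketch of resolving the path against the module file instead is shown here; the helper name sibling_path is hypothetical and not part of Scrapy. This only removes the working-directory dependence and does not by itself ensure that urls.txt is shipped with a deployed project.

```python
import os


def sibling_path(module_file, filename):
    """Return the absolute path of `filename` located next to `module_file`.

    `sibling_path` is a hypothetical helper, not part of Scrapy.
    """
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)


# Inside stockInfo.py this could be used as:
#     projectFile = sibling_path(__file__, "urls.txt")
```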

1. Add a new directory and move urls.txt into it.
Create a new directory named resources and save urls.txt in it.
The new directory tree is shown below.

tutorial
├─tutorial
│  ├─resources
│  │     urls.txt
│  ├─spiders
│  │     stockInfo.py
2. Rewrite setup.py as below.

from setuptools import setup, find_packages

setup(
    name='tutorial',
    version='1.0',
    packages=find_packages(),
    package_data={
        'tutorial': ['resources/*.txt']
    },
    entry_points={
        'scrapy': ['settings = tutorial.settings']
    },
    zip_safe=False,
)

3. Rewrite stockInfo.py as below.

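import pkgutil

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "stockInfo"
    # Read urls.txt from the installed package instead of a filesystem path
    data = pkgutil.get_data("tutorial", "resources/urls.txt")
    # splitlines() handles both "\n" and "\r\n" endings; blank lines are skipped
    start_urls = [line.strip() for line in data.decode().splitlines() if line.strip()]

    def parse(self, response):
        pass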
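
The loading step in stockInfo.py can also be kept in a small helper so that it can be checked without running a crawl. This is only a sketch: the function names parse_url_lines and load_start_urls are hypothetical, and the default package and resource names are the ones used in the steps above.

```python
import pkgutil


def parse_url_lines(raw):
    """Turn the raw bytes of a URL list into a list of clean URL strings."""
    text = raw.decode("utf-8")
    # splitlines() accepts "\n", "\r\n" and "\r"; empty lines are dropped
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_start_urls(package="tutorial", resource="resources/urls.txt"):
    """Load URLs from a data file bundled with `package` (hypothetical helper)."""
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        # get_data returns None when the package cannot be located
        raise FileNotFoundError(
            "could not load %r from package %r" % (resource, package)
        )
    return parse_url_lines(raw)
```

In the spider, the class attribute would then become start_urls = load_start_urls().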
