How can I make my Scrapy spider read a file in the same directory?
The target file urls.txt contains all the URLs to be downloaded.
├─spiders
│ │ stockInfo.py
│ │ urls.txt
│ │ __init__.py
stockInfo.py is my Scrapy spider file.
import scrapy
import os
import re

class QuotesSpider(scrapy.Spider):
    name = "stockInfo"
    projectFile = r"d:/toturial/toturial/spiders/urls.txt"
    with open(projectFile, "r") as f:
        urls = f.readlines()
    start_urls = [url.strip() for url in urls]

    def parse(self, response):
        pass
I have tested that the above stockInfo.py runs successfully on my local PC with the command:
scrapy crawl stockInfo
Now I deploy the project to the remote Scrapinghub with:
pip install shub
shub login
API key: xxxxxxxxxxxxxxxxx
shub deploy 380020
It runs into trouble:
IOError: [Errno 2] No such file or directory: 'd:/toturial/toturial/spiders/urls.txt'
How can I fix this when deploying my Scrapy project to the hub?
It works to rewrite
projectFile = r"d:/toturial/toturial/spiders/urls.txt"
as
projectFile = "./urls.txt"
when running on my local PC.
Strangely, it does not help to rewrite
projectFile = r"d:/toturial/toturial/spiders/urls.txt"
as
projectFile = "./urls.txt"
when running on the remote Scrapinghub.
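The relative path fails remotely because "./urls.txt" is resolved against the process's current working directory, which on Scrapy Cloud is not the spiders directory. A minimal sketch of one common workaround is to build the path from the spider module's own location instead (this assumes urls.txt sits next to the spider file, and it can still fail if the project is deployed as a zipped egg, which is why the pkgutil approach in the answer is more robust):

```python
import os

# Sketch: resolve the path against this module's own location rather than
# the current working directory (assumes urls.txt sits next to this file).
HERE = os.path.dirname(os.path.abspath(__file__))
projectFile = os.path.join(HERE, "urls.txt")
```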
1. Add a new directory resources and move urls.txt into it.
My new directory tree is as below.
tutorial
├─tutorial
│ ├─resources
│ │ urls.txt
│ ├─spiders
│ │ stockInfo.py
2. Rewrite setup.py as below.
from setuptools import setup, find_packages

setup(
    name='tutorial',
    version='1.0',
    packages=find_packages(),
    package_data={
        'tutorial': ['resources/*.txt']
    },
    entry_points={
        'scrapy': ['settings = tutorial.settings']
    },
    zip_safe=False,
)
3. Rewrite stockInfo.py as below.
import scrapy
import os
import re
import pkgutil

class QuotesSpider(scrapy.Spider):
    name = "stockInfo"
    # Read urls.txt from the installed package data instead of the file system.
    data = pkgutil.get_data("tutorial", "resources/urls.txt")
    data = data.decode()
    start_urls = data.split("\r\n")

    def parse(self, response):
        pass
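One caveat with the snippet above: data.split("\r\n") assumes every line in urls.txt ends with Windows-style \r\n. A more forgiving sketch uses splitlines(), which handles \n, \r\n, and a trailing newline uniformly (the sample URLs below are made up for illustration):

```python
# Hypothetical file contents with mixed line endings and a trailing newline.
data = "http://example.com/a\r\nhttp://example.com/b\nhttp://example.com/c\r\n"

# splitlines() copes with \n and \r\n alike; the filter drops empty lines.
start_urls = [u for u in data.splitlines() if u]
```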