Debugging Scrapy Project in Visual Studio Code
I have Visual Studio Code on a Windows machine and I am building a new Scrapy crawler. The crawler works fine, but I want to debug the code, so I added this to my launch.json file:
{
    "name": "Scrapy with Integrated Terminal/Console",
    "type": "python",
    "request": "launch",
    "stopOnEntry": true,
    "pythonPath": "${config:python.pythonPath}",
    "program": "C:/Users/neo/.virtualenvs/Gers-Crawler-77pVkqzP/Scripts/scrapy.exe",
    "cwd": "${workspaceRoot}",
    "args": [
        "crawl",
        "amazon",
        "-o",
        "amazon.json"
    ],
    "console": "integratedTerminal",
    "env": {},
    "envFile": "${workspaceRoot}/.env",
    "debugOptions": [
        "RedirectOutput"
    ]
}
But I cannot hit any breakpoints. PS: I took the JSON script from here: http://www.stevetrefethen.com/blog/debugging-a-python-scrapy-project-in-vscode
To run the typical scrapy runspider <PYTHON_FILE> command, put the following configuration in your launch.json:
{
    "version": "0.1.0",
    "configurations": [
        {
            "name": "Python: Launch Scrapy Spider",
            "type": "python",
            "request": "launch",
            "module": "scrapy",
            "args": [
                "runspider",
                "${file}"
            ],
            "console": "integratedTerminal"
        }
    ]
}
Set breakpoints wherever you need them, open the spider file, and start debugging.
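For orientation, the "module": "scrapy" entry means the debugger starts Scrapy as a module, roughly the way python -m scrapy runspider <PYTHON_FILE> would from a terminal. The sketch below makes that mapping explicit. equivalent_command is a hypothetical helper written only for this illustration (it is not part of VS Code or Scrapy), and it substitutes nothing but the ${file} variable.

```python
import sys


def equivalent_command(config, current_file):
    """Return the command line that roughly matches a 'module' launch config.

    Hypothetical helper for illustration; only ${file} is substituted.
    """
    args = [arg.replace("${file}", current_file) for arg in config.get("args", [])]
    return [sys.executable, "-m", config["module"], *args]


launch_config = {
    "name": "Python: Launch Scrapy Spider",
    "type": "python",
    "request": "launch",
    "module": "scrapy",
    "args": ["runspider", "${file}"],
    "console": "integratedTerminal",
}

# "myspider.py" stands in for whichever file is open in the editor.
command = equivalent_command(launch_config, "myspider.py")
print(" ".join(command[1:]))  # -m scrapy runspider myspider.py
```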
In your Scrapy project folder, create a runner.py module with the following content:
import os
from scrapy.cmdline import execute

os.chdir(os.path.dirname(os.path.realpath(__file__)))

try:
    execute(
        [
            'scrapy',
            'crawl',
            'SPIDER NAME',
            '-o',
            'out.json',
        ]
    )
except SystemExit:
    pass
Place a breakpoint on the line you want to debug.
Run runner.py with the VS Code debugger.
Configure your launch.json file like this:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Crawl with scrapy",
            "type": "python",
            "request": "launch",
            "module": "scrapy",
            "cwd": "${fileDirname}",
            "args": [
                "crawl",
                "<SPIDER NAME>"
            ],
            "console": "internalConsole"
        }
    ]
}
Click the VS Code tab that holds your spider so that it is the active file, then start the debug session defined in that launch.json.
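Setting "cwd" to ${fileDirname} works because, as far as I know, Scrapy looks for scrapy.cfg in the working directory and then in its parent directories, and a spider file normally sits somewhere inside the project tree. Below is a minimal sketch of that upward search. find_project_root is a hypothetical helper written for illustration, not Scrapy's own code.

```python
from pathlib import Path


def find_project_root(start):
    """Walk up from `start` and return the first directory containing scrapy.cfg.

    Returns None when no scrapy.cfg is found. Hypothetical helper for illustration.
    """
    current = Path(start).resolve()
    if current.is_file():
        # Start the search from the directory that holds the file.
        current = current.parent
    for directory in (current, *current.parents):
        if (directory / "scrapy.cfg").is_file():
            return directory
    return None


if __name__ == "__main__":
    # Start from the current working directory, as the debugger does with "cwd" set.
    print(find_project_root(Path.cwd()))
```

If this prints None when started from the spider's folder, the spider file is probably outside the project tree, and the crawl command is unlikely to find the project either.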
I got it working. The simplest way is to create a runner script, runner.py:
import scrapy
from scrapy.crawler import CrawlerProcess
from g4gscraper.spiders.g4gcrawler import G4GSpider

# Note: newer Scrapy releases deprecate FEED_FORMAT and FEED_URI
# in favor of the FEEDS setting.
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'FEED_FORMAT': 'json',
    'FEED_URI': 'data.json'
})
process.crawl(G4GSpider)
process.start()  # the script will block here until the crawling is finished
Then I added breakpoints in the spider and launched the debugger on this file. Reference: https://doc.scrapy.org/en/latest/topics/practices.html
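There is no need to modify launch.json; the default "Python: Current File (Integrated Terminal)" configuration works perfectly. For Python 3 projects, remember to put runner.py at the same level as scrapy.cfg (that is, in the project root).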
@naqushab See the runner.py code above. Note the call process.crawl(className), where className is the spider class in which you want to set breakpoints.
You can also try:
{
    "configurations": [
        {
            "name": "Python: Scrapy",
            "type": "python",
            "request": "launch",
            "module": "scrapy",
            "cwd": "${fileDirname}",
            "args": [
                "crawl",
                "${fileBasenameNoExtension}",
                "--loglevel=ERROR"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}
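But the file name (without its extension) must be the same as the spider name, because ${fileBasenameNoExtension} is what gets passed to crawl.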
--loglevel=ERROR is there to get less verbose output ;)
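Since ${fileBasenameNoExtension} is passed as the spider name, a file whose name differs from the spider's name attribute will most likely end with a "spider not found" error. Here is a rough pre-flight check, assuming the spider declares name as a plain string literal at class level. declared_spider_names and matches_file_name are hypothetical helpers, and the example source is made up for illustration.

```python
import ast
from pathlib import Path


def declared_spider_names(source):
    """Collect string values assigned to `name` directly inside class bodies."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for stmt in node.body:
            if (
                isinstance(stmt, ast.Assign)
                and any(isinstance(t, ast.Name) and t.id == "name" for t in stmt.targets)
                and isinstance(stmt.value, ast.Constant)
                and isinstance(stmt.value.value, str)
            ):
                names.append(stmt.value.value)
    return names


def matches_file_name(path, source):
    """True if the file's base name (without extension) is a declared spider name."""
    return Path(path).stem in declared_spider_names(source)


# Made-up spider source, only to exercise the check.
example_source = '''
class AmazonSpider:
    name = "amazon"
'''

print(matches_file_name("spiders/amazon.py", example_source))         # True
print(matches_file_name("spiders/amazon_spider.py", example_source))  # False
```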
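I applied @fmango's code and improved it.
There is no need to write a separate runner file; just paste these lines at the end of your spider.
Then run the Python debugger on the spider file. That's all.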
if __name__ == '__main__':
    import os
    from scrapy.cmdline import execute

    os.chdir(os.path.dirname(os.path.realpath(__file__)))

    SPIDER_NAME = MySpider.name
    try:
        execute(
            [
                'scrapy',
                'crawl',
                SPIDER_NAME,
                '-s',
                'FEED_EXPORT_ENCODING=utf-8',
            ]
        )
    except SystemExit:
        pass
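The list passed to execute mirrors what you would type on the command line, so extra options are simply more list items. The sketch below builds such a list. build_crawl_argv is a hypothetical helper, and it covers only the -o and -s options that appear in this thread.

```python
def build_crawl_argv(spider_name, output=None, settings=None):
    """Build the argv list for `scrapy crawl`. Hypothetical helper for illustration."""
    argv = ["scrapy", "crawl", spider_name]
    if output is not None:
        # -o writes the scraped items to the given file.
        argv += ["-o", output]
    for key, value in (settings or {}).items():
        # Each -s KEY=VALUE pair overrides one setting for this run.
        argv += ["-s", f"{key}={value}"]
    return argv


argv = build_crawl_argv(
    "amazon",
    output="amazon.json",
    settings={"FEED_EXPORT_ENCODING": "utf-8"},
)
print(argv)
# The resulting list could then be passed to scrapy.cmdline.execute(argv).
```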