How to get text and href value in anchor tag with Scrapy, XPath, Python
I have an HTML file like this:
<div class="jokes-nav">
    <ul>
        <li><a href="http://link_1">Link 1</a></li>
        <li><a href="http://link_2">Link 2</a></li>
    </ul>
</div>
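As a quick sanity check of the selector path against this snippet, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree` instead of Scrapy. It assumes the attribute is spelled `class`, wraps the snippet in a `<body>` element so the `<div>` is a descendant of the root, and uses a hypothetical helper name `extract_links`. ElementTree supports only a subset of XPath, so the text and `href` are read from each element here; with a Scrapy selector the equivalent expressions would be `.//li/a/text()` and `.//li/a/@href`.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, wrapped in <body> (an assumption made so that
# the <div> is a descendant of the parsed root element).
HTML = """
<body>
  <div class="jokes-nav">
    <ul>
      <li><a href="http://link_1">Link 1</a></li>
      <li><a href="http://link_2">Link 2</a></li>
    </ul>
  </div>
</body>
"""


def extract_links(markup):
    """Return (text, href) pairs for every anchor under div.jokes-nav."""
    root = ET.fromstring(markup.strip())
    anchors = root.findall(".//div[@class='jokes-nav']/ul/li/a")
    return [(a.text, a.get("href")) for a in anchors]


links = extract_links(HTML)
for text, href in links:
    print(text, href)
```

If this returns an empty list for the real page, the path does not match the markup, and the spider would yield no items regardless of the feed settings.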
In the spiders folder, I have a file jokes.py, as shown below:
import scrapy
from demo_project.items import JokeItem
from scrapy.loader import ItemLoader

class JokesSpider(scrapy.Spider):
    name = 'jokes'
    start_urls = [
        'http://www.laughfactory.com/jokes/'
    ]

    def parse(self, response):
        for joke in response.xpath("//div[@class='jokes-nav']/ul"):
            l = ItemLoader(item=JokeItem(), selector=joke)
            l.add_xpath('joke_title', ".//li/a/text()")
            # yield {
            #     'joke_text': joke.xpath(".//div[@class='joke-text']/p").extract_first()
            # }
            yield l.load_item()
I call the JokesSpider class from my main.py (this file is in the project root). Here is my code:
from scrapy.crawler import CrawlerProcess
from demo_project.spiders.jokes import JokesSpider

process = CrawlerProcess(settings={
    "FEEDS": {
        "items.json": {"format": "json"},
    },
})
process.crawl(JokesSpider)
process.start()  # the script will block here until the crawling is finished
I want to write the data to items.json, but when I run this code, items.json contains nothing. How can I fix this? Thank you very much.
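You can set the FEED_FORMAT and FEED_URI settings to save the data in a JSON file. The FEEDS setting was only introduced in Scrapy 2.1, so if an older version is installed it is not recognized and nothing gets exported.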
process = CrawlerProcess(settings={
    'FEED_FORMAT': 'json',
    'FEED_URI': 'items.json',
})
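Which of the two setting styles applies depends on the installed Scrapy version: `FEEDS` is available from Scrapy 2.1 onward, and that release also deprecated `FEED_FORMAT` and `FEED_URI`. The sketch below is one way to pick between them. The helper name `feed_settings` and its parameters are hypothetical, not part of Scrapy, and the version is passed in as a plain tuple so the function can be tried without Scrapy installed.

```python
def feed_settings(scrapy_version, path="items.json", fmt="json"):
    """Build feed-export settings for the given Scrapy version tuple.

    Hypothetical helper: returns the FEEDS dictionary on Scrapy 2.1+ and
    the older FEED_FORMAT / FEED_URI pair on earlier versions.
    """
    major_minor = tuple(scrapy_version[:2])
    if major_minor >= (2, 1):
        return {"FEEDS": {path: {"format": fmt}}}
    return {"FEED_FORMAT": fmt, "FEED_URI": path}


new_style = feed_settings((2, 11, 0))
old_style = feed_settings((1, 8, 0))
print(new_style)
print(old_style)
```

In main.py this could then be used as `CrawlerProcess(settings=feed_settings(scrapy.version_info))`, assuming `scrapy` is imported there.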