
Change directory in python and extract .html filenames through scrapy spider

Original post: 2012-01-19 06:23:23 · Tags: python, scrapy

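I wrote a spider that crawls a folder named fid and extracts the names of all its subfolders as links. The problem now is that each of these subfolders contains an HTML page, and I want to extract the names of all these HTML files and add them to the current "start_urls" so that I can scrape the information I need from all of those HTML pages. I have tried: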

os.listdir()
glob.glob()

But neither of these worked. Please help me solve this.

1 Answer

One stdlib approach is to combine os.walk with fnmatch:

import fnmatch
import os

start_urls = []

for root, dirnames, filenames in os.walk('/start/dir/'):
    for filename in fnmatch.filter(filenames, '*.html'):
        start_urls.append(os.path.join(root, filename))
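
The loop above collects plain filesystem paths. Scrapy's start_urls is normally a list of URLs, so local files generally need the file:// scheme before the spider can request them. The following is a minimal sketch of that conversion, not part of the original answer: the helper name html_file_urls is hypothetical, and it assumes the HTML files live on the local disk under the directory you pass in.

```python
import fnmatch
import os
from pathlib import Path


def html_file_urls(base_dir):
    """Return sorted file:// URLs for every .html file found under base_dir.

    Hypothetical helper for illustration; base_dir is assumed to be a
    local directory such as the '/start/dir/' used in the answer.
    """
    urls = []
    # os.walk descends into every subfolder, unlike a single os.listdir call.
    for root, _dirnames, filenames in os.walk(base_dir):
        for filename in fnmatch.filter(filenames, "*.html"):
            # Path.as_uri() requires an absolute path, so resolve it first.
            path = Path(root, filename).resolve()
            urls.append(path.as_uri())
    return sorted(urls)
```

In a spider, this could be used as start_urls = html_file_urls('/start/dir/') in the class body, so that every HTML page in the subfolders is scheduled when the crawl starts.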


