
Change directory in Python and extract .html filenames through a Scrapy spider

Original post: 2012-01-19 06:23:23 · Tags: python, scrapy

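I wrote a spider that crawls a folder named fid and extracts the names of all its subfolders as links. The problem now is that each of these subfolders contains an HTML page, and I want to extract the names of all those HTML files and add them to the current "start_urls" so that I can scrape the required information from all of those HTML pages. I have tried: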

os.listdir()
glob.glob()

But neither of these worked. Please help me solve this.

1 Answer

One stdlib approach is to combine os.walk with fnmatch:

import fnmatch
import os

start_urls = []

for root, dirnames, filenames in os.walk('/start/dir/'):
    for filename in fnmatch.filter(filenames, '*.html'):
        start_urls.append(os.path.join(root, filename))
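
One caveat worth noting: the loop above collects plain filesystem paths, while Scrapy request URLs need a scheme, so local files are usually passed as file:// URLs. The following is only a sketch of one way to do that, not part of the original answer. The helper name html_file_urls and the temporary demo tree are made up for illustration; Path.as_uri() from pathlib does the conversion.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def html_file_urls(start_dir):
    """Collect every *.html file under start_dir as a file:// URL.

    Hypothetical helper, named here only for illustration.
    """
    urls = []
    for root, dirnames, filenames in os.walk(start_dir):
        for filename in fnmatch.filter(filenames, '*.html'):
            # as_uri() only accepts absolute paths, so resolve() first
            urls.append(Path(root, filename).resolve().as_uri())
    return sorted(urls)


# Demo on a throwaway tree standing in for "fid": one page per subfolder
with tempfile.TemporaryDirectory() as fid:
    for sub in ('a', 'b'):
        os.makedirs(os.path.join(fid, sub))
        Path(fid, sub, 'index.html').write_text('<html></html>')
    # A non-HTML file that the '*.html' filter should skip
    Path(fid, 'a', 'notes.txt').write_text('ignored')

    start_urls = html_file_urls(fid)
    print(start_urls)
```

In a spider, the resulting list could be assigned to start_urls in place of the bare paths, assuming the HTML files stay where they are for the duration of the crawl.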

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original source.
