
How to read and parse an HTML file without writing to disk

Originally posted 2013-12-18 05:16:50 · python

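I recently wrote a Python script to parse a specific line from a web page. The code works, but every time I run it, it downloads the page and writes a ".php" file into the working directory: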

#!/usr/bin/env python
import wget
import re
from HTMLParser import HTMLParser
import tempfile
url = "http://tuberculist.epfl.ch/quicksearch.php?gene+name=0009&submit=Search#sequence"
filname = wget.download(url)
a = open(filname,'r')
b = a.readlines()
f = "|Rv0009|"
for c in b:
    if f in c:
        pattern = re.compile("> >.+<br /></")
        z = pattern.findall(c)
        print z

What changes should I make so that I can parse the required line without writing a file?

1 Answer

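A few points:

urllib.urlopen(url) (urllib.request.urlopen(url) in Python 3) gives you a file-like object without writing anything to disk.
Your code imports 2 modules that it never uses (HTMLParser and tempfile). Get rid of those imports.
The #sequence part of your URL is never sent to the server: a fragment is resolved on the client side and is not part of the HTTP request. You can take it out.
You are using a regular expression to parse HTML. As your use case gets more complex, that will lead to pain and suffering. Consider using lxml.html ( http://lxml.de/lxmlhtml.html ) or BeautifulSoup ( http://www.crummy.com/software/BeautifulSoup/ ).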
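
To make the first and third points concrete, here is a minimal sketch of how the question's script might look with the download kept in memory. It assumes Python 3 (where `urlopen` lives in `urllib.request`, unlike the Python 2 code in the question) and assumes the page decodes as UTF-8. The helper names `strip_fragment`, `fetch_lines` and `find_matches` are made up for illustration. The marker and the regular expression are copied from the question unchanged, so the regex caveat above still applies.

```python
import re
from urllib.parse import urldefrag
from urllib.request import urlopen  # Python 3 home of urlopen

# Marker string and regex taken verbatim from the question.
MARKER = "|Rv0009|"
PATTERN = re.compile(r"> >.+<br /></")


def strip_fragment(url):
    """Drop the '#...' fragment, which is never sent to the server anyway."""
    return urldefrag(url).url


def fetch_lines(url, encoding="utf-8"):
    """Read the response body into memory and split it into lines.

    Nothing is written to disk. The UTF-8 default is an assumption;
    undecodable bytes are replaced rather than raising an error.
    """
    with urlopen(strip_fragment(url)) as response:
        text = response.read().decode(encoding, errors="replace")
    return text.splitlines()


def find_matches(lines, marker=MARKER, pattern=PATTERN):
    """Collect regex matches from every line that contains the marker.

    Unlike the question's loop, which prints one list per matching line,
    this returns a single flat list.
    """
    matches = []
    for line in lines:
        if marker in line:
            matches.extend(pattern.findall(line))
    return matches
```

Calling `find_matches(fetch_lines(url))` with the URL from the question should then cover the same ground as the original loop, without leaving a file in the working directory.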

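Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the address of the original post. For any questions, contact yoyou2525@163.com.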
