Selecting specific text from a webpage using Python

Although I love the program, I've gotten extremely tired of Calibre's weekly update habit. To counteract that, I'm trying to write a Python script that automates the process.

I have successfully opened the page, but I'm having trouble capturing a specific piece of it as a string. I need that piece because Calibre's download link depends on the version number, so the version has to be retrieved first. Currently, line 218 of the page contains the following:

  <a href="/projects/calibre/files/latest/download?source=files" title="/0.8.34/calibre-portable-0.8.34.zip: released on 2012-01-06 07:22:08 UTC"> 

I need to retrieve the file name and version ("calibre-portable-0.8.34") from that line. Any suggestions on how to make that work?

import urllib2

print("Calibre is Updating")
# Fetch the SourceForge files page; read() returns the whole HTML as one string (Python 2)
html = urllib2.urlopen("http://sourceforge.net/projects/calibre/files").read()
print(html)
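Because the value you want sits in the title attribute of one specific <a> tag, an alternative to counting lines (line 218 will move whenever SourceForge edits the page) is to let an HTML parser find that tag. Below is a minimal sketch in Python 3 (where urllib2 became urllib.request) using the standard-library html.parser. It assumes the wanted link is the <a> whose href contains "/files/latest/download", as in the line quoted above; the class LatestLinkParser and the function latest_title are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser


class LatestLinkParser(HTMLParser):
    """Remember the title attribute of the first "latest download" link (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found_title = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        # Assumption: the wanted link's href contains "/files/latest/download"
        if tag == "a" and "/files/latest/download" in href and self.found_title is None:
            self.found_title = attributes.get("title")


def latest_title(html):
    """Return the title of the first matching <a> tag, or None if there is none."""
    parser = LatestLinkParser()
    parser.feed(html)
    parser.close()
    return parser.found_title


# Offline sample: the line quoted in the question
sample = ('<a href="/projects/calibre/files/latest/download?source=files" '
          'title="/0.8.34/calibre-portable-0.8.34.zip: released on 2012-01-06 07:22:08 UTC">')
print(latest_title(sample))
```

On the live page you would pass in the decoded download instead, for example `latest_title(urlopen(url).read().decode("utf-8"))` after `from urllib.request import urlopen`; that step is not exercised here and assumes the page is UTF-8 encoded.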

Here is an amended version of your code:

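import re
import urllib2

print("Calibre is Updating")
# Download the SourceForge files page as a string (Python 2)
html = urllib2.urlopen("http://sourceforge.net/projects/calibre/files").read()

# Capture the file name that follows the version directory in the title attribute,
# e.g. title="/0.8.34/calibre-portable-0.8.34.zip: ..."
match = re.search(r'title="/[0-9.]*/([a-zA-Z\-]*-[0-9\.]*)', html)
if match:
    # The version part also captures the "." before "zip", so drop it
    result = match.group(1)[:-1]
    print(result)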

Here I use the re module to search the downloaded page for a string that matches what you asked for, and save the captured part to result.
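To make the re.search step concrete, here is a small offline sketch that runs the answer's pattern against the line quoted in the question (no network access; `line` is just that sample copied in, and `capture` is a hypothetical helper). re.search returns a match object whose group(1) is the text captured by the parentheses, or None when the pattern does not occur, so checking for None avoids an AttributeError if the page markup changes.

```python
import re

# The pattern used in the answer above
PATTERN = r'title="/[0-9.]*/([a-zA-Z\-]*-[0-9\.]*)'

# The line quoted in the question, used as offline sample input
line = ('<a href="/projects/calibre/files/latest/download?source=files" '
        'title="/0.8.34/calibre-portable-0.8.34.zip: released on 2012-01-06 07:22:08 UTC">')


def capture(text):
    """Return group 1 of the first match, or None if the pattern is absent."""
    match = re.search(PATTERN, text)
    return match.group(1) if match is not None else None


print(capture(line))              # group 1 still ends with the "." before "zip"
print(capture("<p>no link</p>"))  # no title="/..." attribute here, so None
```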

I strip the last character because my regex also captures the dot that precedes "zip". With some patience you can tighten the pattern so that it captures only what you need.
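One way to tighten the pattern, sketched under the assumption that the title always looks like "/<version>/<name>-<version>.zip: ..." as in the quoted line, is to anchor the capture on the literal ".zip" so the trailing dot is never captured, and to capture the version on its own as well. The helper parse_release is hypothetical.

```python
import re

# Assumption: title="/<version>/<name>-<version>.zip..." as in the quoted line.
# Group 1 is the version directory, group 2 the file name without ".zip".
RELEASE_RE = re.compile(r'title="/(\d+(?:\.\d+)*)/([A-Za-z-]+-\d+(?:\.\d+)*)\.zip')


def parse_release(text):
    """Return (version, file_stem) from the first matching title, or None."""
    match = RELEASE_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


line = ('<a href="/projects/calibre/files/latest/download?source=files" '
        'title="/0.8.34/calibre-portable-0.8.34.zip: released on 2012-01-06 07:22:08 UTC">')
print(parse_release(line))
```

Anchoring on `\.zip` trades a little flexibility (a link to a non-.zip file would not match) for not needing the `[:-1]` fix-up afterwards.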
