
How to find a specific string in a web page that lies between two particular strings using Python?

Say I am scraping www.website.com. Using these two lines of code,

import requests
from lxml import html
page = requests.get("http://www.website.com")
tree = html.fromstring(page.content)

I have stored the whole source in tree. Now, tree is obviously full of text, tags, and other HTML. I am only interested in one particular string that lies between two other strings, say start and end, and does NOT include one specific word. How can I do that?

Without knowing the specific format of the website you're scraping, the only way I can think of is a depth-first concatenation of the text content of the HTML elements in the tree. Then search that concatenation for "start" and record its index, search for "end" and record its index, and take the substring between the two indices.
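
The index-based search described above could look roughly like the sketch below. The helper name find_between is hypothetical, and the sample text is made up; with lxml, the concatenated text could presumably be obtained from tree.text_content(), but the sketch works on a plain string so it runs without any third-party package.

```python
def find_between(text, start, end):
    """Return the text between the first `start` and the next `end`.

    Returns None if either marker is missing. `find_between` is a
    hypothetical helper name, not part of any library.
    """
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)           # skip past the start marker itself
    j = text.find(end, i)     # look for `end` only after `start`
    if j == -1:
        return None
    return text[i:j]


# Made-up sample text; with lxml this could be tree.text_content().
text = "header start the part we want end footer"
print(find_between(text, "start", "end"))
```

Searching for "end" only from the position after "start" avoids picking up an earlier "end" that appears before the start marker.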

How about something like this:

>>> tree = "This is the start and end"
>>> tree.split('start')[-1].split('end')[0]
' and '
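
Note that the split approach returns the whole string when a marker is missing, and it does not handle the "does NOT include one specific word" part of the question. One possible way to cover that, as a sketch only: collect every start...end span with a regular expression and drop the spans that contain the unwanted word. The function name spans_without_word and the sample text are assumptions made for illustration.

```python
import re


def spans_without_word(text, start, end, word):
    """Return every span between `start` and `end` that does not contain `word`.

    Hypothetical helper for illustration; markers are escaped so they are
    matched literally, and the lazy `.*?` stops at the nearest `end`.
    """
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    candidates = re.findall(pattern, text, flags=re.DOTALL)
    return [span for span in candidates if word not in span]


# Made-up sample text with two start...end spans.
text = "start red apple end ... start green pear end"
print(spans_without_word(text, "start", "end", "red"))
```

The word check here is a plain substring test, so it would also reject spans where the word appears inside a longer word; a word-boundary regex could be used instead if that matters.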
