Python HTML Parsing Between Two Tags

Today I was looking into a small file uploader and got the following response from the API page.

upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html

I need to get the part between the two <br> tags. I am using BeautifulSoup with the code below, but it returns None.

fpbs = BeautifulSoup(filepup.text)
finallink = fpbs.find('br', 'br')
print(finallink)

You cannot search for the text between two tags directly. You can, however, locate the first <br> tag and then take its next sibling:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html', 'html.parser')
>>> soup.find('br')
<br/>
>>> soup.find('br').next_sibling
'http://www.filepup.net/files/R6wVq1405781467.html'
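
As a minimal sketch, the same lookup can be wrapped in a function that also checks the text node really sits between two <br> tags. The function name `text_between_first_brs` is hypothetical and the `html.parser` backend is an assumption; neither comes from the original answer.

```python
from bs4 import BeautifulSoup, NavigableString


def text_between_first_brs(html):
    """Return the text between the first two <br> tags, or None."""
    soup = BeautifulSoup(html, "html.parser")
    first_br = soup.find("br")
    if first_br is None:
        return None
    node = first_br.next_sibling
    # The sibling must be a text node, not another tag.
    if not isinstance(node, NavigableString):
        return None
    # Require a second <br> directly after the text node.
    following = node.next_sibling
    if following is None or getattr(following, "name", None) != "br":
        return None
    return str(node).strip()


response = (
    "upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html"
    "<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html"
)
print(text_between_first_brs(response))
```

Returning None when either <br> is missing keeps the caller from silently picking up text that only follows a single tag.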

You could also use a CSS selector search to find an adjacent sibling, then grab its preceding sibling. To CSS only the tags are siblings, but to BeautifulSoup the text nodes count too.

The adjacent sibling combinator is a + between two CSS selectors, and it selects the second of the two; br + br selects any br tag that immediately follows another br tag.

Together with a parent selector (say a specific id or class), that can be a very powerful combination:

>>> soup = BeautifulSoup('''\
... <div id="div1">
...     some text
...     <br/>
...     some target text
...     <br/>
...     foo bar
... </div>
... <div id="div2">
...     some more text
...     <br/>
...     select me, ooh, pick me!
...     <br/>
...     fooed the bar!
... </div>
... ''', 'html.parser')
>>> soup.select('#div2 br + br')[0]
<br/>
>>> soup.select('#div2 br + br')[0].previous_sibling
'\n    select me, ooh, pick me!\n    '

This picked a very specific text node between two <br> tags, in a specific <div> tag.
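
The selector approach could be generalised roughly as follows, collecting every text node that sits directly between two <br> tags under a given container. This is a sketch only: `texts_between_brs` and its `container_selector` parameter are hypothetical names, and the `html.parser` backend is an assumption.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def texts_between_brs(html, container_selector):
    """Collect stripped text nodes found directly between two <br> tags."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # "br + br" matches each <br> whose previous element sibling is a <br>.
    for second_br in soup.select(f"{container_selector} br + br"):
        node = second_br.previous_sibling
        # Skip touching <br> tags (previous sibling is a tag) and comments.
        if not isinstance(node, NavigableString) or isinstance(node, Comment):
            continue
        text = node.strip()
        if text:
            results.append(text)
    return results


html = (
    '<div id="div1">some text<br/>some target text<br/>foo bar</div>'
    '<div id="div2">some more text<br/>select me, ooh, pick me!<br/>'
    "fooed the bar!</div>"
)
print(texts_between_brs(html, "#div2"))
```

Passing a broader selector such as "div" would return the matching text from every container in document order.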
