Python HTML Parsing Between Two Tags
Today I was looking into a small file uploader, and I got the following response from the API page:
upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html
I need to get the part between the two <br> tags. I am using BeautifulSoup with the code below, but it returns None.
fpbs = BeautifulSoup(filepup.text)
finallink = fpbs.find('br', 'br')
print(finallink)
No, you cannot search directly for the text between two tags. The call find('br', 'br') treats its second argument as a CSS class, so it looks for a <br> tag with class "br" and finds nothing. You can, however, locate the first <br> tag and then take its next sibling:
>>> soup = BeautifulSoup('upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html')
>>> soup.find('br')
<br/>
>>> soup.find('br').next_sibling
u'http://www.filepup.net/files/R6wVq1405781467.html'
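The same next-sibling lookup can be wrapped in a small helper for Python 3. This is only a sketch: the helper name text_after_first_br is made up for illustration, the parser is pinned to "html.parser" as an assumption (the original does not name one), and the sample string is the response quoted in the question.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample response copied from the question.
response_text = (
    "upload_success<br>"
    "http://www.filepup.net/files/R6wVq1405781467.html<br>"
    "http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html"
)


def text_after_first_br(markup):
    """Return the text node directly after the first <br>, or None.

    Hypothetical helper, not part of BeautifulSoup itself.
    """
    soup = BeautifulSoup(markup, "html.parser")
    first_br = soup.find("br")
    if first_br is None:
        return None
    sibling = first_br.next_sibling
    # next_sibling can be another tag or None, so accept text nodes only.
    if isinstance(sibling, NavigableString):
        return str(sibling).strip()
    return None


final_link = text_after_first_br(response_text)
print(final_link)
```

Checking the sibling's type guards against responses where two <br> tags sit next to each other with no text in between.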
You could also use a CSS selector search to find an adjacent sibling, then grab its preceding sibling. To CSS, only the tags are siblings, but to BeautifulSoup the text nodes count too.
The adjacent sibling combinator is a + placed between two CSS selectors, and it selects the second of the two; br + br selects any br tag that immediately follows another br tag.
Combined with a parent node (say, a specific id or class), that can be a very powerful combination:
>>> soup = BeautifulSoup('''\
... <div id="div1">
... some text
... <br/>
... some target text
... <br/>
... foo bar
... </div>
... <div id="div2">
... some more text
... <br/>
... select me, ooh, pick me!
... <br/>
... fooed the bar!
... </div>
... ''')
>>> soup.select('#div2 br + br')[0]
<br/>
>>> soup.select('#div2 br + br')[0].previous_sibling
u'\n select me, ooh, pick me!\n '
This picked a very specific text node between two <br> tags, inside a specific <div> tag.
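As a script rather than an interactive session, the selector approach might look like the sketch below. It assumes Python 3 and the "html.parser" backend (neither is stated in the original), and uses select_one, which returns the first match or None instead of a list.

```python
from bs4 import BeautifulSoup

# Same sample markup as in the interactive session above.
html = """
<div id="div1">
some text
<br/>
some target text
<br/>
foo bar
</div>
<div id="div2">
some more text
<br/>
select me, ooh, pick me!
<br/>
fooed the bar!
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 'br + br' matches a <br> whose previous element sibling is also a <br>;
# CSS ignores the text node sitting between them.
second_br = soup.select_one("#div2 br + br")

# BeautifulSoup does count text nodes, so previous_sibling is the text
# between the two <br> tags; strip() removes the surrounding newlines.
between = second_br.previous_sibling.strip() if second_br is not None else None
print(between)
```

The None check matters when the id or the second <br> is missing from the page, in which case indexing the result of select() would raise an IndexError.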