Get specific text from a div in Python 3

Question

Here is a sample of the HTML I am trying to extract from:

    <div class="small subtle link">                      
                    <a href="https://example.com" target=&quot;_blank&quot;  nofollow >Example</a>
                

                
                     This text!
            </div>

I want to grab "This text!", but whenever I try this I keep getting "Example":

    myText = soup.findAll('div', {'class': re.compile('small subtle link')})
    if myText:
        extractedText = myText.text.strip()

How can I omit the text inside the a tag?

Answer 1

There are several possible solutions; which one fits depends on the exact behavior you are looking for.

This produces the correct output:

from bs4 import BeautifulSoup

html_src = \
    '''
    <html>
    <body>
    <div class="small subtle link">
        <a href="https://example.com" nofollow="" target='"_blank"'>
            Example
        </a>
        This text!
    </div>
    </body>
    </html>
    '''

soup = BeautifulSoup(html_src, 'lxml')
print(soup.prettify())

div_tag = soup.find(name='div', attrs={'class': 'small subtle link'})

div_content_text = []
for curr_text in div_tag.find_all(recursive=False, text=True):
    curr_text = curr_text.strip()
    if curr_text:
        div_content_text.append(curr_text)

print(div_content_text)

Edit: Sushil's solution is also clean.

Answer 2

This is what you need:

soup.div.find(text=True, recursive=False)
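One caveat worth checking: `find(text=True, recursive=False)` returns the first direct string child, and with the markup from the question that may be the whitespace sitting before the a tag rather than the text you want. A minimal sketch that skips whitespace-only strings follows. It assumes a simplified copy of the question's HTML, the built-in `html.parser`, and the `string=` spelling of the argument (available in Beautiful Soup 4.4.0 and later); the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the question's markup (assumption: same structure).
html = """
<div class="small subtle link">
    <a href="https://example.com">Example</a>

    This text!
</div>
"""

# "html.parser" ships with Python; any installed parser should work here.
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="small subtle link")

# Direct string children only; the first one is the whitespace before <a>.
direct_strings = div.find_all(string=True, recursive=False)

# Take the first string that is non-empty after stripping, else None.
first_text = next((s.strip() for s in direct_strings if s.strip()), None)
print(first_text)
```

If the div can hold several text fragments, collecting all non-empty ones into a list (as in Answer 1) is probably the safer choice.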

Answer 3

You can try this:

print(div.a.find_next_sibling(text=True).strip())

This finds the a tag under the div and prints the text that follows it.

Here is the full code:

from bs4 import BeautifulSoup

html = """
<div class="small subtle link">                      
                    <a href="https://example.com" target=&quot;_blank&quot;  nofollow >Example</a>
                

                
                     This text!
            </div>
"""

soup = BeautifulSoup(html,'html5lib')

div = soup.find('div', class_ = "small subtle link")

print(div.a.find_next_sibling(text=True).strip())

Output:

This text!
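Note that `find_next_sibling` returns `None` when nothing matches, so the one-liner raises `AttributeError` on `.strip()` if a div has no a tag or no text after it. A guarded variant might look like the sketch below. The helper name `text_after_link` and the two small test documents are hypothetical, and it uses `html.parser` and the `string=` argument rather than `html5lib` and `text=`.

```python
from bs4 import BeautifulSoup


def text_after_link(div):
    """Return the stripped text following the div's first <a>, or None."""
    link = div.a if div is not None else None
    if link is None:
        return None
    # First string among the siblings that come after the <a> tag.
    sibling = link.find_next_sibling(string=True)
    return sibling.strip() if sibling is not None else None


# Two tiny illustrative documents: one with trailing text, one without.
with_text = BeautifulSoup(
    '<div><a href="#">Example</a> This text! </div>', "html.parser"
)
without_text = BeautifulSoup(
    '<div><a href="#">Example</a></div>', "html.parser"
)

print(text_after_link(with_text.div))
print(text_after_link(without_text.div))
```

Returning `None` instead of raising lets the caller decide how to handle divs that do not follow the expected layout.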

3 answers

Answer 1
1 accepted 2020-11-04 01:09:15

Answer 2
0 2020-11-03 23:56:31

Answer 3
0 2020-11-04 01:26:28