How to use Beautiful Soup to find <p> tags that have no siblings

Question

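Some <p></p> tags contain <img> and <h4> tags, but I only want the <p> tags that have no such tags inside them (the "siblings" in the title), just text content.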

 <p> <img src="any url"/> </p>     <p> hello world </p>

Using Beautiful Soup, I want the <p> tags that do not contain an <img> tag.

Answer 1

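This gets all the text that sits directly inside each <p> element, but not the text of any child elements inside the <p>. recursive must be set to False, otherwise the search descends into child elements. I added another test case to demonstrate: <p><h4>Heading</h4></p>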

from bs4 import BeautifulSoup

html = "<p> <img src='any url'/> </p>   <p><h4>Heading</h4></p>  <p> hello world </p>"

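# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")

for element in soup.find_all('p'):
    # recursive=False restricts the search to text nodes that are direct children of the <p>.
    print("".join(element.find_all(string=True, recursive=False)))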
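To make the effect of recursive=False concrete, here is a small sketch that compares the two modes on the <h4> test case. It assumes the html.parser backend, which leaves the <h4> nested inside the <p> as written; other parsers may restructure this markup. The helper name direct_text is only illustrative.

```python
from bs4 import BeautifulSoup

html = "<p><h4>Heading</h4></p><p> hello world </p>"
# Assumption: html.parser, which keeps <h4> nested inside <p> as written.
soup = BeautifulSoup(html, "html.parser")

def direct_text(tag):
    """Join only the text nodes that are direct children of tag."""
    return "".join(tag.find_all(string=True, recursive=False))

first, second = soup.find_all("p")
# Without recursive=False the search also descends into the nested <h4>.
all_text_first = "".join(first.find_all(string=True))

print(repr(direct_text(first)))    # text directly inside the first <p>
print(repr(all_text_first))        # text including the nested <h4>
print(repr(direct_text(second)))   # text directly inside the second <p>
```

A <p> whose direct text is empty or only whitespace can then be skipped with a check such as direct_text(p).strip().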

Answer 2

A solution that gets all <p> tags that have no child tags.

import bs4
html="""<p> <img src="any url"/> </p>     <p> hello world </p>"""
soup=bs4.BeautifulSoup(html,"html.parser")

def has_no_tag_children(tag):
    # Accept only <p> tags.
    if not isinstance(tag, bs4.element.Tag) or tag.name != 'p':
        return False
    # Reject the tag if any direct child is itself a tag.
    return not any(isinstance(child, bs4.element.Tag) for child in tag.children)

kids=soup.find_all(has_no_tag_children)
print(kids)

Output

[<p> hello world </p>]
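A more compact variant of the same idea, offered as a sketch rather than a replacement: find(True, recursive=False) returns the first direct child that is a tag, or None when there is none. The variable name text_only is illustrative.

```python
import bs4

html = """<p> <img src="any url"/> </p>     <p> hello world </p>"""
soup = bs4.BeautifulSoup(html, "html.parser")

# find(True, recursive=False) matches any direct child tag;
# None means the <p> holds only text nodes.
text_only = [p for p in soup.find_all("p")
             if p.find(True, recursive=False) is None]
print(text_only)
```

This keeps the filtering in a list comprehension instead of a predicate function, which may be easier to read when the condition is used only once.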

Answer 3

Assuming BeautifulSoup 4.7+, you should be able to do the following:

import bs4
html="""<p> <img src="any url"/> </p>     <p> hello world </p>"""
soup=bs4.BeautifulSoup(html,"html.parser")

kids=soup.select("p:not(:has(*))")
print(kids)
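If only <img> should disqualify a paragraph, as the last line of the question suggests, the selector can name that tag instead of *. The following is a hedged sketch, assuming the select() support that ships with BeautifulSoup 4.7+; the extra <p> containing a <b> child is added here purely for illustration.

```python
import bs4

html = """<p> <img src="any url"/> </p> <p> hello <b>world</b> </p> <p> hello world </p>"""
soup = bs4.BeautifulSoup(html, "html.parser")

# <p> elements with no element descendants at all
no_children = soup.select("p:not(:has(*))")
# <p> elements that merely lack an <img> descendant
no_images = soup.select("p:not(:has(img))")

print(no_children)
print(no_images)
```

The first selector is the stricter one: it also rejects the paragraph with the <b> child, while the second keeps it.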

Answer 4

from bs4 import BeautifulSoup

txt = """
<p> <img src="any url"/> </p>     <p> hello world </p>
"""

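# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(txt, "html.parser")

for node in soup.find_all('p'):
    # Join only the text nodes that are direct children of the <p>.
    print(' '.join(node.find_all(string=True, recursive=False)))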

Output:

hello world

4 answers

Answer 1 (score 0, 2019-01-28 08:48:20)
Answer 2 (score 0, 2019-01-29 19:50:39)
Answer 3 (score 0, 2019-01-29 20:12:23)
Answer 4 (score -1, 2019-01-28 08:32:41)
