Why can a tag's siblings be strings in BeautifulSoup4?

Question

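At first glance, I assumed it was natural that .next_sibling and .previous_sibling would be sibling tags. But when I played with them today, I got back a NavigableString such as "\n".

After checking the documentation carefully, I found that it states: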

In real documents, the .next_sibling or .previous_sibling of a tag will usually be a string containing whitespace. Going back to the “three sisters” document:

<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>

You might think that the .next_sibling of the first <a> tag would be the second <a> tag. But actually, it’s a string: the comma and newline that separate the first <a> tag from the second:

link = soup.a
link
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

link.next_sibling
# u',\n'

The second <a> tag is actually the .next_sibling of the comma:

link.next_sibling.next_sibling
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>

Why is this?

Answer 1

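The .next_sibling attribute is meant for fine-grained navigation of an HTML document, including things that CSS selectors cannot do. Selectors can select tags, but not the strings between them. For example, no CSS selector can select the string SELECT THIS in <p>some text</p>SELECT THIS<p>some text</p>.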
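As a minimal sketch of that point, the snippet below parses the example markup and reaches the text between the two <p> tags through .next_sibling. The variable names are illustrative only, and html.parser is assumed as the parser; the select() call is included just to show that a CSS selector returns the tags and nothing in between.

```python
from bs4 import BeautifulSoup, NavigableString

html = "<p>some text</p>SELECT THIS<p>some text</p>"
soup = BeautifulSoup(html, "html.parser")

# A CSS selector can only return the two <p> tags.
paragraphs = soup.select("p")

# .next_sibling walks the parse tree, so it also sees the text node
# that sits directly after the first <p>.
between = soup.p.next_sibling

print(len(paragraphs))                       # number of <p> tags found
print(isinstance(between, NavigableString))  # the sibling is a string node
print(str(between))                          # the text between the tags
```

Because the string is a node in the tree just like a tag, it has to show up somewhere in the sibling chain; otherwise there would be no way to navigate to it.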

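If you want to search for sibling tags, use the find_next_sibling() method. You can also approximate the behavior of .next_sibling by passing text=True to find_next_sibling(), which returns the next sibling that is a string (Beautiful Soup 4.4.0 and later also accept string=True for the same purpose):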

data = '''
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>'''


from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

link = soup.a
print(link)                                     # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
print(type(link.next_sibling))                  # <class 'bs4.element.NavigableString'>
print(link.find_next_sibling())                 # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
print(type(link.find_next_sibling(text=True)))  # <class 'bs4.element.NavigableString'>
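If only the tag siblings are of interest, one possible approach is to filter the .next_siblings generator by type, or to call find_next_siblings() with no arguments. The sketch below uses a shortened, hypothetical version of the three-link markup (wrapped in a <p> and with the href and class attributes dropped) and assumes html.parser.

```python
from bs4 import BeautifulSoup, Tag

# Shortened, illustrative markup: three links separated by newlines.
data = '''<p><a id="link1">Elsie</a>
<a id="link2">Lacie</a>
<a id="link3">Tillie</a></p>'''

soup = BeautifulSoup(data, "html.parser")
link = soup.a

# Every following sibling, whitespace strings included.
all_siblings = list(link.next_siblings)

# Keep only the siblings that are tags.
tag_siblings = [s for s in all_siblings if isinstance(s, Tag)]

# find_next_siblings() with no filter also returns tags only.
found = link.find_next_siblings()

print(len(all_siblings))
print([t["id"] for t in tag_siblings])
print([t["id"] for t in found])
```

Both approaches skip the newline strings; which one reads better is mostly a matter of taste.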

Answer 2

See page 16 of the documentation.

I hope that answers your question.

Solution 1
1 2019-07-20 13:14:30

Solution 2
0 2019-07-20 12:25:47
