Beautiful Soup 4 CSS sibling selector
I'm trying to parse some HTML exported from an InDesign document with Beautiful Soup 4 and Python 2.7. I am trying to find a specific tag by using a CSS sibling selector. I can reach the tag I want by selecting its sibling via a CSS selector and then calling the Beautiful Soup find_next_sibling() method, but I can't select it directly via a CSS selector.
I have verified that the selector itself is valid when I try it in pure CSS/JS ( http://jsfiddle.net/Sj63x/1/ ). I have also tried all three parsers recommended on the Beautiful Soup home page.
Relevant code is posted below (the text is in the JS fiddle):
from bs4 import BeautifulSoup

# `text` initially holds the exported HTML string (see the JS fiddle)
text = BeautifulSoup(text)

# This finds the sibling
sibling = text.select(".Book-Title-")
print(sibling[0].string)

# This finds the tag I am looking for
targetText = sibling[0].find_next_sibling()
print(targetText.string)

# This should find the same text but returns an empty list
targetText2 = text.select(".Book-Title- ~.Text")
print(targetText2)

# Other attempted variations - also return empty lists
targetText3 = text.select(".Book-Title- ~ .Text")
targetText4 = text.select(".Book-Title- + .Text")
Try using this selector instead:
targetText2 = text.select(".Book-Title- + .Text")
or add a space between the tilde character and the sibling:
targetText2 = text.select(".Book-Title- ~ .Text")
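For comparison, here is a minimal sketch of both selectors next to the find_next_sibling() workaround. The markup is a made-up stand-in for the InDesign export (the real text is only in the JS fiddle), and the parser choice and variable names are assumptions for illustration. It was written with a recent Beautiful Soup 4 release in mind, so older releases may behave differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the InDesign export;
# only the class names come from the question.
html = """
<div>
  <p class="Book-Title-">A Sample Title</p>
  <p class="Text">First paragraph after the title.</p>
  <p class="Text">Second paragraph.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Workaround from the question: select the title, then step to its sibling.
title = soup.select(".Book-Title-")[0]
via_method = title.find_next_sibling(class_="Text")

# Adjacent sibling combinator: only the element immediately after the title.
adjacent = soup.select(".Book-Title- + .Text")

# General sibling combinator (with spaces around the tilde):
# every later sibling that has the class "Text".
general = soup.select(".Book-Title- ~ .Text")

print(via_method.get_text())
print([p.get_text() for p in adjacent])
print([p.get_text() for p in general])
```

Note that "+" and "~" are not interchangeable: "+" matches only the element directly following the title, while "~" matches all following siblings with that class, so pick the one that fits the structure of the export.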