Pattern web unable to locate elements by class names
I'm trying to identify DOM elements by class name, but I'm not able to use pattern.web as described in the docs. I'm running code that I've used before, so it did work at some point.
from pattern.web import DOM
html = """<html><head><title>pattern.web | CLiPS</title></head>
<body>
<div class="class1 class2 class3">
<form action="/pages/pattern-web" accept-charset="UTF-8" method="post" id="search-block-form">
<div>
<label for="edit-search-block-form-1">Search this site: </label>
</div>
</form>
</div>
</body></html>"""
dom = DOM(html)
print "Search Results by Method:"
print 'tag[attr="value"] Notation Results:'
print dom('div[class="class1 class2 class3"]')
print
print 'tag.class Notation Results:'
print dom('div.class1')
print
print 'By class, no tag results:'
print dom.by_class('class1')
print
print 'Looping through all divs and printing matching results:'
for i in dom('div'):
    if 'class' in i.attrs and i.attrs['class'] == 'class1 class2 class3':
        print i.attrs
Note that the Element and DOM functions are interchangeable and give the same results. The result is this:
Search Results by Method:
tag[attr="value"] Notation Results:
[]
tag.class Notation Results:
[]
By class, no tag results:
[Element(tag='div')]
Looping through all divs and printing matching results:
{u'class': u'class1 class2 class3'}
As you can see, looking it up using the tag.class notation and the tag[attr="value"] notation both give empty results, but by_class returns one result. Clearly elements with those attributes exist. How do I search for all the divs that have all 3 classes?
In the past, I've been able to search using dom('div.class1.class2.class3') to identify a div with all 3 classes. Not only does this not work, it's also giving me unicode errors (it appears that the second period causes a unicode error):
TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode'
Question: In the past, I've been able to search using dom('div.class1.class2.class3') to identify a div with all 3 classes.
Reading the source at github.com/clips/pattern/blob/master/pattern/web, I found that pattern.web is only a wrapper around Beautiful Soup. The source comments say:
# Beautiful Soup is wrapped in DOM, Element and Text classes, resembling the Javascript DOM.
# Beautiful Soup can also be used directly
It's a known issue, see SO: Beautiful soup find_all doesn't find CSS selector with multiple classes
The workaround is to use .select(...) instead of .find_all(...). However, I didn't find .select(...) in pattern.web.
For example, using Beautiful Soup directly:
from bs4 import BeautifulSoup
html = """<html><head><title>pattern.web | CLiPS</title></head>
<body>
<div class="class1 class4">
<form action="/pages/pattern-web" accept-charset="UTF-8" method="post" id="search-block-form">
<div class="class1 class2 class3">
<label for="edit-search-block-form-1">Search this site: </label>
</div>
</form>
</div>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')
div = soup.select('div.class1.class2')
print("{}".format(div))
Output:
[<div class="class1 class2 class3"> <label for="edit-search-block-form-1">Search this site: </label> </div>]
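To make the difference between the two calls concrete, here is a minimal sketch, assuming bs4 is installed and used directly rather than through pattern.web. The sample markup and variable names are made up for illustration. Passing a multi-class string to find_all compares it against the class attribute as written, so a different class order does not match, while a CSS selector passed to select does not depend on the order.

```python
from bs4 import BeautifulSoup

# Illustrative markup (not from the question): the first div lists the
# three classes in a different order than the search string below.
sample = (
    '<div class="class3 class1 class2">first</div>'
    '<div class="class1 class4">second</div>'
)
soup = BeautifulSoup(sample, "html.parser")

# A multi-class string is compared with the attribute value as written,
# so the different order prevents a match here.
exact = soup.find_all("div", class_="class1 class2 class3")

# A CSS selector requires all three classes, in any order.
css = soup.select("div.class1.class2.class3")

# A single class name is matched against each class token.
single = soup.find_all("div", class_="class1")

print(len(exact), len(css), len(single))
```

If the class order in your markup is fixed and known, the exact-string form of find_all can still work, but select is the safer choice when the order may vary.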
Question: it's also giving me unicode errors (it appears that the second period causes a unicode error):
TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode'
It's unknown whether this TypeError comes from pattern.web or from Beautiful Soup.
According to this SO: descriptor-join-requires-a-unicode-object-but-received-a-str, it's a standard Python message.
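This kind of message can be reproduced without either library. The sketch below assumes Python 3, where the mismatch is between str and bytes rather than str and unicode as in the question's Python 2 traceback. The helper name call_unbound_lower is made up for illustration, and the exact wording of the message differs between Python versions, so the code only reports whatever the interpreter raises.

```python
def call_unbound_lower(value):
    """Call str.lower as an unbound method and report any TypeError."""
    try:
        # str.lower only accepts str instances when called on the class.
        return str.lower(value)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


print(call_unbound_lower("ABC"))   # value is a str, so this succeeds
print(call_unbound_lower(b"ABC"))  # value is bytes, so a TypeError is reported
```

This suggests the error arises where library code calls a str method on the class with a value of the other string type, but it does not tell which library does so.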
Using pattern.web from GitHub, the results are as expected:
from pattern.web import Element
elements = Element(html)
print("Search Results by Method:")
print('tag[attr="value"] Notation\tResults:{}'
.format(elements('div[class="class1 class2 class3"]')))
print('tag.class Notation \t\t\tResults:{}'
.format(elements('div.class1.class2.class3')))
print('By class, no tag \t\t\tResults:{}'
.format(elements.by_class('class1 class2 class3')))
print('Looping through all divs and printing matching results:')
for i in elements('div'):
    if 'class' in i.attrs:
        if " ".join(i.attrs['class']) == 'class1 class2 class3':
            print("\tmatch:{}".format(i.attrs))
Output:
Search Results by Method:
tag[attr="value"] Notation Results:{'class': ['class1', 'class2', 'class3']}
tag.class Notation Results:{'class': ['class1', 'class2', 'class3']}
By class, no tag Results:{'class': ['class1', 'class2', 'class3']}
Looping through all divs and printing matching results:
    match:{'class': ['class1', 'class2', 'class3']}
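The loop in the code above compares the joined class string, so it only matches when the classes appear in exactly that order. As a library-independent fallback, here is a minimal sketch that treats the class attribute as a set of tokens. It assumes only the standard library html.parser module, not pattern.web or bs4. The class name ClassMatcher and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ClassMatcher(HTMLParser):
    """Collect attributes of <tag> elements that carry every required class."""

    def __init__(self, tag, required_classes):
        super().__init__()
        self.tag = tag
        self.required = set(required_classes)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        attr_map = dict(attrs)
        # Split the class attribute into tokens so their order does not matter.
        tokens = set((attr_map.get("class") or "").split())
        if self.required <= tokens:
            self.matches.append(attr_map)


# Illustrative markup: the first div has all three classes in another order.
sample = (
    '<div class="class3 class1 class2"><p>first</p></div>'
    '<div class="class1 class4"><p>second</p></div>'
)
matcher = ClassMatcher("div", ["class1", "class2", "class3"])
matcher.feed(sample)
print(matcher.matches)  # [{'class': 'class3 class1 class2'}]
```

The subset test also accepts elements that carry additional classes beyond the three required ones, which mirrors how a CSS selector such as div.class1.class2.class3 behaves.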
Tested with Python:3.5.3 - pattern.web:3.6 - bs4:4.5.3