Pattern web unable to locate elements by class names

I'm trying to identify DOM elements by class name, but I can't get pattern.web to work as described in the docs. (I'm also running code that I've used before, so it did work at some point.)

from pattern.web import DOM

html = """<html><head><title>pattern.web | CLiPS</title></head>
<body>
  <div class="class1 class2 class3">
    <form action="/pages/pattern-web"  accept-charset="UTF-8" method="post" id="search-block-form">
      <div>
        <label for="edit-search-block-form-1">Search this site: </label>
      </div>
    </form>
  </div>
</body></html>"""

dom = DOM(html)
print "Search Results by Method:"
print 'tag[attr="value"] Notation Results:'
print dom('div[class="class1 class2 class3"]')
print 
print 'tag.class Notation Results:'
print dom('div.class1')
print
print 'By class, no tag results:'
print dom.by_class('class1')
print 
print 'Looping through all divs and printing matching results:'
for i in dom('div'):
    if 'class' in i.attrs and i.attrs['class'] == 'class1 class2 class3':
        print i.attrs

Note that the Element and DOM functions are interchangeable and give the same results. The result is this:

Search Results by Method:
tag[attr="value"] Notation Results:
[]

tag.class Notation Results:
[]

By class, no tag results:
[Element(tag='div')]

Looping through all divs and printing matching results:
{u'class': u'class1 class2 class3'}

As you can see, the tag.class notation and the tag[attr="value"] notation both give empty results, but by_class returns one result. Clearly, elements with those attributes exist. How do I search for all the divs that have all 3 classes?

In the past, I've been able to search using dom('div.class1.class2.class3') to identify a div with all 3 classes. Not only does this no longer work, it also gives me a unicode error (it appears that the second period triggers it): TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode'

Question: In the past, I've been able to search using dom('div.class1.class2.class3') to identify a div with all 3 classes.


Reading the source at github.com/clips/pattern/blob/master/pattern/web, I found that pattern.web is only a wrapper around Beautiful Soup.

# Beautiful Soup is wrapped in DOM, Element and Text classes, resembling the Javascript DOM.
# Beautiful Soup can also be used directly


It's a known issue; see SO: Beautiful soup find_all doesn't find CSS selector with multiple classes

The workaround is to use .select(...) instead of .find_all(...).
I didn't find .select(...) in pattern.web.

For example:

from bs4 import BeautifulSoup

html = """<html><head><title>pattern.web | CLiPS</title></head>
  <body>
    <div class="class1 class4">
      <form action="/pages/pattern-web"  accept-charset="UTF-8" method="post" id="search-block-form">
        <div class="class1 class2 class3">
          <label for="edit-search-block-form-1">Search this site: </label>
        </div>
      </form>
    </div>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')
div = soup.select('div.class1.class2')
print("{}".format(div))

Output:

 [<div class="class1 class2 class3"> <label for="edit-search-block-form-1">Search this site: </label> </div>] 
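For contrast, here is a minimal sketch of how .find_all(...) and .select(...) treat the same markup when Beautiful Soup is used directly. The shortened html string and the variable names are illustrative assumptions, not part of the original code.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative markup (an assumption, not the original page).
html = """<div class="class1 class4">
  <div class="class1 class2 class3"><label>Search this site: </label></div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# find_all() takes a tag name, not a CSS selector, so this looks for a tag
# literally named "div.class1.class2" and finds nothing.
as_tag_name = soup.find_all("div.class1.class2")

# A single class passed via class_ matches every div carrying that class.
by_one_class = soup.find_all("div", class_="class1")

# A function filter can require several classes regardless of their order.
required = {"class1", "class2", "class3"}
by_all_classes = soup.find_all(
    lambda tag: tag.name == "div" and required <= set(tag.get("class", []))
)

# select() understands CSS selectors, including chained classes in any order.
by_selector = soup.select("div.class3.class1")

print(len(as_tag_name), len(by_one_class), len(by_all_classes), len(by_selector))
```

The function filter is one option when only .find_all(...) is reachable; .select(...) stays the shorter choice when it is available.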

Question: it's also giving me unicode errors (it appears that the second period causes a unicode error):

 TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode' 

It's unknown whether this TypeError comes from pattern.web or from Beautiful Soup.
According to this SO: descriptor-join-requires-a-unicode-object-but-received-a-str it's a standard Python message.
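As a rough illustration of how Python produces this kind of message, the sketch below calls a string method through the class rather than through an instance. It is written for Python 3, where bytes stands in for the mismatched string type (on Python 2 the mismatch would be unicode passed to str.lower). It does not show where in pattern.web or Beautiful Soup such a call happens; the variable names are assumptions.

```python
# Calling a method through the class (str.lower) instead of through an
# instance means the argument must really be a str object.
text = "Class1"
data = b"Class1"  # stand-in for the mismatched string type

print(str.lower(text))  # works: the argument is a str

error = None
try:
    str.lower(data)  # wrong type for the str method descriptor
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Calling the method on the object itself works for either type.
print(data.lower())
```

The exact wording of the message differs between Python versions, but the exception type is TypeError in each case.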


Using pattern.web from GitHub, the results are as expected:

from pattern.web import Element

elements = Element(html)
print("Search Results by Method:")
print('tag[attr="value"] Notation\tResults:{}'
    .format(elements('div[class="class1 class2 class3"]')))

print('tag.class Notation \t\t\tResults:{}'
    .format(elements('div.class1.class2.class3')))

print('By class, no tag \t\t\tResults:{}'
    .format(elements.by_class('class1 class2 class3')))

print('Looping through all divs and printing matching results:')
for i in elements('div'):
    if 'class' in i.attrs:
        if " ".join(i.attrs['class']) == 'class1 class2 class3':
            print("\tmatch:{}".format(i.attrs))

Output:

Search Results by Method:
tag[attr="value"] Notation Results:{'class': ['class1', 'class2', 'class3']}
tag.class Notation Results:{'class': ['class1', 'class2', 'class3']}
By class, no tag Results:{'class': ['class1', 'class2', 'class3']}
Looping through all divs and printing matching results:
match:{'class': ['class1', 'class2', 'class3']}
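The loop above compares the joined class string, so it only matches when the classes appear in exactly that order. A possible order-independent variant is sketched below; has_all_classes is a hypothetical helper, and it accepts both the single-string form and the list form of the class attribute that appear in the outputs on this page.

```python
def has_all_classes(attrs, required):
    """Return True if the class attribute in attrs contains every required class."""
    value = attrs.get("class", [])
    # The class attribute may be one space-separated string or a list of names.
    classes = value.split() if isinstance(value, str) else value
    return set(required) <= set(classes)


wanted = ["class1", "class2", "class3"]
print(has_all_classes({"class": "class3 class1 class2"}, wanted))        # string form
print(has_all_classes({"class": ["class1", "class2", "class3"]}, wanted))  # list form
print(has_all_classes({"class": ["class1", "class4"]}, wanted))           # missing classes
print(has_all_classes({"id": "search-block-form"}, wanted))               # no class at all
```

Inside the loop, the check would then read has_all_classes(i.attrs, wanted) in place of the string comparison.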

Tested with Python:3.5.3 - pattern.web:3.6 - bs4:4.5.3
