
Pattern web unable to locate elements by class names

I'm trying to identify DOM elements by class name, but pattern.web does not behave as described in the docs. The code below is code I've used before, so it did work at some point.

from pattern.web import DOM

html = """<html><head><title>pattern.web | CLiPS</title></head>
<body>
  <div class="class1 class2 class3">
    <form action="/pages/pattern-web"  accept-charset="UTF-8" method="post" id="search-block-form">
      <div>
        <label for="edit-search-block-form-1">Search this site: </label>
      </div>
    </form>
  </div>
</body></html>"""

dom = DOM(html)
print "Search Results by Method:"
print 'tag[attr="value"] Notation Results:'
print dom('div[class="class1 class2 class3"]')
print 
print 'tag.class Notation Results:'
print dom('div.class1')
print
print 'By class, no tag results:'
print dom.by_class('class1')
print 
print 'Looping through all divs and printing matching results:'
for i in dom('div'):
    if 'class' in i.attrs and i.attrs['class'] == 'class1 class2 class3':
        print i.attrs

Note that the Element and DOM functions are interchangeable and give the same results. The output is:

Search Results by Method:
tag[attr="value"] Notation Results:
[]

tag.class Notation Results:
[]

By class, no tag results:
[Element(tag='div')]

Looping through all divs and printing matching results:
{u'class': u'class1 class2 class3'}

As you can see, looking it up using the tag.class notation and the tag[attr="value"] notation both give empty results, but by_class returns one result. Clearly elements with those attributes exist. How do I search for all the divs that have all 3 classes?

In the past, I've been able to use dom('div.class1.class2.class3') to identify a div with all 3 classes. Not only does this no longer work, it also raises an error (the second period appears to trigger it): TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode'

Question: In the past, I've been able to search using dom('div.class1.class2.class3') to identify a div with all 3 classes.


Reading the source at github.com/clips/pattern/blob/master/pattern/web shows that pattern.web is only a wrapper around Beautiful Soup. The source comments say:

# Beautiful Soup is wrapped in DOM, Element and Text classes, resembling the Javascript DOM.
# Beautiful Soup can also be used directly


This is a known issue; see SO: Beautiful soup find_all doesn't find CSS selector with multiple classes

The workaround is to use .select(...) instead of .find_all(...), but I didn't find .select(...) in pattern.web.

For example:

from bs4 import BeautifulSoup

html = """<html><head><title>pattern.web | CLiPS</title></head>
  <body>
    <div class="class1 class4">
      <form action="/pages/pattern-web"  accept-charset="UTF-8" method="post" id="search-block-form">
        <div class="class1 class2 class3">
          <label for="edit-search-block-form-1">Search this site: </label>
        </div>
      </form>
    </div>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')
div = soup.select('div.class1.class2')
print("{}".format(div))

Output:

 [<div class="class1 class2 class3"> <label for="edit-search-block-form-1">Search this site: </label> </div>] 

Question: it's also giving me unicode errors (it appears that the second period causes a unicode error):

 TypeError: descriptor 'lower' requires a 'str' object but received a 'unicode' 

It's unclear whether this TypeError comes from pattern.web or from Beautiful Soup.
According to SO: descriptor-join-requires-a-unicode-object-but-received-a-str, it is a standard Python message.
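To illustrate where a message of this kind comes from, here is a minimal sketch. It assumes Python 3, which has no unicode type, so a bytes value stands in for the mismatched string type. The helper name call_lower_on is made up for this example and is not part of pattern.web or Beautiful Soup. The exact wording of the message differs between Python versions, so the sketch only relies on the exception type.

```python
def call_lower_on(value):
    """Call str.lower through the class, the way some library code does."""
    try:
        # str.lower is a method descriptor bound to the str type. Passing
        # an object of another string-like type raises TypeError.
        return str.lower(value)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


print(call_lower_on("DIV"))    # a real str works
print(call_lower_on(b"DIV"))   # bytes is rejected by the descriptor
```

This suggests the error is probably raised wherever the selector string is lowercased via str.lower(...) rather than value.lower(), but which library does that is not confirmed here.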


Using pattern.web from GitHub, the results are as expected:

from pattern.web import Element

elements = Element(html)
print("Search Results by Method:")
print('tag[attr="value"] Notation\tResults:{}'
    .format(elements('div[class="class1 class2 class3"]')))

print('tag.class Notation \t\t\tResults:{}'
    .format(elements('div.class1.class2.class3')))

print('By class, no tag \t\t\tResults:{}'
    .format(elements.by_class('class1 class2 class3')))

print('Looping through all divs and printing matching results:')
for i in elements('div'):
    if 'class' in i.attrs:
        if " ".join(i.attrs['class']) == 'class1 class2 class3':
            print("\tmatch:{}".format(i.attrs))

Output:

Search Results by Method:
tag[attr="value"] Notation Results:{'class': ['class1', 'class2', 'class3']}
tag.class Notation Results:{'class': ['class1', 'class2', 'class3']}
By class, no tag Results:{'class': ['class1', 'class2', 'class3']}
Looping through all divs and printing matching results:
 match:{'class': ['class1', 'class2', 'class3']}

Tested with Python:3.5.3 - pattern.web:3.6 - bs4:4.5.3
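If the selector notations keep failing in a given version, the loop over all divs can be made more robust. The outputs above show the class attribute as one space-separated string in the older version and as a list of names in the newer one, and an exact string comparison also depends on the order of the classes. Below is a minimal sketch of an order-independent check that accepts both shapes. The helper name has_all_classes is hypothetical and is not part of pattern.web.

```python
def has_all_classes(class_attr, required):
    """Return True if class_attr contains every class name in required."""
    if class_attr is None:
        return False
    if hasattr(class_attr, "split"):
        # Older versions: one string such as "class1 class2 class3".
        present = set(class_attr.split())
    else:
        # Newer versions: a list such as ['class1', 'class2', 'class3'].
        present = set(class_attr)
    return set(required) <= present


print(has_all_classes("class3 class1 class2", ["class1", "class2", "class3"]))  # True
print(has_all_classes(["class1", "class4"], ["class1", "class2"]))              # False
```

Inside the loop it would be used as has_all_classes(i.attrs.get('class'), ['class1', 'class2', 'class3']), assuming i.attrs behaves like a dict, as the printed output suggests.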
