
Getting the text of a specific tag with HTMLParser()

I have a page with several elements that share the same class, for example five <div class='price'></div> elements. I need to get the information from one specific element with that class using HTMLParser(). The one I need is near the end of that list and sits below other elements in the HTML tree. The problem is that my code returns the first div tag, but I need a different one. How do I get it?

I need to extract "1015" from the page, but my code returns 150. Page HTML:

<div class='price'>150</div>
    <div class='form-row'></div>
        <input type="hidden" value="15121" name="add-to-cart">
            <div class='price'>
                ::before
                "1015"
            </div>

My code:

from html.parser import HTMLParser


class ParserLyku(HTMLParser):

    price_is_found = is_price_field = None
    _product_info = {}
    _all_prices = []

    def handle_starttag(self, tag, attrs):
        if (not self.price_is_found and
                'class' not in self._product_info and
                tag == 'div'):
            attrs = dict(attrs)
            if attrs.get('class') == 'price':
                self.is_price_field = True

    def handle_data(self, data):
        if (not self.price_is_found and
                self.is_price_field and
                'class' not in self._product_info):
            self._product_info['price'] = data
            self.price_is_found = True
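To see why this returns 150, here is a minimal reproduction that feeds the page HTML to the parser above. The devtools-only "::before" line is dropped from the HTML, and the final print is added only for illustration. handle_data records the text of the first price div it meets and sets price_is_found, which blocks every later match.

```python
from html.parser import HTMLParser

# Page HTML from the question, without the devtools-only "::before" line.
html_doc = '''\
<div class='price'>150</div>
    <div class='form-row'></div>
    <input type="hidden" value="15121" name="add-to-cart">
        <div class='price'>
            "1015"
        </div>
'''


class ParserLyku(HTMLParser):
    price_is_found = is_price_field = None
    _product_info = {}

    def handle_starttag(self, tag, attrs):
        if (not self.price_is_found and
                'class' not in self._product_info and
                tag == 'div'):
            attrs = dict(attrs)
            if attrs.get('class') == 'price':
                self.is_price_field = True

    def handle_data(self, data):
        # The first text seen inside a price div is stored, and
        # price_is_found then short-circuits every later call.
        if (not self.price_is_found and
                self.is_price_field and
                'class' not in self._product_info):
            self._product_info['price'] = data
            self.price_is_found = True


parser = ParserLyku()
parser.feed(html_doc)
print(parser._product_info['price'])  # the first price div's text: 150
```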

There are many ways to do this. One possible solution is to count how many <div class="price"> elements have been encountered so far and ignore the ones you don't want (in this example, the first price is skipped):

from html.parser import HTMLParser

html_doc = '''\
<div class='price'>150</div>
    <div class='form-row'></div>
    <input type="hidden" value="15121" name="add-to-cart">

        <div class='price'>
            "1015"
        </div>
'''


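# Void elements (e.g. <input>) never get an end tag, so they are kept off
# the stack; otherwise the stack drifts out of sync with the real nesting.
VOID_ELEMENTS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                 'link', 'meta', 'source', 'track', 'wbr'}


class ParserLyku(HTMLParser):
    # Matches exactly <div class="price"> (with no other attributes).
    to_find = ('div', ('class', 'price'))

    def __init__(self):
        super().__init__()
        self.__opened_tags = []  # stack of currently open (tag, *attrs) tuples
        self.__counter = 0       # number of <div class="price"> seen so far
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if (tag, *attrs) == ParserLyku.to_find:
            self.__counter += 1
        if tag not in VOID_ELEMENTS:
            self.__opened_tags.append((tag, *attrs))

    def handle_endtag(self, tag):
        # "<input ... />" also triggers handle_endtag, so ignore void tags here too.
        if tag not in VOID_ELEMENTS and self.__opened_tags:
            self.__opened_tags.pop()

    def handle_data(self, data):
        # Keep text only from the second and later price divs (skip the first).
        if (self.__opened_tags
                and self.__opened_tags[-1] == ParserLyku.to_find
                and self.__counter > 1):
            self.prices.append(data.strip())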

parser = ParserLyku()
parser.feed(html_doc)

print(parser.prices)

Prints:

['"1015"']
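Another option, sketched here under the assumption that a price div never contains a nested div, is to collect the text of every <div class="price"> in document order and then pick the one you need by position, for example the last one. PriceCollector is a hypothetical name used only for this example.

```python
from html.parser import HTMLParser

html_doc = '''\
<div class='price'>150</div>
    <div class='form-row'></div>
    <input type="hidden" value="15121" name="add-to-cart">

        <div class='price'>
            "1015"
        </div>
'''


class PriceCollector(HTMLParser):
    """Collects the text of every <div class="price">, in document order."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False  # True while inside a price div

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and dict(attrs).get('class') == 'price':
            self._in_price = True
            self.prices.append('')  # start a new, empty entry

    def handle_endtag(self, tag):
        # Assumes price divs contain no nested divs: any </div> ends the price.
        if tag == 'div':
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices[-1] += data.strip()


collector = PriceCollector()
collector.feed(html_doc)
print(collector.prices)      # ['150', '"1015"']
print(collector.prices[-1])  # "1015"
```

Picking by index (prices[-1], or prices[n] for the n-th price) keeps the parser free of counters; the trade-off is that every price is stored, and the index has to be adjusted if the page layout changes.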
