How to use html.parser
Hi everyone, I am new to Python and trying to use its html.parser module. I want to scrape https://www.mcdelivery.com.pk/pk/browse/menu.html and fetch the URLs, deal names and prices, which sit inside li tags, using html.parser. After fetching the URLs, I want to append them to the base URL and fetch the deals with prices from those pages too.
import urllib.request
import urllib.parse
import re
from html.parser import HTMLParser
url = 'https://www.mcdelivery.com.pk/pk/browse/menu.html'
values = {'daypartId': '1', 'catId': '1'}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8') # data should be bytes
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()
list1 = re.findall(r'<div class="product-cost"(.*?)</div>', str(respData))
for eachp in list1:
    print(eachp)
I was using a regex to grab the class, but I failed. Now I am trying to figure out how to do it with html.parser. I know the job gets easier with beautifulsoup and scrapy, but I am trying to do it with bare Python, so please skip the third-party libraries. I really need help. I'm stuck.
Html.parser code (updated)
from html.parser import HTMLParser
import urllib.request
import html.parser

# Import HTML from a URL
url = urllib.request.urlopen(
    "https://www.mcdelivery.com.pk/pk/browse/menu.html")
html = url.read().decode()
url.close()


class MyParser(html.parser.HTMLParser):
    def __init__(self, html):
        self.matches = []
        self.match_count = 0
        super().__init__()

    def handle_data(self, data):
        self.matches.append(data)
        self.match_count += 1

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if attrs.get("product-cost"):
                self.handle_data()
            else:
                return


parser = MyParser(html)
parser.feed(html)
for item in parser.matches:
    print(item)
Here's a good start that might require specific tuning:
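import html.parser


class MyParser(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.matches = []
        self.match_count = 0
        self.in_cost = False  # True while inside <div class="product-cost">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # "product-cost" is a value of the class attribute, not an attribute name
        if tag == "div" and "product-cost" in (attrs.get("class") or "").split():
            self.in_cost = True

    def handle_endtag(self, tag):
        # Simplification: any closing div ends the match (nested divs not tracked)
        if tag == "div":
            self.in_cost = False

    def handle_data(self, data):
        # Called by the parser itself with the text found between tags
        if self.in_cost and data.strip():
            self.matches.append(data.strip())
            self.match_count += 1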
The usage is along the lines of:
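import urllib.request

# Fetch the page the same way as in the question
url = "https://www.mcdelivery.com.pk/pk/browse/menu.html"
with urllib.request.urlopen(url) as resp:
    request_html = resp.read().decode("utf-8", errors="replace")

parser = MyParser()
parser.feed(request_html)
for item in parser.matches:
    print(item)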
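For the other half of the question (collecting the URLs inside li tags and appending them to the base URL), the same state-tracking idea probably carries over. Below is a minimal sketch, not a tested scraper for that site: the LinkParser class, its li_depth counter and the SAMPLE_HTML markup are all made up for illustration, since I have not checked what the real page's markup looks like. It relies only on html.parser and urllib.parse.urljoin from the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.mcdelivery.com.pk/pk/browse/menu.html"


class LinkParser(HTMLParser):
    """Collect absolute URLs of <a href> tags that appear inside <li> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.li_depth = 0  # greater than 0 while inside at least one <li>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # urljoin resolves relative links against the base URL
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


# Hypothetical sample markup, NOT copied from the real page
SAMPLE_HTML = """
<ul>
  <li><a href="menu.html?catId=2">Deals</a></li>
  <li><a href="/pk/browse/other.html">Other</a></li>
</ul>
<a href="ignored.html">outside any li</a>
"""

link_parser = LinkParser(BASE_URL)
link_parser.feed(SAMPLE_HTML)
for link in link_parser.links:
    print(link)
```

Each collected link could then be fetched with urllib.request.urlopen and fed to a parser like MyParser to pull out the prices. Using urljoin rather than plain string concatenation should handle both relative and root-relative hrefs.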