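I'm using Python 3.7 and BS4 for web scraping and have run into a problem I couldn't solve. I want to get product information from the source page. The data I want is inside an HTML tag (the em inside each product's p-name div), but that tag contains another tag, so when I save the data to a local file it looks very messy. I hope someone knows how to fix this.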
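To illustrate the symptom offline, here is a minimal sketch that assumes the product name's em contains a nested tag plus line breaks. The HTML snippet and its "highlight" class are made up for illustration, not copied from JD. It compares .text, which concatenates every text node including the whitespace around the nested tag, with get_text(" ", strip=True), which strips each text node, drops the empty ones, and joins the rest with a single space.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for one product: the <em> holds a nested
# <font> tag (hypothetical "highlight" class) and several line breaks.
html = """
<div class="p-name">
  <a href="#"><em>
    <font class="highlight">Brand</font>
    Phone X 128GB
  </em></a>
</div>
"""

em = BeautifulSoup(html, "html.parser").find("div", {"class": "p-name"}).a.em

# .text joins every text node, so the line break after <font> survives;
# .strip() only trims the two ends of the combined string.
raw = em.text.strip()

# get_text(" ", strip=True) strips each text node, skips the empty ones,
# and joins what is left with a single space.
clean = em.get_text(" ", strip=True)

print(repr(raw))    # 'Brand\n    Phone X 128GB'
print(repr(clean))  # 'Brand Phone X 128GB'
```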
Here is my code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://list.jd.com/list.html?cat=9987,653,655&ev=exbrand_15127&page=1'
#opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
filename = "params.csv"
f = open(filename,"w")
#grabs each product
li_containers = page_soup.findAll("li",{"class":"gl-item"})
for i in range(0,len(li_containers)):
    p_name_div = li_containers[i].find("div",{"class":"p-name"})
    p_name = p_name_div.a.em.text.strip()
    print(p_name)
    f.write(p_name)
f.close()
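Separately from the nested tag, the loop above calls f.write(p_name) without a newline, so every name lands on the same line of params.csv, which on its own would make the file hard to read. A minimal sketch of writing one name per row with the standard csv module follows; write_names is a hypothetical helper, and the names are placeholder strings rather than scraped data.

```python
import csv
import io


def write_names(names, out):
    """Hypothetical helper: write a header, then one product name per CSV row."""
    writer = csv.writer(out)
    writer.writerow(["name"])
    for name in names:
        # csv.writer ends every row with a line terminator ("\r\n" by default)
        # and quotes fields that contain commas, quotes, or newlines.
        writer.writerow([name])


# For a real file, open it with newline="" (as the csv docs recommend) and an
# explicit encoding so Chinese product names are written correctly, e.g.
#     with open("params.csv", "w", newline="", encoding="utf-8") as f:
#         write_names(names, f)
buf = io.StringIO()
write_names(["Brand Phone X 128GB", "Brand Phone Y, 256GB"], buf)
print(buf.getvalue())
```

The second placeholder name contains a comma to show that csv.writer quotes it automatically, which a plain f.write would not.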
Here are some screenshots.
I want it to look like this:
But it ends up looking like this:
Try this:
my_url = 'https://list.jd.com/list.html?cat=9987,653,655&ev=exbrand_15127&page=1'
#opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
filename = "params.csv"
f = open(filename,"w")
#grabs each product
li_containers = page_soup.findAll("li",{"class":"gl-item"})
for i in range(0,len(li_containers)):
    p_name_div = li_containers[i].find("div",{"class":"p-name"})
    p_name = p_name_div.a.em.text.strip()
    print(p_name.strip(" "))
    f.write(p_name.strip(" "))
f.close()
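One caveat about this answer: str.strip only removes characters from the two ends of a string, so .strip(" ") applied to a value that has already been through .strip() has nothing left to remove, and neither call touches the whitespace between the nested tag's text and the rest of the name. A small sketch on a made-up string shows the difference from collapsing internal whitespace with split and join:

```python
# Made-up value resembling em.text.strip() when the <em> contains a nested tag.
p_name = "Brand\n    Phone X 128GB"

# .strip(" ") only looks at the ends, which are already non-space characters.
stripped_again = p_name.strip(" ")

# split() with no argument splits on any run of whitespace (spaces, tabs,
# newlines); joining with one space collapses the inner line break.
collapsed = " ".join(p_name.split())

print(stripped_again == p_name)  # True
print(repr(collapsed))           # 'Brand Phone X 128GB'
```

In the loop above, that would mean writing " ".join(p_name.split()) + "\n" rather than p_name.strip(" "), assuming one name per line is the desired layout.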