BS4: AttributeError in Web Scraping with Python
I need to extract from this website the name of the city where each shop is located. I wrote this code:
import requests
import pandas as pd
from bs4 import BeautifulSoup

def get_page_data(number):
    print('number:', number)
    url = 'https://www.biedronka.pl/pl/sklepy/lista,lat,52.25,lng,21,page,{}'.format(number)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    container = soup.find(class_='s-content shop-list-page')
    items = container.find_all(class_='shopListElement')
    dane = []
    for item in items:
        miasto = item.find(class_='h4').get_text(strip=True)
        adres = item.find(class_='shopFullAddress').get_text(strip=True)
        dane.append([miasto, adres])
    return dane

wszystkie_dane = []
for number in range(1, 2):
    dane_na_stronie = get_page_data(number)
    wszystkie_dane.extend(dane_na_stronie)

dane = pd.DataFrame(wszystkie_dane, columns=['miasto', 'adres'])
dane.to_csv('biedronki_lista.csv', index=False)
The problem appears in this line:
miasto = item.find(class_='h4').get_text(strip=True)
AttributeError: 'NoneType' object has no attribute 'get_text'
Any ideas on how to extract the name of the city (in the h4) from this website?
class_='h4' passes a tag name as a class, which is not correct. No element has the class h4, so find() returns None, and calling get_text() on None raises the AttributeError. Search by tag name instead:
miasto = item.find('h4').get_text(strip=True)
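To make the difference concrete, here is a minimal sketch that runs both lookups against a small inline snippet. The HTML is an assumption modelled on the markup quoted in this thread, not the live page, so the real site may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modelled on the markup quoted in this thread.
html = '''
<li class="shopListElement">
  <h4 style="margin-bottom: 10px;">
    Rzeszów <span class="shopFullAddress">ul.<span class="shopAddress"> </span></span>
  </h4>
</li>
'''
item = BeautifulSoup(html, 'html.parser').find(class_='shopListElement')

by_class = item.find(class_='h4')  # searches for class="h4": no match, so None
by_tag = item.find('h4')           # searches for an <h4> tag: match

print(by_class)     # None
print(by_tag.name)  # h4
```

The first positional argument of find() is the tag name, while class_ filters on the class attribute, which is why only the second call finds the element.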
Try using:
miasto = item.find('h4').text.split()[0]
Or:
miasto = item.find('h4').get_text(strip=True)
Note:
"h4" is a tag, not a class.
Explanation:
<h4 style="margin-bottom: 10px;">
Rzeszów <span class="shopFullAddress">ul.<span class="shopAddress"> </span></span>
With .text, this element returns:
'Rzeszów \tul.'
Splitting that on whitespace gives:
['Rzeszów', 'ul.']
so index [0] is the city name.
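The sketch below compares the two suggestions on the excerpt quoted above, plus one alternative. It assumes the h4 really does wrap the address span as in that excerpt, and that the closing tag follows. The exact whitespace on the live page may differ. The use of stripped_strings is my own addition, not something from the original answers.

```python
from bs4 import BeautifulSoup

# The <h4> excerpt from this thread, with an assumed closing tag.
html = (
    '<h4 style="margin-bottom: 10px;">\n'
    'Rzeszów <span class="shopFullAddress">ul.'
    '<span class="shopAddress"> </span></span>\n'
    '</h4>'
)
h4 = BeautifulSoup(html, 'html.parser').find('h4')

# Option 1: take all text, split on whitespace, keep the first word.
city_from_split = h4.text.split()[0]

# Option 2: get_text(strip=True) strips each text node and joins them with
# no separator, so the nested address text is glued onto the city.
joined = h4.get_text(strip=True)

# Alternative (an addition): the first non-empty text node of the <h4>.
first_string = next(h4.stripped_strings)

print(city_from_split)  # Rzeszów
print(joined)           # Rzeszówul.
print(first_string)     # Rzeszów
```

On this snippet the first text node is the safer pick if a city name can contain a space, since split()[0] keeps only the first word.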
Apply the same fix wherever this error appears in your code. Note that shopFullAddress is a class, so it still needs class_:
dane = []
for item in items:
    miasto = item.find('h4').get_text(strip=True)
    adres = item.find(class_='shopFullAddress').get_text(strip=True)
    dane.append([miasto, adres])
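If some list entries might lack an h4 or an address, a defensive variant can skip them instead of raising the same AttributeError. This is only a sketch: parse_items is a hypothetical helper, and the sample HTML, including the street name, is made up for illustration.

```python
from bs4 import BeautifulSoup

def parse_items(html):
    """Return [city, address] pairs, skipping entries with missing tags.

    Hypothetical helper, not part of the original code.
    """
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for item in soup.find_all(class_='shopListElement'):
        h4 = item.find('h4')
        address = item.find(class_='shopFullAddress')
        if h4 is None or address is None:
            continue  # find() returned None, so there is nothing to read
        city = next(h4.stripped_strings, '')  # first non-empty text node
        rows.append([city, address.get_text(' ', strip=True)])
    return rows

# Made-up sample: one complete entry and one without a heading.
sample = '''
<ul>
  <li class="shopListElement"><h4>Rzeszów <span class="shopFullAddress">ul. Przykładowa 1</span></h4></li>
  <li class="shopListElement"><p>no heading here</p></li>
</ul>
'''
print(parse_items(sample))  # [['Rzeszów', 'ul. Przykładowa 1']]
```

Checking for None before calling get_text() keeps one malformed entry from aborting the whole page.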