BeautifulSoup: Can't find tag

I'm a Python beginner and started using BeautifulSoup a few weeks ago. Right now I'm trying to build a simple project that scrapes the country information from the reviews posted on some AliExpress products. When I inspect the website's HTML, I see that what I should look for with BeautifulSoup is class="css_flag ..." (see the attached picture: https://i.stack.imgur.com/LgMae.png), but I can't find it.

I have tried extracting all the 'b' tags and had no luck. I also printed the entire soup to the console, copied it into a Word file, and searched manually for class="css_flag ..." but found nothing.

This is the code that I'm using right now:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl
import re

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")


tags = soup('b')
for tag in tags:
    print(tag)

I expect the code to return all the tags whose class includes "css_flag". I would really appreciate some help with this, thanks in advance!

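The reviews don't appear to be part of the product page's HTML, which would explain why your search finds nothing. They are served from a separate feedback URL. You'll need to pass it some query data, such as the productId number, but you can get what you're after that way. Code/example below: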

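import bs4
import requests

# The reviews are requested from this URL, not from the product page itself
request_url = 'https://feedback.aliexpress.com/display/productEvaluation.htm'

# Send a browser-like user agent with the request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'}

# Query parameters identifying the product and seller
payload = {
    'v': '2',
    'productId': '33023202345',
    'ownerMemberId': '232182005',
    'companyId': '241422497',
    'memberType': 'seller',
    'startValidDate': '',
    'i18n': 'true',
}

res = requests.get(request_url, headers=headers, params=payload)
# Name the parser explicitly so bs4 doesn't guess one and emit a warning
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# Calling the soup is shorthand for soup.find_all('b')
tags = soup('b')
for tag in tags:
    print(tag)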

Output:

<b class="r-graph-scroller" style="width:94.0%;"></b>
<b class="r-graph-scroller" style="width:5.0%;"></b>
<b class="r-graph-scroller" style="width:0.0%;"></b>
<b class="r-graph-scroller" style="width:1.0%;"></b>
<b class="r-graph-scroller" style="width:0.0%;"></b>
<b>4.9</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
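
Since the output above still includes the rating-graph 'b' tags, you may want to keep only the flag tags. Here is a minimal sketch of one way to do that, assuming the markup looks like the output above. The `sample_html` string and the `extract_countries` helper are made up for illustration so the snippet runs without a network request; in practice you would pass `res.text` instead.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Sample markup copied from the output above (illustrative stand-in for res.text).
sample_html = """
<b class="r-graph-scroller" style="width:94.0%;"></b>
<b>4.9</b>
<b class="css_flag css_ru">RU</b>
<b class="css_flag css_ru">RU</b>
"""


def extract_countries(html):
    """Return the text of every <b> tag that has the css_flag class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches a single class even when the tag carries several,
    # so "css_flag css_ru" is matched by "css_flag".
    flags = soup.find_all("b", class_="css_flag")
    return [flag.get_text(strip=True) for flag in flags]


countries = extract_countries(sample_html)
print(countries)           # country codes, one per review
print(Counter(countries))  # how many reviews per country
```

Filtering on the class rather than on the tag name alone skips the `r-graph-scroller` and rating tags, which are also 'b' elements.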

