Scraping <span> tag with BeautifulSoup

Question

I am trying to scrape a page with BeautifulSoup, and there are <script> tags inside a <span> tag, as shown below:

<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>

But because <script> tags are not parsed as HTML in bs4, the following code returns the <span> tag without the text ("более 20 раз", "more than 20 times"):

rating = soup.find("p", {"class": "order-quantity"})

How can I get the text within the <span> tag?

Answer 1

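The text is not inside <script type="jsv#26^">. It is the text node that comes right after that (empty) tag. You can locate the tag with soup.find("script", type="jsv#26^") and then move to the next string after it with find_next().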

from bs4 import BeautifulSoup


html = """
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

print(soup.find("script", type="jsv#26^").find_next(string=True).strip())

Output:

более 20 раз
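If the jsv#… marker numbers differ from page to page, starting from the <span> itself may be more robust. This is a sketch of an alternative, not part of the answer above. It assumes the span's data-link attribute always contains the template name productCardOrderCount, and it finds the span with a CSS attribute-substring selector through select_one (this relies on the soupsieve package, which is installed as a bs4 dependency).

```python
from bs4 import BeautifulSoup

html = """
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: the real page keeps "productCardOrderCount" in the span's data-link
# attribute; *= matches any attribute value containing that substring.
span = soup.select_one('span[data-link*="productCardOrderCount"]')

# Option 1: every string inside the span. The <script> tags are empty, and
# strip=True drops the whitespace-only strings between them.
all_text = span.get_text(strip=True)

# Option 2: only the span's direct text children, skipping whitespace-only ones.
direct = [s.strip() for s in span.find_all(string=True, recursive=False) if s.strip()]

print(all_text)
print(direct[0])
```

For the sample HTML both lines print более 20 раз. Option 1 collects every string under the span, so it is only clean when the span holds no other visible text; option 2 limits the search to the span's immediate children.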

solution1
0 2021-03-07 23:19:12