I am trying to scrape a page with BeautifulSoup, and there are <script> tags inside a <span> tag, as shown below:
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>
But since <script> tags are not parsed as HTML in bs4, the following code returns the <span> tag without the text ("более 20 раз"):
rating = soup.find("p", {"class": "order-quantity"})
How can I get the text within the <span> tag?
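The text is not inside any of the <script> tags; it is the text node that immediately follows <script type="jsv#26^">. You can find that tag with soup.find("script", type="jsv#26^") and then take the next text node after it: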
from bs4 import BeautifulSoup
html = """
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>
"""
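soup = BeautifulSoup(html, "html.parser")
# Locate the empty marker tag, then take the first text node that follows it.
# string=True is the current spelling of the older text=True argument.
print(soup.find("script", type="jsv#26^").find_next(string=True).strip())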
Output:
более 20 раз
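As an alternative, here is a minimal sketch that reads the text from the <span> itself rather than navigating from one specific <script> marker. It assumes the <script> tags are always empty, as in the snippet above, so the only non-whitespace text inside the <span> is the order count. Selecting the <span> by the presence of its data-link attribute is also an assumption made for illustration; on the real page a more specific selector may be needed.

```python
from bs4 import BeautifulSoup

html = """
<span data-link="{include tmpl='productCardOrderCount' ^~ordersCount=selectedNomenclature^ordersCount}"><script type="jsv#28_"></script>
<script type="jsv#27^"></script>
<script type="jsv#29_"></script>
<script type="jsv#26^"></script>
более 20 раз
<script type="jsv/26^"></script>
<script type="jsv/29_"></script>
<script type="jsv/27^"></script>
<script type="jsv/28_"></script>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: the target <span> is the first one carrying a data-link attribute.
span = soup.find("span", attrs={"data-link": True})

# get_text(strip=True) strips each text node and drops the ones that are
# only whitespace, so the newlines between the empty <script> tags vanish.
text = span.get_text(strip=True)
print(text)  # более 20 раз
```

This does not depend on the exact jsv#26^ marker value, which may matter if those numbers change between page loads.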