I'm trying to scrape a Quizlet match set with Python. I want to scrape all the <span> tags with class TermText.
Here's the URL: 'https://quizlet.com/291523268'
import requests
URL = 'https://quizlet.com/291523268'
raw = requests.get(URL).text
raw ends up containing HTML with none of the TermText spans or cards at all. When I view the page source in my browser, it shows all the TermText spans I need, so the content is not loaded by JavaScript. I don't understand why the HTML I get back doesn't contain any of the markup I need.
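A way to see what the request above actually sends, without hitting the network: build the same request that requests.get(URL) would make and inspect its headers. This is a minimal sketch; the browser string is the one used in the answer, and the exact python-requests version printed depends on your install.

```python
import requests

URL = 'https://quizlet.com/291523268'

# Build (but do not send) the request that requests.get(URL) makes,
# so its headers can be inspected offline.
session = requests.Session()
prepared = session.prepare_request(requests.Request('GET', URL))

# Prints "python-requests/<version>": not a browser User-Agent.
print(prepared.headers['User-Agent'])

# The same request with a browser-like User-Agent passed explicitly;
# per-request headers override the session defaults.
browser_headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) '
                                 'Gecko/20100101 Firefox/79.0'}
prepared_browser = session.prepare_request(
    requests.Request('GET', URL, headers=browser_headers))
print(prepared_browser.headers['User-Agent'])
```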
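By default, requests identifies itself with a python-requests/<version> User-Agent, and the server evidently returns different HTML to that client. To get the correct response from the server, set a browser-like User-Agent HTTP header: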
import requests
from bs4 import BeautifulSoup

url = 'https://quizlet.com/291523268/python-flash-cards/'
# Browser-like User-Agent instead of the default "python-requests/<version>"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# Print the text of every <span class="TermText"> (terms and definitions alternate)
for span in soup.select('span.TermText'):
    print(span.get_text(strip=True))
Prints:
algorithm
A set of specific steps for solving a category of problems
token
basic elements of a language(letters, numbers, symbols)
high-level language
A programming language like Python that is designed to be easy for humans to read and write.
low-level langauge
...and so on.
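If you scrape more than one set, a sketch that sets the header once on a requests.Session and keeps parsing separate from fetching, so the selector can be checked against a saved HTML snippet. The helper names (make_session, extract_terms, fetch_terms) are hypothetical, the timeout value is an arbitrary choice, and the sample HTML is hand-written, not real Quizlet markup; it assumes the page still uses TermText spans.

```python
import requests
from bs4 import BeautifulSoup

# Same browser-like User-Agent string as in the answer.
BROWSER_UA = ('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) '
              'Gecko/20100101 Firefox/79.0')


def make_session():
    """Return a Session that sends the browser-like User-Agent on every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': BROWSER_UA})
    return session


def extract_terms(html):
    """Return the stripped text of every <span class="TermText"> in the HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    return [span.get_text(strip=True) for span in soup.select('span.TermText')]


def fetch_terms(session, url):
    """Download a set page and parse its terms (hypothetical helper)."""
    response = session.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return extract_terms(response.content)


# Offline check of the parsing step with a hand-written sample;
# span.TermText also matches spans that carry additional classes.
sample = '''
<div><span class="TermText">token</span>
<span class="TermText notranslate"> basic elements of a language </span>
<span class="Other">ignored</span></div>
'''
print(extract_terms(sample))
```

For a live page, call fetch_terms(make_session(), 'https://quizlet.com/291523268/python-flash-cards/').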