This is what I have so far:
    import csv
    import re

    import requests
    from bs4 import BeautifulSoup as soup

    flag = False
    with open('filename.csv', 'w', newline='') as f:
        write = csv.writer(f)
        for i in range(38050, 38050):  # this is so I can test run with one page
            s = soup(requests.get(f'https://howlongtobeat.com/game.php?id={i}').text, 'html.parser')
            if not flag:  # write header to file once
                write.writerow(['Name', 'Length'] + [re.sub(r'[:\n]+', '', d.find('strong').text)
                                                     for d in s.find_all('div', {'class': 'profile_info'})])
                flag = True
            # this is for if there is no page or an error
            content = s.find('div', {'class': 'profile_header shadow_text'})
            if content:
                name = content.text
                length = [[li.find('h5').text, li.find('div').text]
                          for li in s.find_all('li', {'class': 'time_100'})]
                stats = [re.sub(r'\n+[\w\s]+:\n+', '', d.text)
                         for d in s.find_all('div', {'class': 'profile_info'})]
This is not writing anything to the CSV and I don't know why (I'm just a beginner). I am trying to loop over the pages, check whether these elements exist, and if they do, write them to 'hltb.csv'. How can I do this?
You are iterating over an empty range:

    for i in range(38050, 38050):

range(a, b) runs from a up to but not including b, so this range has size 0 and the loop body never executes. Increase the upper bound by 1:

    for i in range(38050, 38051):
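You can confirm the half-open behaviour of range directly in the interpreter:

```python
# range(a, b) is half-open: it stops before b.
assert len(range(38050, 38050)) == 0          # empty: the loop body never runs
assert list(range(38050, 38051)) == [38050]   # exactly one iteration, id 38050
```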
If you want to crawl many pages rather than a fixed range, increment an id in a while loop instead:

    page = 38050
    while True:
        # fetch https://howlongtobeat.com/game.php?id={page} here
        page += 1
As written, that loop would run forever, so you need to handle HTTP status code 404 and break out of the loop when you hit it, so the script can end. That said, I think guessing incremental ids is a bad approach: I would rather crawl the site's own menus and collect every link matching https://howlongtobeat.com/game.php?id= so that you visit a known, finite set of URLs instead of guessing ids.
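The stop-on-404 pattern can be sketched like this. fetch_status and handle are hypothetical stand-ins (fetch_status would be something like requests.get(url).status_code in real code); the point is only the loop shape, so it runs without network access:

```python
def crawl_ids(start, fetch_status, handle):
    """Increment an id until fetch_status reports 404, calling
    handle(id) for each id that exists."""
    page = start
    while True:
        if fetch_status(page) == 404:  # no such page: stop crawling
            break
        handle(page)
        page += 1
    return page  # first id that was missing

# Simulated site: ids 38050-38052 exist, everything after returns 404.
seen = []
last = crawl_ids(38050, lambda i: 200 if i <= 38052 else 404, seen.append)
# seen is now [38050, 38051, 38052] and last is 38053
```

In the real script, handle would parse the page with BeautifulSoup and write one CSV row, exactly as in the question's loop body.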