
Python BeautifulSoup and writing to CSV (multiple URLs)

This is what I have so far:

import csv, re
from bs4 import BeautifulSoup as soup
import requests
flag = False
with open('filename.csv', 'w') as f:
  write = csv.writer(f)
  for i in range(38050, 38050): ##this is so I can test run with one page 
    s = soup(requests.get('https://howlongtobeat.com/game.php?id={i}').text, 'html.parser')
    if not flag: #write header to file once
      write.writerow(['Name', 'Length']+[re.sub('[:\n]+', '', i.find('strong').text) for i in s.find_all('div', {'class':'profile_info'})])
      flag = True
  ## this is for if there is no page or an error  
content = s.find('div', {"class":'profile_header shadow_text'})
if content: 
  name = s.find('div', {"class":'profile_header shadow_text'}).text
  length = [[i.find('h5').text, i.find("div").text] for i in s.find_all('li', {'class':'time_100'})]
  stats = [re.sub('\n+[\w\s]+:\n+', '', i.text) for i in s.find_all('div', {'class':'profile_info'})]

This is not writing anything to the CSV, and I don't know why (I'm just a beginner).

I am trying to create a loop that checks whether these elements exist and, if so, writes them to 'hltb.csv'.

How can I do this?

You are iterating over an empty range.

for i in range(38050, 38050):

The size of this range is 0, because range() excludes its stop value. Try increasing the stop value by 1.

for i in range(38050, 38051):
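A quick way to confirm this is to materialize both ranges as lists. This is only an illustration; the variable names `empty` and `one_page` are made up for the example.

```python
# range(start, stop) yields start, start + 1, ..., stop - 1; stop itself is excluded.
empty = list(range(38050, 38050))     # start == stop, so nothing is produced
one_page = list(range(38050, 38051))  # exactly one value: 38050

print(empty)     # []
print(one_page)  # [38050]
```

With the empty range, the body of the for loop never runs, so no request is made and no header row is written.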

You may also need to increment the page ID inside your for loop.

page = 38050
for i in range(0,page):
    page += 1

A script that keeps incrementing the ID could run indefinitely. You need to add some kind of handler for HTTP status code 404, so that the script can end when no page is found. I think guessing IDs is a bad approach. I would rather follow each link from the site's menu and crawl anything related to the URL https://howlongtobeat.com/game.php?id= so that I have a finite set of URLs to look at, instead of guessing incremental IDs.
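Putting the points above together, here is a minimal sketch of how the loop could be restructured. It is not a definitive implementation. The helper names `scrape_to_csv`, `fetch_page` and `parse_page` are hypothetical, and the CSS class names are taken from the question, so they are assumed to still match the site's HTML. Note also that the URL in the question is missing the `f` prefix, so `{i}` is sent literally; the sketch uses an f-string.

```python
import csv


def scrape_to_csv(ids, fetch, parse, out):
    """Write a header plus one row per page that exists and parses.

    fetch(game_id) -> (status_code, html_text)
    parse(html_text) -> list of cell values, or None if the page has no game data
    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(['Name', 'Length'])  # header is written once, before the loop
    written = 0
    for game_id in ids:
        status, html = fetch(game_id)
        if status != 200:  # e.g. 404: the page does not exist, skip it
            continue
        row = parse(html)
        if row is None:  # page loaded but the expected elements are missing
            continue
        writer.writerow(row)  # the write happens inside the loop
        written += 1
    return written


def fetch_page(game_id):
    """Hypothetical fetcher using requests (third-party)."""
    import requests
    url = f'https://howlongtobeat.com/game.php?id={game_id}'  # note the f prefix
    response = requests.get(url, timeout=10)
    return response.status_code, response.text


def parse_page(html):
    """Hypothetical parser; class names are assumed from the question."""
    from bs4 import BeautifulSoup
    s = BeautifulSoup(html, 'html.parser')
    header = s.find('div', {'class': 'profile_header shadow_text'})
    if header is None:
        return None
    lengths = [li.get_text(' ', strip=True)
               for li in s.find_all('li', {'class': 'time_100'})]
    return [header.get_text(strip=True), '; '.join(lengths)]


# Example usage (performs a real HTTP request, so it is left commented out):
# with open('hltb.csv', 'w', newline='') as f:
#     scrape_to_csv(range(38050, 38051), fetch_page, parse_page, f)
```

Passing `fetch` and `parse` in as arguments keeps the loop itself free of network code, so it can be tried out with fake pages before pointing it at the real site. The finite list of IDs could equally come from links collected on the site's menu pages rather than from a guessed range.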


 