
How to edit Python code to loop a request and extract information from every item in a list

I'm only a couple of weeks into learning Python, and I'm trying to extract specific info from a list of events. I've been able to call the list and pull out specific lines (the info for a single event), but the objective is to run the program and extract the information from the entire list (the info for all of the events).

Among others, my best guesses so far have been along the lines of:

one_a_tag = soup.findAll('a')[22:85]

and

one_a_tag = soup.findAll('a')[22+1]

But I come up with these errors:

    TypeError                                 Traceback (most recent call last)
<ipython-input-15-ee19539fbb00> in <module>
     11 soup.findAll('a')
     12 one_a_tag = soup.findAll('a')[22:85]
---> 13 link = one_a_tag['href']
     14 'https://arema.mx' + link
     15 eventUrl = ('https://arema.mx' + link)

TypeError: list indices must be integers or slices, not str
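This first error happens because slicing `findAll('a')[22:85]` returns a *list* of Tag objects, and a list can only be indexed with integers or slices, not with the string `'href'`. A minimal sketch of the difference (using stand-in HTML, not the real arema.mx page):

```python
from bs4 import BeautifulSoup

html = '<a href="/e1">One</a><a href="/e2">Two</a>'
soup = BeautifulSoup(html, 'html.parser')

tags = soup.find_all('a')[0:2]   # a slice is a list of Tags, not a single Tag
# tags['href'] would raise: TypeError: list indices must be integers or slices, not str
links = [tag['href'] for tag in tags]   # iterate over the list instead
print(links)
```

So the fix is not to slice and then subscript, but to loop over the slice and read `['href']` from each individual Tag.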

And

TypeError                                 Traceback (most recent call last)
<ipython-input-22-81d98bcf8fd8> in <module>
     10 soup
     11 soup.findAll('a')
---> 12 one_a_tag = soup.findAll('a')[22]+1
     13 link = one_a_tag['href']
     14 'https://arema.mx' + link

TypeError: unsupported operand type(s) for +: 'Tag' and 'int'
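The second error is an order-of-operations problem: `soup.findAll('a')[22]+1` first indexes out a single Tag and then tries to add the integer 1 to it, which Tags don't support. If the intent was to move to the next element, the arithmetic has to happen inside the brackets. A small illustration (again with stand-in HTML):

```python
from bs4 import BeautifulSoup

html = '<a href="/a">A</a><a href="/b">B</a><a href="/c">C</a>'
soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all('a')

i = 1
# tags[i] + 1 would raise: TypeError: unsupported operand type(s) for +: 'Tag' and 'int'
next_tag = tags[i + 1]   # compute the index first, then subscript
print(next_tag['href'])
```

Stepping through indices one by one like this is possible, but a `for` loop over the list (as in the answer below) is the idiomatic way to visit every element.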

This is the entire code so far:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = 'https://arema.mx/'

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
soup
soup.findAll('a')
one_a_tag = soup.findAll('a')[22]
link = one_a_tag['href']
'https://arema.mx' + link 
eventUrl = ('https://arema.mx' + link)  
print(eventUrl)

def getAremaTitulo(eventUrl):
    res = requests.get(eventUrl)
    res.raise_for_status()
    
    soup = BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('body > div.body > div.ar.eventname')
    return elems[0].text.strip()

def getAremaInfo(eventUrl):
    res = requests.get(eventUrl)
    res.raise_for_status()
    
    soup = BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('body > div.body > div.event-header')
    return elems[0].text.strip()

titulo = getAremaTitulo(eventUrl)
print('Nombre de evento: ' + titulo)

info = getAremaInfo(eventUrl)
print('Info: ' + info)

time.sleep(1)

I'm sure there may be some redundancies in the code, but what I'm most keen on solving is creating a loop that extracts the specific info I'm looking for from all of the events. What do I need to add to get there?

Thanks!

To get all information about events, you can use this script:

import requests
from bs4 import BeautifulSoup


url = 'https://arema.mx/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for event_link in soup.select('#events a.event'):
    # follow each event link found on the front page
    u = 'https://arema.mx' + event_link['href']
    s = BeautifulSoup(requests.get(u).content, 'html.parser')

    # pull the name and description from the event page
    event_name = s.select_one('.eventname').get_text(strip=True)
    event_info = s.select_one('.event-header').get_text(strip=True)

    print(event_name)
    print(event_info)
    print('-' * 80)

Prints:

...

--------------------------------------------------------------------------------
NOCHE BOHEMIA <A PIANO Y GUITARRA>
"Freddy González y Víctor Freez dos amigos que al
paso del tiempo hermanaron sus talentos para crear un concepto musical cálido y
acústico entre cuerdas y teclas haciéndonos vibrar entre una línea de canciones
de ayer y hoy. Rescatando las bohemias que tantos recuerdos y encuentros nos han
generado a lo largo del tiempo.

 Precio: $69*ya incluye cargo de servicio.Fecha: Sábado 15 de agosto 20:00 hrsTransmisión en vivo por Arema LiveComo ingresar a ver la presentación.·         Dale clic en Comprar  y Selecciona tu acceso.·         Elije la forma de pago que más se te facilite y finaliza la compra.·         Te llegara un correo electrónico con la confirmación de compra y un liga exclusiva para ingresar a la transmisión el día seleccionado únicamente.La compra de tu boleto es un apoyo para el artista.Importante:  favor de revisar tu correo en bandeja de entrada, no deseados o spam ya que los correos en ocasiones son enviados a esas carpetas.
--------------------------------------------------------------------------------
    
...
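If you later want to keep the scraped data instead of just printing it, the same loop can append each event to a list and write the result to a CSV file with the standard-library csv module. A sketch under the same selectors as above (the HTML strings here are stand-ins for the per-event pages the loop fetches; filename and field names are my own choices):

```python
import csv
from bs4 import BeautifulSoup

# stand-ins for the per-event pages fetched with requests in the loop above
pages = [
    '<div class="eventname">Evento 1</div><div class="event-header">Info 1</div>',
    '<div class="eventname">Evento 2</div><div class="event-header">Info 2</div>',
]

rows = []
for html in pages:
    s = BeautifulSoup(html, 'html.parser')
    rows.append({
        'name': s.select_one('.eventname').get_text(strip=True),
        'info': s.select_one('.event-header').get_text(strip=True),
    })

# dump the collected events to a CSV file
with open('events.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'info'])
    writer.writeheader()
    writer.writerows(rows)
```

In the real script, the body of the loop would parse `requests.get(u).content` instead of the hard-coded strings, and a `time.sleep(1)` between requests is a polite way to avoid hammering the site.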
