How to edit Python code to loop the request and extract information from every item in a list

I'm only a couple of weeks into learning Python, and I'm trying to extract specific information from a list of events. I've been able to fetch the list and extract the details for a single event, but the objective is to run the program once and extract the information for every event in the list.

Among others, my best guesses so far have been along the lines of:

one_a_tag = soup.findAll('a')[22:85]

and

one_a_tag = soup.findAll('a')[22]+1

But I get these errors:

TypeError                                 Traceback (most recent call last)
<ipython-input-15-ee19539fbb00> in <module>
     11 soup.findAll('a')
     12 one_a_tag = soup.findAll('a')[22:85]
---> 13 link = one_a_tag['href']
     14 'https://arema.mx' + link
     15 eventUrl = ('https://arema.mx' + link)

TypeError: list indices must be integers or slices, not str

And

TypeError                                 Traceback (most recent call last)
<ipython-input-22-81d98bcf8fd8> in <module>
     10 soup
     11 soup.findAll('a')
---> 12 one_a_tag = soup.findAll('a')[22]+1
     13 link = one_a_tag['href']
     14 'https://arema.mx' + link

TypeError: unsupported operand type(s) for +: 'Tag' and 'int'
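Both tracebacks come from mixing up a list of tags with a single tag. Slicing with [22:85] gives back a list, and a list cannot be indexed with 'href'; writing [22]+1 adds 1 to the tag itself rather than to the index. Below is a minimal sketch of the difference. It assumes plain dicts as stand-ins for the Tag objects (both support tag['href']), and the hrefs are made up for illustration, so it runs without a network connection:

```python
# Plain dicts stand in for the Tag objects returned by soup.findAll('a').
# The href values are hypothetical, not taken from the real site.
all_links = [{'href': '/evento/%d' % i} for i in range(100)]

# A slice of a list is still a list, here holding items 22 through 84.
selected = all_links[22:85]

try:
    selected['href']  # same mistake as in the first traceback
except TypeError as err:
    print('TypeError:', err)

# Loop over the slice and index each element individually instead.
event_urls = ['https://arema.mx' + tag['href'] for tag in selected]

# To reach the next link, add 1 to the index, not to the element.
next_tag = all_links[22 + 1]
```

With real BeautifulSoup results the same pattern applies: iterate over the sliced result and read tag['href'] inside the loop.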

This is the entire code so far:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = 'https://arema.mx/'

response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
soup
soup.findAll('a')
one_a_tag = soup.findAll('a')[22]
link = one_a_tag['href']
'https://arema.mx' + link 
eventUrl = ('https://arema.mx' + link)  
print(eventUrl)

def getAremaTitulo(eventUrl):
    res = requests.get(eventUrl)
    res.raise_for_status()
    
    soup = BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('body > div.body > div.ar.eventname')
    return elems[0].text.strip()

def getAremaInfo(eventUrl):
    res = requests.get(eventUrl)
    res.raise_for_status()
    
    soup = BeautifulSoup(res.text, 'html.parser')
    elems = soup.select('body > div.body > div.event-header')
    return elems[0].text.strip()

titulo = getAremaTitulo(eventUrl)
print('Nombre de evento: ' + titulo)

info = getAremaInfo(eventUrl)
print('Info: ' + info)

time.sleep(1)

I'm sure there are some redundancies in the code, but what I'm most keen on is creating a loop that extracts the specific information I'm looking for from all of the events. What do I need to add to get there?

Thanks!

To get the information for all events, you can use this script:

import requests
from bs4 import BeautifulSoup


url = 'https://arema.mx/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

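# Select every <a class="event"> link inside the element with id="events".
for event_link in soup.select('#events a.event'):
    # Prefix the domain to the link's href to get the full event URL.
    u = 'https://arema.mx' + event_link['href']
    s = BeautifulSoup(requests.get(u).content, 'html.parser')

    # Pull the event title and description from the event page.
    event_name = s.select_one('.eventname').get_text(strip=True)
    event_info = s.select_one('.event-header').text.strip()

    print(event_name)
    print(event_info)
    print('-' * 80)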

Prints:

...

--------------------------------------------------------------------------------
NOCHE BOHEMIA <A PIANO Y GUITARRA>
"Freddy González y Víctor Freez dos amigos que al
paso del tiempo hermanaron sus talentos para crear un concepto musical cálido y
acústico entre cuerdas y teclas haciéndonos vibrar entre una línea de canciones
de ayer y hoy. Rescatando las bohemias que tantos recuerdos y encuentros nos han
generado a lo largo del tiempo.

 Precio: $69*ya incluye cargo de servicio.Fecha: Sábado 15 de agosto 20:00 hrsTransmisión en vivo por Arema LiveComo ingresar a ver la presentación.·         Dale clic en Comprar  y Selecciona tu acceso.·         Elije la forma de pago que más se te facilite y finaliza la compra.·         Te llegara un correo electrónico con la confirmación de compra y un liga exclusiva para ingresar a la transmisión el día seleccionado únicamente.La compra de tu boleto es un apoyo para el artista.Importante:  favor de revisar tu correo en bandeja de entrada, no deseados o spam ya que los correos en ocasiones son enviados a esas carpetas.
--------------------------------------------------------------------------------
    
...
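In the output above, pieces such as "servicio.Fecha:" run together, which is what .text produces when adjacent elements have no whitespace between them. One possible refinement is to pass a separator to get_text. The sketch below assumes made-up markup that only imitates an event header; the real page structure may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two adjacent paragraphs inside the event header.
html = '''
<div class="event-header">
  <p>Precio: $69</p><p>Fecha: Sábado 15 de agosto 20:00 hrs</p>
</div>
'''

header = BeautifulSoup(html, 'html.parser').select_one('.event-header')

# .text concatenates the strings of all descendants without a separator.
fused = header.text.strip()

# get_text can strip each piece and join the pieces with a chosen separator.
separated = header.get_text(separator='\n', strip=True)

print(fused)
print(separated)
```

In the loop from the answer, this would mean swapping .text.strip() for .get_text(separator='\n', strip=True) on the '.event-header' element, if one line per field is preferred.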
