Scraping web site

Question

I have this:

 from bs4 import BeautifulSoup
 import requests

 page = requests.get("https://www.marca.com/futbol/primera/equipos.html")
 soup = BeautifulSoup(page.content, 'html.parser')
 equipos = soup.findAll('li', attrs={'id':'nombreEquipo'})

 aux = []
 for equipo in equipos:
     aux.append(equipo)

If I do print(aux[0]), I get this: Villarreal

Entrenador:

Javier Calleja

Jugadores:

1 Sergio Asenjo

13 Andrés Fernández

25 Mariano Barbosa

...

My problem is that I want to extract this tag:

  <h2 class="cintillo">Villarreal</h2>

And the tag:

1 Sergio Asenjo

And put them into a database. How can I extract them? Thanks

Answer 1

You can extract the first <h2 class="cintillo"> element from equipo like this:

h2 = str(equipo.find('h2', {'class':'cintillo'}))

If you only want the text content (without any tags), use:

h2 = equipo.find('h2', {'class':'cintillo'}).text

And you can extract all the <span class="dorsal-jugador"> elements from equipo like this:

jugadores = equipo.find_all('span', {'class':'dorsal-jugador'})

Then append h2 and the text of each element in jugadores to a nested list, one [team, players] entry per team.

Full code:

from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.marca.com/futbol/primera/equipos.html")
soup = BeautifulSoup(page.content, 'html.parser')
# each team is an <li id="nombreEquipo"> element
equipos = soup.find_all('li', attrs={'id': 'nombreEquipo'})

aux = []
for equipo in equipos:
    # team name from the <h2 class="cintillo"> heading
    h2 = equipo.find('h2', {'class': 'cintillo'}).text
    # one <span class="dorsal-jugador"> per player
    jugadores = equipo.find_all('span', {'class': 'dorsal-jugador'})
    aux.append([h2, [j.text for j in jugadores]])

# format list for printing
print('\n\n'.join('--' + i[0] + '--\n' + '\n'.join(i[1]) for i in aux))

Output sample:

--Alavés--
Fernando Pacheco
Antonio Sivera
Álex Domínguez
Carlos Vigaray
...

Demo: https://repl.it/@glhr/55550385
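
The question also asks how to put the result into a database, which the code above stops short of. Here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The table name jugadores, its two columns, and the save_teams helper are hypothetical and not part of the original answer. The hand-written aux list stands in for the scraped one.

```python
import sqlite3

def save_teams(conn, aux):
    """Store [team_name, [player, ...]] pairs as one row per player."""
    # Hypothetical schema: one row per (team, player) pair
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jugadores (equipo TEXT, jugador TEXT)"
    )
    rows = [(team, player) for team, players in aux for player in players]
    # Parameterized query, so names with quotes or accents are handled safely
    conn.executemany(
        "INSERT INTO jugadores (equipo, jugador) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)

# Sample data in the same shape as `aux` (stand-in for the scraped list)
aux = [
    ["Alavés", ["Fernando Pacheco", "Antonio Sivera"]],
    ["Villarreal", ["Sergio Asenjo"]],
]

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
inserted = save_teams(conn, aux)
print(inserted)
```

Storing one row per player keeps the table flat, so listing a team's squad is a single SELECT with a WHERE clause on equipo.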

Answer 2

You could create a dictionary with team names as keys, where each value is itself a dictionary mapping the coach (entrenador) to the list of players:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.marca.com/futbol/primera/equipos.html')
soup = bs(r.content, 'lxml')

teams = {}

for team in soup.select('[id=nombreEquipo]'):
    team_name = team.select_one('.cintillo').text
    entrenador = team.select_one('dd').text
    players = [item.text for item in team.select('.dorsal-jugador')]
    teams[team_name] = {entrenador: players}
print(teams)
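
Because each value in teams is a one-entry dictionary mapping the coach to the player list, reading it back takes a nested loop. Here is a small sketch using a hand-written sample in the same shape, with names taken from the question's output. The flatten_teams helper is hypothetical and not part of the original answer.

```python
def flatten_teams(teams):
    """Turn {team: {coach: [players]}} into (team, coach, player) tuples."""
    rows = []
    for team_name, staff in teams.items():
        # `staff` holds a single coach -> players entry per team
        for entrenador, players in staff.items():
            for player in players:
                rows.append((team_name, entrenador, player))
    return rows

# Sample in the same shape as `teams` (stand-in for the scraped data)
teams = {"Villarreal": {"Javier Calleja": ["Sergio Asenjo", "Andrés Fernández"]}}

for row in flatten_teams(teams):
    print(row)
```

Flat tuples like these map directly onto rows of a database table, which is what the question ultimately asks for.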

2 answers

solution1
0 ACCEPTED 2019-04-07 07:50:06

solution2
0 2019-04-07 09:20:06
