Scraping specific sports data SELENIUM/BS4

Question

I am trying to scrape data from this page: https://www.flashscore.pl/druzyna/ajax/8UOvIwnb/tabela

Q1: I wrote the code below, but I don't know how to extract the data for the Ajax team only. The data should be stored in a list and later saved to a CSV file. In addition, I don't need some of the values, for example the "?" sign. How can I exclude it? I'd be grateful for your help.

Q2: How can I separate the values in the "Ajax" row, e.g. with ";"? Ajax;18;13;3;2;56:4;42;?;W;W;P;W;W;

CODE

from time import sleep

from bs4 import BeautifulSoup as BS
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.flashscore.pl/druzyna/ajax/8UOvIwnb/tabela/")
sleep(10)  # give the JavaScript-rendered table time to load
page = driver.page_source
soup = BS(page, 'html.parser')
content3 = soup.find('div', {'class': 'ui-table__body'})
content_list3 = content3.find_all('div', {'class': 'tableCellFormIcon tableCellFormIcon--TBD'})

# Each child of the table body is one row of the standings
for i in content3:
    print(i.text.split()[0])

RESULTS

1.PSV18141346:2443?WWWWR
2.Ajax18133256:442?WWPWW
3.Feyenoord18123342:1739?WPRWW
4.Vitesse18103525:2533?WRWWR
5.Alkmaar18102635:2332?WWWWW
6.Twente1895428:2232?RWWWR
7.Utrecht1885533:2329?RRRPW
8.Cambuur1891832:3928?RPWPW
9.Nijmegen1874724:2625?WWPPP
10.Heerenveen1874720:2525?PWRWR
11.G.A.
12.Groningen1847720:2719?PPRRW
13.Heracles18531021:2618?RWPPP
14.Willem
15.Waalwijk1837819:3016?RPPWR
16.Sparta
17.Sittard18341119:4613?PRWPP
18.Zwolle1813149:326?PPPRR

Answer 1

You can append each line to a list:

res = []
for i in content3:
    line = i.text.split()[0]
    print(line)
    res.append(line)

https://docs.python.org/3/tutorial/datastructures.html -

list.append(x) Add an item to the end of the list. Equivalent to a[len(a):] = [x].

To remove the "?", add this:

line = line.replace("?", "")

https://docs.python.org/3/library/stdtypes.html#str.replace -

str.replace(old, new[, count]) Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
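
Putting the two steps together, a minimal sketch could look like the block below. It works on plain strings so it can be run without a browser: `raw_rows` is a stand-in for the `i.text` values from the loop above, and `clean_row` is a hypothetical helper name, not something from Selenium or BeautifulSoup.

```python
def clean_row(text, unwanted="?"):
    """Return the first whitespace-separated token of text without `unwanted`."""
    tokens = text.split()
    if not tokens:  # skip empty rows instead of raising IndexError
        return None
    return tokens[0].replace(unwanted, "")


# Stand-in for the i.text values (two rows copied from the RESULTS above,
# plus a whitespace-only row to show the guard).
raw_rows = ["1.PSV18141346:2443?WWWWR", "2.Ajax18133256:442?WWPWW", "   "]

res = []
for text in raw_rows:
    line = clean_row(text)
    if line is not None:
        res.append(line)

print(res)
```

Note that `split()[0]` keeps only the text up to the first space, which is probably why rows such as "11.G.A." in the results are cut short.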

Answer 2

I added a regular expression so that only the "Ajax" row is kept:

import re
...
res = []
for i in content3:
    line = i.text.split()[0]
    if re.search('Ajax', line):
        line = line.replace("?", "")
        res.append(line)

print(res)
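
For a fixed team name, a plain substring test does the same job as `re.search`. The sketch below filters the rows that way and writes them with the standard `csv` module, since the question mentions saving to a CSV file later. The helper name `rows_for_team` and the file name `table.csv` in the comment are assumptions made for illustration; the input list stands in for the cleaned lines.

```python
import csv
import io


def rows_for_team(lines, team):
    """Keep only the lines that mention `team` (case-insensitive)."""
    return [line for line in lines if team.lower() in line.lower()]


# Stand-in for the cleaned lines collected in `res`.
lines = ["1.PSV18141346:2443WWWWR", "2.Ajax18133256:442WWPWW"]
ajax_rows = rows_for_team(lines, "Ajax")

# Written to an in-memory buffer here; for a real file, something like
# open("table.csv", "w", newline="") could be used instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";")
for row in ajax_rows:
    writer.writerow([row])

print(buffer.getvalue())
```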

Answer 3

A follow-up question on the main topic: how can I get only that result, with the values separated by ";"?

Results

 ['1.Ajax20153261:548WWWWP']

Expected result (values separated with ";" and a few values left out, 20 and 48 in this example):

Ajax;15;3;2;61:5;W;W;W;W;P

Code below:

import re
from time import sleep

from bs4 import BeautifulSoup as BS
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.flashscore.pl/druzyna/ajax/8UOvIwnb/tabela/")
sleep(10)  # give the JavaScript-rendered table time to load
page = driver.page_source
soup = BS(page, 'html.parser')
content3 = soup.find('div', {'class': 'ui-table__body'})
content_list3 = content3.find_all('div', {'class': 'tableCellFormIcon tableCellFormIcon--TBD'})

res = []
for i in content3:
    line = i.text.split()[0]
    # Keep only the Ajax row and drop the "?" placeholder
    if re.search('Ajax', line):
        line = line.replace("?", "")
        res.append(line)

print(res)
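
One possible way to get the expected result is to work with the individual table cells instead of the flattened row text, because in a string like `1.Ajax20153261:548WWWWP` the boundaries between the numbers are ambiguous. The sketch below assumes the cells of a row are already available as a list of strings (with BeautifulSoup, `list(row.stripped_strings)` could produce such a list, provided every cell is its own element on the page, which has not been verified here). The helper name `format_row` and the cell layout in the comment are assumptions.

```python
def format_row(cells, skip=(0, 2, 7, 8), sep=";"):
    """Join the cells of one table row with `sep`, leaving out the positions in `skip`."""
    kept = [cell for index, cell in enumerate(cells) if index not in skip]
    return sep.join(kept)


# Assumed cell layout: rank, team, matches, wins, draws, losses, goals,
# points, "?", then the five form letters.
cells = ["1.", "Ajax", "20", "15", "3", "2", "61:5", "48", "?", "W", "W", "W", "W", "P"]

print(format_row(cells))  # Ajax;15;3;2;61:5;W;W;W;W;P
```

Passing a different `skip` tuple changes which columns are dropped; `skip=(0,)` would keep everything except the rank.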

2 answers

solution1
0 2022-01-09 16:07:12

solution2
0 2022-01-09 17:55:21

solution3
0 2022-01-24 21:12:32
