I am trying to scrape data from this page: https://www.flashscore.pl/druzyna/ajax/8UOvIwnb/tabela
Q1: I wrote the code below, but I don't know how to extract the data for the Ajax team only. The data should be stored in a list; later it will be saved to a CSV file. I am also not interested in some characters, for example the "?" sign. How can I exclude it? I'll be grateful for your help.
Q2: How can I separate the values in the result for "Ajax" with ";", e.g. Ajax;18;13;3;2;56:4;42;?;W;W;P;W;W;
CODE
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup as BS
import requests
from time import sleep

driver = webdriver.Chrome()
driver.get("https://www.flashscore.pl/druzyna/ajax/8UOvIwnb/tabela/")
sleep(10)  # crude wait for the JavaScript-rendered table to load
page = driver.page_source
soup = BS(page, 'html.parser')
# container that holds all rows of the league table
content3 = soup.find('div', {'class': 'ui-table__body'})
content_list3 = content3.find_all('div', {'class': 'tableCellFormIcon tableCellFormIcon--TBD'})
for i in content3:
    # print the row text up to the first whitespace
    print(i.text.split()[0])
RESULTS
1.PSV18141346:2443?WWWWR
2.Ajax18133256:442?WWPWW
3.Feyenoord18123342:1739?WPRWW
4.Vitesse18103525:2533?WRWWR
5.Alkmaar18102635:2332?WWWWW
6.Twente1895428:2232?RWWWR
7.Utrecht1885533:2329?RRRPW
8.Cambuur1891832:3928?RPWPW
9.Nijmegen1874724:2625?WWPPP
10.Heerenveen1874720:2525?PWRWR
11.G.A.
12.Groningen1847720:2719?PPRRW
13.Heracles18531021:2618?RWPPP
14.Willem
15.Waalwijk1837819:3016?RPPWR
16.Sparta
17.Sittard18341119:4613?PRWPP
18.Zwolle1813149:326?PPPRR
You can add it to a list:
res = []
for i in content3:
    line = i.text.split()[0]
    print(line)
    res.append(line)
https://docs.python.org/3/tutorial/datastructures.html -
list.append(x) Add an item to the end of the list. Equivalent to a[len(a):] = [x].
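One thing to watch in the loop above: `split()[0]` keeps only the text before the first whitespace, which is why rows 11, 14 and 16 in the question's output stop at "11.G.A.", "14.Willem" and "16.Sparta". A minimal sketch of the effect, using a made-up row string (the team name and digits are placeholders, not real table data):

```python
# Hypothetical row text: a team name containing a space, followed by
# placeholder statistics (not taken from the real table).
row_text = "7.Some Team10532"

# split() breaks on whitespace, so [0] drops everything after the space.
first_token = row_text.split()[0]
print(first_token)  # 7.Some

# Keeping the whole string (only trimming the ends) preserves the row.
whole_row = row_text.strip()
print(whole_row)  # 7.Some Team10532
```

If the full row is needed, using `i.text.strip()` instead of `i.text.split()[0]` is one option.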
To remove the "?", add this inside the loop:
line = line.replace("?", "")
https://docs.python.org/3/library/stdtypes.html#str.replace -
str.replace(old, new[, count]) Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
I added a regular expression so that only the "Ajax" row is kept:
import re
...
res = []
for i in content3:
    line = i.text.split()[0]
    if re.search('Ajax', line):
        line = line.replace("?", "")
        res.append(line)
print(res)
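For the second question (separating the values with ";"), the concatenated string is hard to split afterwards, because a run such as "18133256:442" gives no hint where one number ends and the next begins. It is usually easier to collect the text of each cell separately and join the pieces. The sketch below assumes each cell's text is available as its own string; with BeautifulSoup that could be `list(row.stripped_strings)`, provided every cell on the page is a separate text node (an assumption about the page's markup, not verified here). `join_cells` is a hypothetical helper name, and the sample cells are the Ajax row from the question's output.

```python
def join_cells(cells, skip=("?",), sep=";"):
    """Join cell texts with `sep`, leaving out empty cells and any cell in `skip`."""
    kept = [c.strip() for c in cells if c.strip() and c.strip() not in skip]
    return sep.join(kept)


# Cell texts of the Ajax row from the question, one string per cell.
# With BeautifulSoup this list might come from list(row.stripped_strings)
# (assumption: each table cell is a separate text node).
ajax_cells = ["2.", "Ajax", "18", "13", "3", "2", "56:4", "42", "?",
              "W", "W", "P", "W", "W"]

line = join_cells(ajax_cells)
print(line)  # 2.;Ajax;18;13;3;2;56:4;42;W;W;P;W;W
```

Passing `skip=()` keeps the "?" cell if it is wanted after all.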
Another question on the main topic: how can I get only these results, separated with ";"?
Results
['1.Ajax20153261:548WWWWP']
Expected result (separated with ";" and skipping a few values, 20 and 48 in this example)
Ajax;15;3;2;61:5;W;W;W;W;P
Code below
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup as BS
import requests
from time import sleep
import re

driver = webdriver.Chrome()
driver.get("https://www.flashscore.pl/druzyna/ajax/8UOvIwnb/tabela/")
sleep(10)  # crude wait for the JavaScript-rendered table to load
page = driver.page_source
soup = BS(page, 'html.parser')
content3 = soup.find('div', {'class': 'ui-table__body'})
content_list3 = content3.find_all('div', {'class': 'tableCellFormIcon tableCellFormIcon--TBD'})
res = []
for i in content3:
    line = i.text.split()[0]
    # keep only the row that mentions Ajax, without the "?" marker
    if re.search('Ajax', line):
        line = line.replace("?", "")
        res.append(line)
print(res)
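A possible way to get the expected line is to work with a list of cells rather than the joined string, drop the unwanted positions, and join the rest with ";". The sketch below assumes the row has already been split into cells (for example with BeautifulSoup's `stripped_strings`, which depends on the page's markup) and that the "?" cell has been removed. `drop_columns` is a hypothetical helper name, and the column positions are read off the example row in this follow-up.

```python
import csv
import io


def drop_columns(cells, unwanted):
    """Return the cells whose position is not in `unwanted`."""
    return [c for i, c in enumerate(cells) if i not in unwanted]


# Cells of the Ajax row from the follow-up, "?" already removed.
ajax_cells = ["1.", "Ajax", "20", "15", "3", "2", "61:5", "48",
              "W", "W", "W", "W", "P"]

# Positions to discard: 0 = rank, 2 = the value 20, 7 = the value 48.
selected = drop_columns(ajax_cells, unwanted={0, 2, 7})
line = ";".join(selected)
print(line)  # Ajax;15;3;2;61:5;W;W;W;W;P

# csv.writer with delimiter=";" produces the same layout; a StringIO
# buffer stands in for a real file here.
buffer = io.StringIO()
csv.writer(buffer, delimiter=";", lineterminator="\n").writerow(selected)
csv_text = buffer.getvalue()
```

To write an actual file, the buffer can be replaced with `open("ajax.csv", "w", newline="")`; the file name is only an example.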