Multiple scraping: problem in the code. What am I doing wrong?
I am trying to use Selenium to scrape multiple elements at once. Each set of scraped elements should form one row that will be inserted into a database. So far I have only ever scraped single elements, never several together, so there is some problem in my code.
I would like to create this row for each round (round 1, round 2, etc.) of the championship: Round, Date, Team_Home, Team_Away, Result_Home, Result_Away. For context, there will be 8 rows for each championship round, and there are 26 rounds in total. I'm not getting any errors, but the only output is the >>> prompt, with no text.
PS: the scraping is for personal study only. It is not for profit.
I would like to get, for example, this:
#SWEDEN ALLSVENSKAN
#Round, Date, Team_Home, Team_Away, Result_Home, Result_Away
Round 1, 11/31/2021 20:45, AIK Stockholm, Malmo, 2, 1
Round 1, 11/31/2021 20:45, Elfsborg, Gothenburg, 2, 3
...and the rest of the other matches of the 1st round
Round 2, 06/12/2021 20:45, Gothenburg, AIK Stockholm, 0, 1
Round 2, 06/12/2021 20:45, Malmo, Elfsborg, 1, 1
...and the rest of the other matches of the 2nd round
Round 3, etc.
Python code for scraping:
Values_Allsvenskan = []
#SCRAPING
driver.get("https://www.diretta.it/calcio/svezia/allsvenskan/risultati/")
driver.implicitly_wait(12)
driver.minimize_window()
for Allsvenskan in multiple_scraping:
    try:
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
    except:
        pass
    multiple_scraping = round, date, team_home, team_away, score_home, score_away
    #row/record
    round = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__round event__round--static']")
    date = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__time']") #data e ora è tutto un pezzo su diretta.it
    team_home = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__participant event__participant--home']")
    team_away = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__participant event__participant--away']")
    score_home = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__score event__score--home']")
    score_away = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__score event__score--away']")
    Allsvenskan_text = round.text, date.text, team_home.text, team_away.text, score_home.text, score_away.text
    Values_Allsvenskan.append(tuple([Allsvenskan_text]))
    print(Allsvenskan_text)
driver.close
#INSERT IN DATABASE
con = sqlite3.connect('/database.db')
cursor = con.cursor()
sqlite_insert_query_Allsvenskan = 'INSERT INTO All_Score (round, date, team_home, team_away, score_home, score_away) VALUES (?, ?, ?, ?, ?, ?);'
cursor.executemany(sqlite_insert_query_Allsvenskan, Values_Allsvenskan)
con.commit()
Based on my Python code, can you show me how to fix it? Thanks
You use find_elements to get lists with all round, all date, all team_home, all team_away, etc. The values therefore sit in separate lists, and you should use zip() to group them into rows like [single round, single date, single team_home, ...].
results = []
for row in zip(date, team_home, team_away, score_home, score_away):
    row = [item.text for item in row]
    print(row)
    results.append(row)
I skipped round because it causes more problems: it needs completely different code.
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://www.diretta.it/calcio/svezia/allsvenskan/risultati/")
driver.implicitly_wait(12)
#driver.minimize_window()
wait = WebDriverWait(driver, 10)
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
except Exception as ex:
    print('EX:', ex)
round = driver.find_elements(By.CSS_SELECTOR, "[class^='event__round event__round--static']")
date = driver.find_elements(By.CSS_SELECTOR, "[class^='event__time']") #data e ora è tutto un pezzo su diretta.it
team_home = driver.find_elements(By.CSS_SELECTOR, "[class^='event__participant event__participant--home']")
team_away = driver.find_elements(By.CSS_SELECTOR, "[class^='event__participant event__participant--away']")
score_home = driver.find_elements(By.CSS_SELECTOR, "[class^='event__score event__score--home']")
score_away = driver.find_elements(By.CSS_SELECTOR, "[class^='event__score event__score--away']")
results = []
for row in zip(date, team_home, team_away, score_home, score_away):
    row = [item.text for item in row]
    print(row)
    results.append(row)
Result:
['01.11. 19:00', 'Degerfors', 'Göteborg', '0', '1']
['01.11. 19:00', 'Halmstad', 'AIK Stockholm', '1', '0']
['01.11. 19:00', 'Mjallby', 'Hammarby', '2', '0']
['31.10. 17:30', 'Örebro', 'Djurgarden', '0', '1']
['31.10. 15:00', 'Norrkoping', 'Elfsborg', '3', '2']
['30.10. 17:30', 'Hacken', 'Kalmar', '1', '4']
['30.10. 15:00', 'Sirius', 'Malmo FF', '2', '3']
['30.10. 15:00', 'Varbergs', 'Östersunds', '3', '0']
['28.10. 19:00', 'Degerfors', 'Elfsborg', '1', '2']
['28.10. 19:00', 'Göteborg', 'Djurgarden', '3', '0']
['28.10. 19:00', 'Halmstad', 'Örebro', '1', '1']
['28.10. 19:00', 'Norrkoping', 'Mjallby', '2', '2']
['27.10. 19:00', 'Kalmar', 'Varbergs', '2', '2']
['27.10. 19:00', 'Malmo FF', 'AIK Stockholm', '1', '0']
['27.10. 19:00', 'Östersunds', 'Hacken', '1', '1']
['27.10. 19:00', 'Sirius', 'Hammarby', '0', '1']
['25.10. 19:00', 'Örebro', 'Degerfors', '1', '2']
['24.10. 17:30', 'AIK Stockholm', 'Norrkoping', '1', '0']
...
But this method can sometimes cause problems: if some row has an empty place, the value from the next row moves into the current row, and so on. This way it can create wrong rows.
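To make the misalignment concrete, here is a minimal sketch that uses plain Python lists in place of Selenium elements. The values are hypothetical, and it assumes the page simply has no score element for one match, so the score list comes back one item shorter than the others.

```python
# Hypothetical values standing in for the .text of scraped elements.
# Assumption: the second match (Halmstad) has no home-score element on the page,
# so the score list is one item shorter than the other lists.
dates = ["01.11. 19:00", "01.11. 19:00", "31.10. 17:30"]
team_home = ["Degerfors", "Halmstad", "Örebro"]
score_home = ["0", "1"]  # "1" really belongs to Örebro, not to Halmstad

# zip() pairs items purely by position and stops at the shortest list
rows = list(zip(dates, team_home, score_home))

for row in rows:
    print(row)
```

Halmstad ends up with the score that belongs to Örebro, and the Örebro row is dropped, all without any error being raised.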
A better approach is to find all rows (div, or tr in a table), then use a for-loop to work with every row separately, calling row.find_elements instead of driver.find_elements. This should also resolve the problem with round, whose value has to be read once and then repeated in the following rows.
I search for rows with event__round or event__match and then check which classes each row has. If it has event__round, I read the round. If it has event__match, I use find_element (without the s at the end) to get a single date, single team_home, single team_away, etc. (because a single row holds only single values) and combine them with current_round to create the row.
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://www.diretta.it/calcio/svezia/allsvenskan/risultati/")
driver.implicitly_wait(12)
#driver.minimize_window()
wait = WebDriverWait(driver, 10)
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
except Exception as ex:
    print('EX:', ex)
all_rows = driver.find_elements(By.CSS_SELECTOR, "div[class^='event__round'],div[class^='event__match']")
results = []
current_round = '?'
for row in all_rows:
    classes = row.get_attribute('class')
    #print(classes)
    if 'event__round' in classes:
        #round = row.find_elements(By.CSS_SELECTOR, "[class^='event__round event__round--static']")
        current_round = row.text
    else:
        date = row.find_element(By.CSS_SELECTOR, "[class^='event__time']") #data e ora è tutto un pezzo su diretta.it
        team_home = row.find_element(By.CSS_SELECTOR, "[class^='event__participant event__participant--home']")
        team_away = row.find_element(By.CSS_SELECTOR, "[class^='event__participant event__participant--away']")
        score_home = row.find_element(By.CSS_SELECTOR, "[class^='event__score event__score--home']")
        score_away = row.find_element(By.CSS_SELECTOR, "[class^='event__score event__score--away']")
        row = [current_round, date.text, team_home.text, team_away.text, score_home.text, score_away.text]
        print(row)
        results.append(row)
Result:
['Giornata 26', '01.11. 19:00', 'Degerfors', 'Göteborg', '0', '1']
['Giornata 26', '01.11. 19:00', 'Halmstad', 'AIK Stockholm', '1', '0']
['Giornata 26', '01.11. 19:00', 'Mjallby', 'Hammarby', '2', '0']
['Giornata 26', '31.10. 17:30', 'Örebro', 'Djurgarden', '0', '1']
['Giornata 26', '31.10. 15:00', 'Norrkoping', 'Elfsborg', '3', '2']
['Giornata 26', '30.10. 17:30', 'Hacken', 'Kalmar', '1', '4']
['Giornata 26', '30.10. 15:00', 'Sirius', 'Malmo FF', '2', '3']
['Giornata 26', '30.10. 15:00', 'Varbergs', 'Östersunds', '3', '0']
['Giornata 25', '28.10. 19:00', 'Degerfors', 'Elfsborg', '1', '2']
['Giornata 25', '28.10. 19:00', 'Göteborg', 'Djurgarden', '3', '0']
['Giornata 25', '28.10. 19:00', 'Halmstad', 'Örebro', '1', '1']
# ...
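Rows in this shape can be passed straight to executemany, since each row is a flat sequence of six values matching the six placeholders in the INSERT from the question. The sketch below is only an illustration. It uses two sample rows copied from the output above, and it makes two assumptions: an in-memory database instead of the question's file path, and a CREATE TABLE statement with TEXT columns, because the real schema of All_Score is not shown.

```python
import sqlite3

# Two sample rows in the same shape as the `results` list built by the loop
results = [
    ['Giornata 26', '01.11. 19:00', 'Degerfors', 'Göteborg', '0', '1'],
    ['Giornata 26', '01.11. 19:00', 'Halmstad', 'AIK Stockholm', '1', '0'],
]

# Assumption: in-memory database for the demo; use your own file path instead
con = sqlite3.connect(':memory:')
cursor = con.cursor()

# Assumption: the table schema is not shown in the question, so TEXT columns are a guess
cursor.execute(
    'CREATE TABLE IF NOT EXISTS All_Score '
    '(round TEXT, date TEXT, team_home TEXT, team_away TEXT, score_home TEXT, score_away TEXT)'
)

# One row per match: six values for six placeholders
insert_query = (
    'INSERT INTO All_Score (round, date, team_home, team_away, score_home, score_away) '
    'VALUES (?, ?, ?, ?, ?, ?);'
)
cursor.executemany(insert_query, results)
con.commit()

# Read the rows back to check what was stored
stored = cursor.execute('SELECT round, team_home, score_home FROM All_Score').fetchall()
print(stored)
con.close()
```

Note that the question's code appended tuple([Allsvenskan_text]), which wraps the whole row in a one-element tuple. executemany expects each item to be the flat row itself.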