Multiple scraping: problem in the code. What am I doing wrong?
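I am trying to scrape multiple elements with Selenium: several elements are scraped together so that they form a row suitable for a database. Until now I have only ever scraped single elements, never several at once, so there are some problems in my code.
I want to create this row for every round of the championship (Round 1, Round 2, etc.): Round, Date, Team_Home, Team_Away, Result_Home, Result_Away. For reference, and to give you a better idea, each round has 8 rows and there are 26 rounds in total. I don't get any error, but the output is just >>>, with no text and no error message.
PS: the scraping is for my personal study, not for profit.
I would like to get, for example, this: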
#SWEDEN ALLSVENSKAN
#Round, Date, Team_Home, Team_Away, Result_Home, Result_Away
Round 1, 11/31/2021 20:45, AIK Stockholm, Malmo, 2, 1
Round 1, 11/31/2021 20:45, Elfsborg, Gothenburg, 2, 3
...and the rest of the other matches of the 1st round
Round 2, 06/12/2021 20:45, Gothenburg, AIK Stockholm, 0, 1
Round 2, 06/12/2021 20:45, Malmo, Elfsborg, 1, 1
...and the rest of the other matches of the 2nd round
Round 3, etc.
Python code for scraping:
Values_Allsvenskan = []
#SCRAPING
driver.get("https://www.diretta.it/calcio/svezia/allsvenskan/risultati/")
driver.implicitly_wait(12)
driver.minimize_window()
for Allsvenskan in multiple_scraping:
    try:
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
    except:
        pass
multiple_scraping = round, date, team_home, team_away, score_home, score_away
#row/record
round = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__round event__round--static']")
date = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__time']") # date and time are a single piece on diretta.it
team_home = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__participant event__participant--home']")
team_away = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__participant event__participant--away']")
score_home = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__score event__score--home']")
score_away = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__score event__score--away']")
Allsvenskan_text = round.text, date.text, team_home.text, team_away.text, score_home.text, score_away.text
Values_Allsvenskan.append(tuple([Allsvenskan_text]))
print(Allsvenskan_text)
driver.close
#INSERT IN DATABASE
con = sqlite3.connect('/database.db')
cursor = con.cursor()
sqlite_insert_query_Allsvenskan = 'INSERT INTO All_Score (round, date, team_home, team_away, score_home, score_away) VALUES (?, ?, ?, ?, ?, ?);'
cursor.executemany(sqlite_insert_query_Allsvenskan, Values_Allsvenskan)
con.commit()
Based on my Python code, can you tell me how to fix it? Thanks
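You use find_elements to get lists with all the rounds, all the dates, all the team_home values, all the team_away values, etc., so the values sit in separate lists. You should use zip() to group the values from those lists into rows like [single round, single date, single team_home, ...]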
results = []
for row in zip(date, team_home, team_away, score_home, score_away):
    row = [item.text for item in row]
    print(row)
    results.append(row)
I skipped round because it causes more problems and needs completely different code.
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://www.diretta.it/calcio/svezia/allsvenskan/risultati/")
driver.implicitly_wait(12)
#driver.minimize_window()

wait = WebDriverWait(driver, 10)

# click the "show more" button if it is present; otherwise report why not
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
except Exception as ex:
    print('EX:', ex)

# every call returns one list with the values of ALL matches on the page
round = driver.find_elements(By.CSS_SELECTOR, "[class^='event__round event__round--static']")  # not used below
date = driver.find_elements(By.CSS_SELECTOR, "[class^='event__time']")  # date and time are a single piece on diretta.it
team_home = driver.find_elements(By.CSS_SELECTOR, "[class^='event__participant event__participant--home']")
team_away = driver.find_elements(By.CSS_SELECTOR, "[class^='event__participant event__participant--away']")
score_home = driver.find_elements(By.CSS_SELECTOR, "[class^='event__score event__score--home']")
score_away = driver.find_elements(By.CSS_SELECTOR, "[class^='event__score event__score--away']")

results = []
# group the n-th item of every list into one row
for row in zip(date, team_home, team_away, score_home, score_away):
    row = [item.text for item in row]
    print(row)
    results.append(row)
Result:
['01.11. 19:00', 'Degerfors', 'Göteborg', '0', '1']
['01.11. 19:00', 'Halmstad', 'AIK Stockholm', '1', '0']
['01.11. 19:00', 'Mjallby', 'Hammarby', '2', '0']
['31.10. 17:30', 'Örebro', 'Djurgarden', '0', '1']
['31.10. 15:00', 'Norrkoping', 'Elfsborg', '3', '2']
['30.10. 17:30', 'Hacken', 'Kalmar', '1', '4']
['30.10. 15:00', 'Sirius', 'Malmo FF', '2', '3']
['30.10. 15:00', 'Varbergs', 'Östersunds', '3', '0']
['28.10. 19:00', 'Degerfors', 'Elfsborg', '1', '2']
['28.10. 19:00', 'Göteborg', 'Djurgarden', '3', '0']
['28.10. 19:00', 'Halmstad', 'Örebro', '1', '1']
['28.10. 19:00', 'Norrkoping', 'Mjallby', '2', '2']
['27.10. 19:00', 'Kalmar', 'Varbergs', '2', '2']
['27.10. 19:00', 'Malmo FF', 'AIK Stockholm', '1', '0']
['27.10. 19:00', 'Östersunds', 'Hacken', '1', '1']
['27.10. 19:00', 'Sirius', 'Hammarby', '0', '1']
['25.10. 19:00', 'Örebro', 'Degerfors', '1', '2']
['24.10. 17:30', 'AIK Stockholm', 'Norrkoping', '1', '0']
...
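But this method can sometimes cause problems: if some row has an empty place, values from the next row are moved into the current row, and so on. That way it can create wrong rows.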
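To illustrate that shifting effect without a browser, here is a minimal sketch with plain lists. The sample data is hypothetical: it assumes the second match has no score on the page, so the score lists come back one item shorter than the team lists.

```python
# Hypothetical sample data: the second match (Halmstad) has no score,
# so both score lists are one item shorter than the team lists.
team_home = ["Degerfors", "Halmstad", "Mjallby"]
team_away = ["Göteborg", "AIK Stockholm", "Hammarby"]
score_home = ["0", "2"]  # "2" really belongs to Mjallby
score_away = ["1", "0"]  # "0" really belongs to Hammarby

# zip() pairs items purely by position and stops at the shortest list
rows = [list(row) for row in zip(team_home, team_away, score_home, score_away)]

for row in rows:
    print(row)
```

In this sketch the Halmstad row receives the score that belongs to the Mjallby match, and the Mjallby row is dropped silently because zip() stops at the shortest list.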
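It is better to find all the rows (the div or tr elements of the table) and then use a for-loop to process every row separately, using row.find_elements instead of driver.find_elements. This should also solve the problem with round, whose value has to be read once and then duplicated in the rows that follow.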
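The idea of reading the round once and copying it into the following rows can be sketched without Selenium. The list page_rows below is a hypothetical stand-in for the page: (kind, text) pairs in document order, where a "round" entry is a header and a "match" entry is a single match.

```python
# Hypothetical stand-in for the page rows, in the order they appear.
page_rows = [
    ("round", "Giornata 26"),
    ("match", "Degerfors - Göteborg"),
    ("match", "Halmstad - AIK Stockholm"),
    ("round", "Giornata 25"),
    ("match", "Degerfors - Elfsborg"),
]

results = []
current_round = "?"  # placeholder until the first round header is seen

for kind, text in page_rows:
    if kind == "round":
        # a header row: remember its value for the matches that follow
        current_round = text
    else:
        # a match row: attach the most recently seen round
        results.append([current_round, text])

for row in results:
    print(row)
```

Because every match is handled inside its own row, a missing value in one row cannot shift values into another row.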
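I search for rows with event__round or event__match and then check which of the two classes each row has. If it has event__round then I get the round. If it has event__match then I use find_element, without the s at the end, to get a single date, a single team_home, a single team_away, etc. (because a single row holds only a single value of each) and use them together with current_round to create the row.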
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://www.diretta.it/calcio/svezia/allsvenskan/risultati/")
driver.implicitly_wait(12)
#driver.minimize_window()

wait = WebDriverWait(driver, 10)

# click the "show more" button if it is present; otherwise report why not
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
except Exception as ex:
    print('EX:', ex)

# round headers and match rows together, in page order
all_rows = driver.find_elements(By.CSS_SELECTOR, "div[class^='event__round'],div[class^='event__match']")

results = []
current_round = '?'  # placeholder until the first round header is seen

for row in all_rows:
    classes = row.get_attribute('class')
    #print(classes)
    if 'event__round' in classes:
        # header row: remember the round for the matches that follow
        #round = row.find_elements(By.CSS_SELECTOR, "[class^='event__round event__round--static']")
        current_round = row.text
    else:
        # match row: search only inside this row, one value of each kind
        date = row.find_element(By.CSS_SELECTOR, "[class^='event__time']")  # date and time are a single piece on diretta.it
        team_home = row.find_element(By.CSS_SELECTOR, "[class^='event__participant event__participant--home']")
        team_away = row.find_element(By.CSS_SELECTOR, "[class^='event__participant event__participant--away']")
        score_home = row.find_element(By.CSS_SELECTOR, "[class^='event__score event__score--home']")
        score_away = row.find_element(By.CSS_SELECTOR, "[class^='event__score event__score--away']")
        row = [current_round, date.text, team_home.text, team_away.text, score_home.text, score_away.text]
        print(row)
        results.append(row)
Result:
['Giornata 26', '01.11. 19:00', 'Degerfors', 'Göteborg', '0', '1']
['Giornata 26', '01.11. 19:00', 'Halmstad', 'AIK Stockholm', '1', '0']
['Giornata 26', '01.11. 19:00', 'Mjallby', 'Hammarby', '2', '0']
['Giornata 26', '31.10. 17:30', 'Örebro', 'Djurgarden', '0', '1']
['Giornata 26', '31.10. 15:00', 'Norrkoping', 'Elfsborg', '3', '2']
['Giornata 26', '30.10. 17:30', 'Hacken', 'Kalmar', '1', '4']
['Giornata 26', '30.10. 15:00', 'Sirius', 'Malmo FF', '2', '3']
['Giornata 26', '30.10. 15:00', 'Varbergs', 'Östersunds', '3', '0']
['Giornata 25', '28.10. 19:00', 'Degerfors', 'Elfsborg', '1', '2']
['Giornata 25', '28.10. 19:00', 'Göteborg', 'Djurgarden', '3', '0']
['Giornata 25', '28.10. 19:00', 'Halmstad', 'Örebro', '1', '1']
# ...
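Rows in this shape can be passed to executemany() as they are, because each row already holds the six values that the INSERT statement from the question expects. The sketch below is only an illustration: the in-memory database, the CREATE TABLE statement and the conversion of the scores to int are assumptions, since the question does not show the real table definition. The table and column names are taken from the question.

```python
import sqlite3

# Two sample rows copied from the result above.
results = [
    ['Giornata 26', '01.11. 19:00', 'Degerfors', 'Göteborg', '0', '1'],
    ['Giornata 26', '01.11. 19:00', 'Halmstad', 'AIK Stockholm', '1', '0'],
]

# Assumption: scores are stored as integers, so convert the scraped text.
records = [(r, d, home, away, int(sh), int(sa)) for r, d, home, away, sh, sa in results]

# Assumption: an in-memory database and this schema, for illustration only.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE IF NOT EXISTS All_Score ("
    "round TEXT, date TEXT, team_home TEXT, team_away TEXT, "
    "score_home INTEGER, score_away INTEGER)"
)

# One 6-item sequence per row matches the six ? placeholders.
con.executemany(
    "INSERT INTO All_Score (round, date, team_home, team_away, score_home, score_away) "
    "VALUES (?, ?, ?, ?, ?, ?);",
    records,
)
con.commit()

stored = con.execute(
    "SELECT round, team_home, score_home FROM All_Score ORDER BY rowid"
).fetchall()
print(stored)
con.close()
```

Note that tuple([Allsvenskan_text]) in the question wraps the whole row in a one-item tuple, so the statement would receive one binding instead of six. Appending the row itself avoids that.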