How to speed up this Python script with multiprocessing

Question

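I have a script that takes data from a dataframe, uses that data to send requests to a website, finds the exact href with the fuzzywuzzy module, and then runs a function to fetch the odds. I would like to speed this script up with the multiprocessing module. Is that possible?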


                           Date       HomeTeam         AwayTeam
0  Monday 6 December 2021 20:00        Everton          Arsenal
1  Monday 6 December 2021 17:30         Empoli          Udinese
2  Monday 6 December 2021 19:45       Cagliari           Torino
3  Monday 6 December 2021 20:00         Getafe  Athletic Bilbao
4  Monday 6 December 2021 15:00  Real Zaragoza            Eibar
5  Monday 6 December 2021 17:15      Cartagena         Tenerife
6  Monday 6 December 2021 20:00         Girona          Leganes
7  Monday 6 December 2021 19:45          Niort         Toulouse
8  Monday 6 December 2021 19:00      Jong Ajax         FC Emmen
9  Monday 6 December 2021 19:00        Jong AZ        Excelsior

Script

  df = pd.read_excel(path)

  dates = df.Date
  hometeams = df.HomeTeam
  awayteams = df.AwayTeam

  matches_odds = list()

  for i,(a,b,c) in enumerate(zip(dates, hometeams, awayteams)):
      try:
        r = requests.get(f'https://www.betexplorer.com/results/soccer/?year={a.split(" ")[3]}&month={monthToNum(a.split(" ")[2])}&day={a.split(" ")[1]}')
      except requests.exceptions.ConnectionError:
        sleep(10)
        r = requests.get(f'https://www.betexplorer.com/results/soccer/?year={a.split(" ")[3]}&month={monthToNum(a.split(" ")[2])}&day={a.split(" ")[1]}')
      
      soup = BeautifulSoup(r.text, 'html.parser')
      f = soup.find_all('td', class_="table-main__tt")

      for tag in f: 
          match = fuzz.ratio(f'{b} - {c}', tag.find('a').text)
          hour = a.split(" ")[4]
          if hour.split(':')[0] == '23':
              act_hour = '00' + ':' + hour.split(':')[1]
          else:
              act_hour = str(int(hour.split(':')[0]) + 1) + ':' + hour.split(':')[1]
          if match > 70 and act_hour == tag.find('span').text:
              href_id = tag.find('a')['href']

              table = get_odds(href_id)
              matches_odds.append(table)
          
      print(i, ' of ', len(dates))

PS: the monthToNum function simply replaces a month name with its number.
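Since monthToNum is not shown, here is a minimal sketch of what it might look like, together with a helper that builds the results URL from one Date value. Both are assumptions: the body of monthToNum is a guess (full English month names, integer result), and results_url is a hypothetical name that does not appear in the script.

```python
# Hypothetical implementation; the real monthToNum is not shown in the question
MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")


def monthToNum(name):
    # Position in the tuple is zero-based, month numbers start at 1
    return MONTHS.index(name) + 1


def results_url(date_text):
    # date_text is expected to look like "Monday 6 December 2021 20:00"
    _, day, month, year, _ = date_text.split(" ")
    return (
        "https://www.betexplorer.com/results/soccer/"
        f"?year={year}&month={monthToNum(month)}&day={day}"
    )
```

Building the URL once also avoids repeating the long f-string in both the try and the except branch.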

Answer 1

First, turn the body of the loop into a function that takes i, a, b and c as input. Then create a multiprocessing.Pool and submit this function to the pool with the right arguments (i, a, b, c).

import multiprocessing
from time import sleep

import pandas as pd
import requests
from bs4 import BeautifulSoup
from fuzzywuzzy import fuzz

# monthToNum, get_odds and path are defined elsewhere, as in the question

def fetch(data):
    # enumerate(zip(...)) yields (i, (a, b, c)), so unpack the nested tuple
    i, (a, b, c) = data
    parts = a.split(" ")
    url = f'https://www.betexplorer.com/results/soccer/?year={parts[3]}&month={monthToNum(parts[2])}&day={parts[1]}'
    try:
        r = requests.get(url)
    except requests.exceptions.ConnectionError:
        # Wait and retry once if the connection fails
        sleep(10)
        r = requests.get(url)

    soup = BeautifulSoup(r.text, 'html.parser')
    f = soup.find_all('td', class_="table-main__tt")

    # The site shows kick-off times one hour later than the dataframe
    hour = parts[4]
    if hour.split(':')[0] == '23':
        act_hour = '00' + ':' + hour.split(':')[1]
    else:
        act_hour = str(int(hour.split(':')[0]) + 1) + ':' + hour.split(':')[1]

    # Collect results locally: a list appended to inside a worker process
    # is not visible in the parent process
    tables = []
    for tag in f:
        match = fuzz.ratio(f'{b} - {c}', tag.find('a').text)
        if match > 70 and act_hour == tag.find('span').text:
            href_id = tag.find('a')['href']
            tables.append(get_odds(href_id))

    print('finished row', i)
    return tables

# The guard is required on platforms that start workers by re-importing this module
if __name__ == '__main__':
    df = pd.read_excel(path)
    rows = list(enumerate(zip(df.Date, df.HomeTeam, df.AwayTeam)))

    num_processes = 20
    with multiprocessing.Pool(num_processes) as pool:
        results = pool.map(fetch, rows)

    # Each call returns its own list; flatten them in the parent process
    matches_odds = [table for tables in results for table in tables]
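Two details are easy to miss when moving a loop body into a pool: enumerate(zip(...)) hands each call a nested tuple of the form (i, (a, b, c)), and every worker process has its own copy of module-level variables, so a list appended to inside the worker never reaches the parent. A minimal sketch with toy data and no network access shows the pattern of returning values instead. The names work and flatten are hypothetical and are not part of the code above.

```python
import multiprocessing


def work(item):
    # enumerate(zip(...)) yields (index, (date, home, away)), hence the nested unpacking
    i, (date, home, away) = item
    # Return the result; appending to a module-level list here would only
    # change the worker process's own copy
    return [f"{home} - {away}"]


def flatten(list_of_lists):
    # Merge the per-row lists returned by the workers into one list
    return [x for sub in list_of_lists for x in sub]


if __name__ == "__main__":
    rows = list(enumerate(zip(
        ["Monday 6 December 2021 20:00", "Monday 6 December 2021 17:30"],
        ["Everton", "Empoli"],
        ["Arsenal", "Udinese"],
    )))
    with multiprocessing.Pool(2) as pool:
        results = pool.map(work, rows)  # results come back in input order
    print(flatten(results))
```

pool.map collects the return value of every call, so the parent process ends up with the full list even though the work ran elsewhere.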

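Also, multiprocessing is not the only way to improve speed. Asynchronous programming is another option and is probably a better fit here, since most of the time is spent waiting on network requests, although multiprocessing can do the job too. Just wanted to mention it.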
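As a rough illustration of the asynchronous route, here is a sketch that runs a blocking fetch function concurrently with asyncio. It assumes Python 3.9+ (for asyncio.to_thread). The names fetch_all, fake_fetch and max_concurrency are made up for this example, and fake_fetch stands in for a real call such as requests.get(url).text so the sketch runs offline.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=10):
    """Run the blocking callable fetch(url) for every URL concurrently."""
    # Limit how many requests are in flight at once
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url):
        async with semaphore:
            # to_thread runs the blocking call in a worker thread,
            # so the event loop stays free to start the other requests
            return await asyncio.to_thread(fetch, url)

    # gather returns the results in the same order as the input URLs
    return await asyncio.gather(*(one(u) for u in urls))


def fake_fetch(url):
    # Stand-in for a real HTTP request; returns a fake page body
    return f"<html>{url}</html>"


pages = asyncio.run(fetch_all(["a", "b"], fake_fetch))
print(pages)
```

Because threads share memory with the main program, there is no pickling of arguments or results, which is part of why this kind of approach tends to suit request-heavy scripts.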

This all becomes clear once you read the Python multiprocessing documentation carefully.

Solution 1
0 2022-01-13 12:21:11
