How can I make my Python code execute faster?

Question

I have written the following code to scrape data from a website (e.g. https://www.oddsportal.com/soccer/new-zealand/football-championship/hamilton-canterbury-GhUEDiE0/ ). The data in question are the over/under values, which can be found in the page's HTML code:

    <tr class="lo odd">
      <td>
        <div class="l"><a class="name2" title="Go to Pinnacle website." onclick="return !window.open(this.href)" href="/bookmaker/pinnacle/link/"><span class="blogos l18"></span></a>&nbsp;<a class="name" title="Go to Pinnacle website." onclick="return !window.open(this.href)" href="/bookmaker/pinnacle/link/">Pinnacle</a>&nbsp;&nbsp;</div>
        <span class="ico-bookmarker-info ico-bookmaker-detail"><a title="Show more details about Pinnacle" href="/bookmaker/pinnacle/"></a></span>
      </td>
      <td class="center">+0.5</td>
      <td class="right odds">
        <div class=" deactivateOdd" onmouseout="delayHideTip()" onmouseover="page.hist(this,'P-0.50-0-0','4j5hgx1tkucx1ix0',18,event,0,1)">1.10</div>
      </td>
      <td class="right odds up-dark">
        <div class=" deactivateOdd" onmouseout="delayHideTip()" onmouseover="page.hist(this,'P-0.50-0-0','4j5hgx1tl1gx1ix0',18,event,0,1)">7.85</div>
      </td>
      <td class="center info-value"><span>-</span></td>
      <td onmouseout="delayHideTip()" class="check ch1" xparam="The match has already started~2"></td>
    </tr>

The interesting part is the over/under values, for example 1.10 and 7.85 here. This data is scraped and arranged in a data frame:

    master_df= pd.DataFrame()

    for match in self.all_links:
    #for match in links:

        self.openmatch(match)
        self.clickou()
        self.expandodds()   
        for x in range(1, 29):  # upper bound 29 so that div 28, handled below, is reached
            L = []
            bookmakers=['Asianodds','Pinnacle']

            #odds_type=fi2('//*[@id="odds-data-table"]/div{}/div/strong/a'.format(x))
            if x==1:
                over_under_type= 'Over/Under +0.5'
            elif x==4:
                over_under_type= 'Over/Under +1'
            elif x==6:
                over_under_type= 'Over/Under +1.5'
            elif x==8:
                over_under_type= 'Over/Under +1.75'
            elif x==9:
                over_under_type= 'Over/Under +2'  
            elif x==10:
                over_under_type= 'Over/Under +2.25'
            elif x==11:
                over_under_type= 'Over/Under +2.5'
            elif x==13:
                over_under_type= 'Over/Under +2.75'
            elif x==14:
                over_under_type= 'Over/Under +3' 
            elif x==16:
                over_under_type= 'Over/Under +3.5'  
            elif x==19:
                over_under_type= 'Over/Under +4'
            elif x==21:
                over_under_type= 'Over/Under +4.5'
            elif x==26:
                over_under_type= 'Over/Under +5.5'
            elif x==28:
                over_under_type= 'Over/Under +6.5' 

            for j in range(1,15): # only first 10 bookmakers displayed
                Book = self.ffi('//*[@id="odds-data-table"]/div[{}]/table/tbody/tr[{}]/td[1]/div/a[2]'.format(x,j)) # first bookmaker name
                Odd_1 = self.fffi('//*[@id="odds-data-table"]/div[{}]/table/tbody/tr[{}]/td[3]/div'.format(x,j)) # first home odd
                Odd_2 = self.fffi('//*[@id="odds-data-table"]/div[{}]/table/tbody/tr[{}]/td[4]/div'.format(x,j)) # first away odd
                match = self.ffi('//*[@id="col-content"]/h1') # match teams
                final_score = self.ffi('//*[@id="event-status"]')
                date = self.ffi('//*[@id="col-content"]/p[1]') # Date and time
                print(match, Book, Odd_1, Odd_2, date, final_score, link, over_under_type, '/ 500 ')
                L = L + [(match, Book, Odd_1, Odd_2, date, final_score, link, over_under_type)]
                data_df = pd.DataFrame(L)

                try:
                    data_df.columns = ['TeamsRaw', 'Bookmaker', 'Over', 'Under', 'DateRaw' ,'ScoreRaw','Link','Over Under Type']
                except ValueError:  # raised when the number of column names does not match
                    print('Function crashed, probable reason : no games scraped (empty season)')
                master_df=pd.concat([master_df,data_df])

My issue is that this code takes roughly 5 minutes per iteration. I am now trying to make the program faster. I guess there might be a more elegant way to achieve this than having all those for loops? I need them in order to get the correct "div" for each XPath. I would be glad for some recommendations!

Answer 1

I would recommend profiling your code to see where the bottlenecks are. cProfile is the profiler I typically use.
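
As a minimal sketch of what such a profiling run could look like: the functions `slow_lookup` and `scrape_rows` below are hypothetical stand-ins for the element lookups (`self.ffi`, `self.fffi`) and the inner loop in the question, and `profile_call` is an illustrative helper, not part of cProfile. The report lists functions sorted by cumulative time, which should point to the calls that dominate each iteration.

```python
import cProfile
import io
import pstats
import time


def slow_lookup():
    """Hypothetical stand-in for a slow element lookup such as self.ffi(...)."""
    time.sleep(0.01)
    return "1.10"


def scrape_rows(n_rows):
    """Hypothetical stand-in for the inner loop over the table rows."""
    return [slow_lookup() for _ in range(n_rows)]


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and print the 10 most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


rows, report = profile_call(scrape_rows, 5)
print(report)
```

For a whole script, running `python -m cProfile -s cumulative your_script.py` (with `your_script.py` as a placeholder name) gives the same kind of report without changing the code.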

Solution 1
0 2021-01-31 16:59:01
