I am trying to scrape some box scores from ESPN.com and put them into a Pandas DataFrame. I have done similar things in the past in the same manner without any problems. However, in this case I get this error when I try to save the DataFrame:
RuntimeError: maximum recursion depth exceeded while calling a Python object
I get a similar error when trying to save it as an HDF5 table.
Even this snippet gives the same error. I am pretty confused about why it is doing this. Does it have something to do with the function?
import urllib2

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup

url = 'http://espn.go.com/nba/boxscore?gameId=400278773'
boxurl = urllib2.urlopen(url).read()
soup = BeautifulSoup(boxurl)
tables = soup.findAll('table')
lineScoreTable = tables[-2]  # the line-score table is second to last on the page
lineScoreRows = lineScoreTable.findAll('tr')

def GetAwayQTRScores():
    scoreRow = lineScoreRows[1].findAll('td')  # row 1 holds the away team
    awayQTRScores = []
    for x in scoreRow:
        scores = x.string
        awayQTRScores.append(scores)
    return awayQTRScores  # returns list

awayQTRScores = GetAwayQTRScores()
# Cells are: team, Q1-Q4, any overtimes, total; a 6-cell row means no overtime
awayTeam = awayQTRScores[0]
awayQ1 = int(awayQTRScores[1])
awayQ2 = int(awayQTRScores[2])
awayQ3 = int(awayQTRScores[3])
awayQ4 = int(awayQTRScores[4])
awayOT1 = np.nan if len(awayQTRScores) < 7 else int(awayQTRScores[5])
awayOT2 = np.nan if len(awayQTRScores) < 8 else int(awayQTRScores[6])
awayOT3 = np.nan if len(awayQTRScores) < 9 else int(awayQTRScores[7])
awayOT4 = np.nan if len(awayQTRScores) < 10 else int(awayQTRScores[8])

data = {'AwayTeam': [awayTeam],
        'AwayQ1': [awayQ1],
        'AwayQ2': [awayQ2],
        'AwayQ3': [awayQ3],
        'AwayQ4': [awayQ4],
        'AwayOT1': [awayOT1],
        'AwayOT2': [awayOT2],
        'AwayOT3': [awayOT3],
        'AwayOT4': [awayOT4]}
testScrape = pd.DataFrame(data)
testScrape.save('testScrape')
RuntimeError                              Traceback (most recent call last)
in ()
----> 1 testScrape.save('testScrape')

C:\Python27\lib\site-packages\pandas\core\generic.pyc in save(self, path)
     26
     27     def save(self, path):
---> 28         com.save(self, path)
     29
     30     @classmethod

C:\Python27\lib\site-packages\pandas\core\common.pyc in save(obj, path)
   1562     f = open(path, 'wb')
   1563     try:
-> 1564         pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
   1565     finally:
   1566         f.close()

RuntimeError: maximum recursion depth exceeded while calling a Python object
Running print data prints:
{'AwayTeam': [u'LAL'], 'AwayOT4': [nan], 'AwayQ4': [27], 'AwayQ3': [36], 'AwayQ2': [24], 'AwayQ1': [16], 'AwayOT1': [nan], 'AwayOT2': [nan], 'AwayOT3': [nan]}
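This exception from pickle.dump usually means that the object you're pickling refers, directly or indirectly, to a long chain of other objects (often including references back to itself), and pickle has to recurse deeper than Python's recursion limit allows to serialize them all.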
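To see how the depth of an object graph alone can trip that limit, here is a minimal sketch. The Node class and build_chain helper are hypothetical stand-ins for a parse tree's linked elements (this is not bs4 code), and the chain lengths are arbitrary. The short chain pickles fine; the very long one makes pickle recurse once per link until it runs out of depth.

```python
import pickle


class Node(object):
    """Hypothetical stand-in for a parse-tree element that links to the next element."""

    def __init__(self, label):
        self.label = label
        self.next_element = None


def build_chain(length):
    """Build `length` nodes, each pointing at the next, and return the first one."""
    head = current = Node(0)
    for i in range(1, length):
        current.next_element = Node(i)
        current = current.next_element
    return head


def can_pickle(obj):
    """Return True if obj pickles, False if pickle runs out of recursion depth."""
    try:
        pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        return True
    except RuntimeError:  # Python 3's RecursionError is a subclass of RuntimeError
        return False


short_ok = can_pickle(build_chain(10))
long_ok = can_pickle(build_chain(100000))
print(short_ok)
print(long_ok)
```

Nothing in each node is large; the failure comes purely from how many links pickle has to follow recursively.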
But which object contains itself? When you print them all out, they all look fine.
It's awayTeam. This is a bs4.element.NavigableString, which you get by doing this:
awayTeam = awayQTRScores[0]
You may not notice it from just print awayTeam or even print repr(awayTeam), because NavigableString is a subclass of unicode and doesn't define a custom __str__ or __repr__, so it prints just like a string.
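A small illustration of why printing doesn't reveal the problem: the hypothetical TreeString class below is a stand-in for NavigableString (it is not the bs4 class), and Python 3's str plays the role that unicode plays in the Python 2 code above. Only type() gives the subclass away.

```python
class TreeString(str):
    """Hypothetical stand-in for bs4's NavigableString: a str subclass that,
    like NavigableString, adds attributes but no custom __str__ or __repr__."""


team = TreeString("LAL")
team.parent = {"name": "td"}  # made-up back-reference standing in for the parse tree

print(team)                   # prints just like a plain string
print(repr(team))             # so does repr()
print(type(team))             # only the type shows it is a subclass
print(isinstance(team, str))  # and it still passes string checks
```

So a check like type(awayTeam) (or isinstance(awayTeam, bs4.element.NavigableString)) is a quicker way to spot these values than printing them.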
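But it also doesn't define any custom pickling behavior, so pickle falls back on the default, which serializes the instance's attribute dictionary (its links to its parent and neighboring elements) along with the text. In general, bs4 objects aren't designed to be pickled, and many of them can't be. In particular, NavigableString is an object that indirectly contains itself. As the docs say: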
If you want to use a NavigableString outside of Beautiful Soup, you should call unicode() on it to turn it into a normal Python Unicode string. If you don't, your string will carry around a reference to the entire Beautiful Soup parse tree, even when you're done using Beautiful Soup.
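And of course the parse tree contains a reference back to the string, which refers to the tree, and so on. Pickling the string therefore means pickling the whole parse tree, whose elements are also chained one after another through their next_element links, so for a page the size of an ESPN box score pickle has to recurse far deeper than Python's default limit. In practice, a NavigableString taken from a real page can't be pickled.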
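The "carries around the whole tree" effect can be sketched with a hypothetical stand-in (again not bs4 itself): pickling a str subclass instance also pickles its attribute dictionary, so a made-up bulky "tree" attached to it ends up in the payload, while str() (unicode() on Python 2) produces a bare copy with no attributes. The byte counts depend entirely on the fake tree chosen here.

```python
import pickle


class TreeString(str):
    """Hypothetical stand-in for bs4's NavigableString (a str subclass with attributes)."""


# Made-up stand-in for a parse tree: a bulky structure the string points back to.
fake_tree = {"rows": [list(range(50)) for _ in range(50)]}

attached = TreeString("LAL")
attached.tree = fake_tree  # the string now carries a reference to the "tree"
plain = str(attached)      # copies just the characters into a built-in str

attached_bytes = pickle.dumps(attached, protocol=2)
plain_bytes = pickle.dumps(plain, protocol=2)

print("with tree: %d bytes" % len(attached_bytes))
print("plain str: %d bytes" % len(plain_bytes))
```

With a real bs4 tree the attached payload isn't just large: its element chain is deep enough to hit the recursion limit before pickling finishes.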
The solution is simple. You wanted a plain old unicode string, not a NavigableString, so you can just do this:
awayTeam = unicode(awayQTRScores[0])
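The same conversion can be applied to the whole row at once. Below is a sketch of a hypothetical helper, parse_away_line (not part of the original code), written for Python 3, where str() replaces unicode(). It assumes the cells arrive in the order implied by the question's indexing (team, Q1 to Q4, any overtimes, total) and that every cell has text; missing overtimes become NaN, as in the question.

```python
import math


def parse_away_line(cells, max_ot=4):
    """Hypothetical helper: turn one line-score row into plain Python values.

    Assumes `cells` holds the row's cell strings in the order
    team, Q1..Q4, any overtime periods, total. str() copies each one into a
    built-in string (use unicode() on Python 2), so no parse-tree references
    survive into the result.
    """
    plain = [str(c) for c in cells]
    team, periods = plain[0], plain[1:-1]  # drop the trailing total column
    row = {"AwayTeam": team}
    for q in range(4):
        row["AwayQ%d" % (q + 1)] = int(periods[q])
    for ot in range(max_ot):
        idx = 4 + ot  # overtime periods follow the four quarters
        row["AwayOT%d" % (ot + 1)] = int(periods[idx]) if idx < len(periods) else float("nan")
    return row


# Usage with the values from the question's printed dict (103 = 16 + 24 + 36 + 27).
row = parse_away_line(["LAL", "16", "24", "36", "27", "103"])
print(row)
```

Each value in the returned dict is a built-in str, int or float, so wrapping it in lists for pd.DataFrame no longer puts any bs4 objects into the frame.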