How can I figure out why my Jupyter kernel is dying?

Question

I've written a script in a Jupyter Notebook that reads model specs from an Excel file (what should be predicted, from which variables, and with what filter logic), runs a series of models using xgboost, and writes the results to Excel. It works well with several different datasets, but with one specific dataset I get this message every time I run it:

The kernel appears to have died. It will restart automatically.

I tried running the same script in Spyder, and I got this message instead:

Assertion failed!
File: pyreadstat/_readstat_parser.c, Line 12686
Expression: !PyErr_Occurred()

pyreadstat is a module I was using to read in variable labels from an SPSS data file, so this made me believe there might be an issue with using pyreadstat.

Going back to my original script in the Jupyter Notebook, I commented out everything related to pyreadstat, and the kernel still died. However, the script still runs without issue on other data files and model specs, whether or not the pyreadstat code is commented out.

Since this code has worked for about 5 other configurations but not this one, I suspect something is different about this file, but I'm not sure how to identify what it is. The Spyder message pointed to pyreadstat, yet removing pyreadstat doesn't fix the problem in either Jupyter or Spyder (in Spyder, once I comment out pyreadstat, it just gets stuck connecting to a kernel).

EDIT: I also tried adding print statements to see where the kernel dies, but it dies before anything runs: a print statement at the very beginning of the function I'm calling doesn't even execute.
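A print statement can fail to show up even when it ran: output from a crashing process may still be sitting in a buffer, and in Jupyter it also has to reach the front end before the kernel dies. A minimal sketch, using only the standard library, is to write checkpoints to a log file with an explicit flush and to turn on `faulthandler`, which dumps the Python traceback when the interpreter receives a fatal signal such as the SIGABRT that a failed C `assert` normally raises. The helper names `enable_crash_log` and `checkpoint` and the file name `kernel_crash.log` are hypothetical.

```python
import faulthandler


def enable_crash_log(path="kernel_crash.log"):
    """Send faulthandler's fatal-error tracebacks to a log file.

    faulthandler keeps only the file descriptor, so the returned file object
    must stay open (keep a reference to it) while the handler is enabled.
    """
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log


def checkpoint(log, message):
    """Write a progress marker and flush it at once so it survives a hard crash."""
    log.write(message + "\n")
    log.flush()


# Hypothetical usage inside the notebook:
# crash_log = enable_crash_log()
# checkpoint(crash_log, "about to read the SPSS file")
# df = pd.read_spss(...)
# checkpoint(crash_log, "SPSS file read")
```

After a crash, the last checkpoint in the log shows how far the code got, and any traceback written by `faulthandler` follows it.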

EDIT: I've just figured out that the issue is with pandas.read_spss. I ran each statement in my function on its own, and this is the line where the kernel died:

df = pd.read_spss("C:/Users/me/Desktop/file.sav", convert_categoricals=False)
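Once a single call is known to take the kernel down, one option is to run just that call in a separate Python process, so the notebook kernel survives and the child's exit code and stderr can be inspected. This is a sketch under assumptions: `run_isolated` is a hypothetical helper, and the `-X faulthandler` flag asks the child interpreter to print a Python traceback if it dies on a fatal signal.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run a Python snippet in a fresh interpreter and report how it ended.

    A crash in C code (e.g. an assertion failure inside an extension module)
    kills only the child process, not the calling notebook kernel.
    """
    result = subprocess.run(
        # -X faulthandler: dump the Python traceback if the child hits a fatal signal
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


# The call from the question, wrapped as source code for the child process.
spss_snippet = (
    "import pandas as pd\n"
    "df = pd.read_spss('C:/Users/me/Desktop/file.sav', convert_categoricals=False)\n"
    "print(df.shape)\n"
)
# code, out, err = run_isolated(spss_snippet)
```

A return code of 0 means the read finished; anything else means it failed, and on POSIX systems a negative code -N means the child was killed by signal N. The `err` text is where a traceback or assertion message would appear.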

Looking at the SPSS file itself, one of the variables was a date. None of the variables in my other data files were dates, so I changed its format to string. When I did that, instead of the kernel dying, I got an error. Interestingly, the traceback shows that pandas.read_spss uses pyreadstat internally:

ReadstatError                             Traceback (most recent call last)
<ipython-input-3-e8f001d12898> in <module>
      1 import pandas as pd
      2 
----> 3 df = pd.read_spss(fileloc, convert_categoricals=False)

~\Anaconda3\lib\site-packages\pandas\io\spss.py in read_spss(path, usecols, convert_categoricals)
     41 
     42     df, _ = pyreadstat.read_sav(
---> 43         path, usecols=usecols, apply_value_formats=convert_categoricals
     44     )
     45     return df

pyreadstat\pyreadstat.pyx in pyreadstat.pyreadstat.read_sav()

pyreadstat\_readstat_parser.pyx in pyreadstat._readstat_parser.run_conversion()

pyreadstat\_readstat_parser.pyx in pyreadstat._readstat_parser.run_readstat_parser()

pyreadstat\_readstat_parser.pyx in pyreadstat._readstat_parser.check_exit_status()

ReadstatError: Unable to open file

But now I'm at a loss, because I can't find anything else that differs between this file and the other SPSS files now that the date is a string.
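One way to look for differences between a file that loads and one that doesn't is to compare their metadata rather than their data. A minimal sketch, assuming pyreadstat's `read_sav(path, metadataonly=True)`, which returns an empty DataFrame plus a metadata object whose `original_variable_types` maps each variable to its SPSS display format (for example `F8.2`, `A20`, `DATE11`). The helpers `summarize_sav_meta` and `load_meta` and the `DATE_FORMATS` set are illustrative assumptions. Reading metadata could still trigger the same crash, in which case it is safer to run it in a separate process.

```python
import re
from collections import Counter

# Assumed list of SPSS date/time format names (the letters before the width,
# e.g. "ADATE" in "ADATE10"); extend it if the files use other formats.
DATE_FORMATS = {"DATE", "ADATE", "EDATE", "JDATE", "SDATE", "DATETIME", "TIME",
                "DTIME", "QYR", "MOYR", "WKYR", "WKDAY", "MONTH"}


def summarize_sav_meta(meta):
    """Condense the variable formats of a pyreadstat metadata object into comparable facts."""
    families = {}
    for name, fmt in meta.original_variable_types.items():
        match = re.match(r"[A-Z]+", fmt.upper())  # "ADATE10" -> "ADATE", "F8.2" -> "F"
        families[name] = match.group(0) if match else fmt
    return {
        "n_variables": len(families),
        "format_families": Counter(families.values()),
        "date_variables": sorted(n for n, fam in families.items() if fam in DATE_FORMATS),
    }


def load_meta(path):
    # Reads only the file dictionary, not the data rows.
    import pyreadstat
    _, meta = pyreadstat.read_sav(path, metadataonly=True)
    return meta


# Hypothetical usage: compare a file that works with the one that crashes.
# for path in ["good.sav", "C:/Users/me/Desktop/file.sav"]:
#     print(path, summarize_sav_meta(load_meta(path)))
```

Because `summarize_sav_meta` only needs an object with an `original_variable_types` dict, it can be tried on hand-made metadata before pointing it at real files.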

Answer 1

I have found the answer, and it's obscure!

There was a variable in this SPSS file with 290 value labels. If I remove that variable in SPSS before bringing the file into Python, everything runs as it should.
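To check other files for a variable like this before loading them, the number of value labels per variable can be read from the metadata. A minimal sketch, assuming pyreadstat's metadata object exposes `variable_value_labels` (a dict from variable name to a `{value: label}` dict); `variables_with_many_labels`, `scan_sav_file`, and the threshold of 200 are hypothetical choices, not a documented limit. This only locates such variables; it does not explain why 290 labels caused the crash.

```python
def variables_with_many_labels(variable_value_labels, threshold=200):
    """Return (variable, label_count) pairs above threshold, largest first."""
    counts = ((name, len(labels)) for name, labels in variable_value_labels.items())
    return sorted((pair for pair in counts if pair[1] > threshold),
                  key=lambda pair: pair[1], reverse=True)


def scan_sav_file(path, threshold=200):
    # Reads only the metadata; if this also crashes, run it in a separate process.
    import pyreadstat
    _, meta = pyreadstat.read_sav(path, metadataonly=True)
    return variables_with_many_labels(meta.variable_value_labels, threshold)
```

For example, `scan_sav_file("C:/Users/me/Desktop/file.sav")` would list each variable with more than 200 value labels together with its count.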

The date-formatted variables mentioned in my question edit turned out not to be an issue at all; they import fine once the variable with many value labels is removed.

Accepted answer (score 0), 2020-07-28 15:46:28

How can I figure out why my Jupyter kernel is dying?

Question

1 answers

solution1 0 ACCPTED 2020-07-28 15:46:28

solution1
0 ACCPTED 2020-07-28 15:46:28