pandas.read_sas() prints traceback messages that I cannot suppress. It prints a message for EACH row it reads, so when I try to read the whole file it just freezes from printing too much.
I tried the following, taken from other Stack Overflow answers:
import warnings
warnings.simplefilter(action='ignore')
And
warnings.filterwarnings('ignore')
And
from IPython.display import HTML
HTML('''<script>
code_show_err=false;
function code_toggle_err() {
    if (code_show_err){
        $('div.output_stderr').hide();
    } else {
        $('div.output_stderr').show();
    }
    code_show_err = !code_show_err
}
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click <a
href="javascript:code_toggle_err()">here</a>.''')
But nothing works.
The message it prints is:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas\io\sas\sas.pyx in pandas.io.sas._sas.rle_decompress()

ValueError: Unexpected non-zero end_of_first_byte
Exception ignored in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data'
Traceback (most recent call last):
  File "pandas\io\sas\sas.pyx", line 29, in pandas.io.sas._sas.rle_decompress
ValueError: Unexpected non-zero end_of_first_byte
As highlighted in the traceback, the error is caused by a bug in the pandas implementation of RLE decompression, which is used when the SAS dataset is exported using CHAR (RLE) compression.
Note the pandas issue created for this topic: https://github.com/pandas-dev/pandas/issues/31243
The fix that pandas implemented for this bug in read_sas is contained in the following Pull Request, which is part of the version 1.5 milestone, yet to be released at the time of answering: https://github.com/pandas-dev/pandas/pull/47113
To answer your question, you have two options:
1. Wait until pandas releases version 1.5, update to that version, and read_sas should then work as expected. You've already been waiting a while since you asked, so I suspect this will be fine.
2. Use the sas7bdat library instead ( https://pypi.org/project/sas7bdat/ ), and then convert to a pandas DataFrame:
from sas7bdat import SAS7BDAT
df = SAS7BDAT("./path/to/file.sas7bdat").to_data_frame()
The sas7bdat approach worked for me, after facing the exact same error as you did.
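If you want one script that copes with both situations, the two options can be combined. The following is only a minimal sketch under some assumptions: version_tuple, pick_sas_reader and load_sas are names I made up for illustration, and it assumes the fix really does ship in pandas 1.5, as the milestone on the Pull Request suggests. It compares the installed pandas version against 1.5 and falls back to sas7bdat for anything older.

```python
import re


def version_tuple(version):
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pick_sas_reader(pandas_version, fixed_in=(1, 5)):
    """Name the reader to use for a given pandas version.

    fixed_in is an assumption: the first release expected to contain
    the RLE decompression fix, based on the 1.5 milestone.
    """
    if version_tuple(pandas_version) >= fixed_in:
        return "pandas"
    return "sas7bdat"


def load_sas(path):
    """Read a .sas7bdat file into a DataFrame with whichever reader applies."""
    # Imported here so the helpers above can be used without pandas installed.
    import pandas as pd

    if pick_sas_reader(pd.__version__) == "pandas":
        return pd.read_sas(path)

    # Older pandas: use the sas7bdat library instead.
    from sas7bdat import SAS7BDAT
    return SAS7BDAT(path).to_data_frame()
```

Usage would then be df = load_sas("./path/to/file.sas7bdat"). Note that the sas7bdat package has to be installed for the fallback branch to work.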