
ValueError when reading a SAS file with pandas

pandas.read_sas() prints traceback messages that I cannot suppress. It prints one for EACH row it reads, so when I try to read the whole file the notebook freezes under the volume of output.

I tried these suggestions from other Stack Overflow answers:

import warnings
warnings.simplefilter(action='ignore')

And

warnings.filterwarnings('ignore')

And

from IPython.display import HTML
HTML('''<script>
code_show_err=false; 
function code_toggle_err() {
 if (code_show_err){
 $('div.output_stderr').hide();
 } else {
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
} 
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click <a 
href="javascript:code_toggle_err()">here</a>.''')

But nothing works.

The message it prints is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas\io\sas\sas.pyx in pandas.io.sas._sas.rle_decompress()

ValueError: Unexpected non-zero end_of_first_byte

Exception ignored in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data'
Traceback (most recent call last):
  File "pandas\io\sas\sas.pyx", line 29, in pandas.io.sas._sas.rle_decompress
ValueError: Unexpected non-zero end_of_first_byte

As highlighted in the traceback, the error is caused by a bug in the pandas implementation of RLE decompression, which is used when the SAS dataset is exported using CHAR (RLE) compression.
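This is probably also why the warnings filters in the question had no effect: the output is an exception report ("Exception ignored in ..."), not a warning, so the warnings module never sees it. The snippet below is a small stand-alone illustration of that difference, not pandas code; emit_both is a made-up function that issues one real warning and writes one line straight to stderr.

```python
import io
import sys
import warnings
from contextlib import redirect_stderr


def emit_both():
    """Hypothetical stand-in: one real warning plus one direct stderr write."""
    warnings.warn("a real warning")
    print("text written straight to stderr", file=sys.stderr)


buffer = io.StringIO()
with redirect_stderr(buffer), warnings.catch_warnings():
    # Same filter as in the question: silences warnings only.
    warnings.simplefilter("ignore")
    emit_both()

captured = buffer.getvalue()
# The warning was filtered out; the direct stderr write was not.
print(repr(captured))
```

Only the warning is dropped. Anything that reaches stderr by another route, such as an ignored exception, still gets through.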

Note the pandas issue created for this topic: https://github.com/pandas-dev/pandas/issues/31243

The resolution that pandas implemented for this bug in read_sas is contained in the following Pull Request, which is part of the version 1.5 milestone, yet to be released at the time of answering: https://github.com/pandas-dev/pandas/pull/47113

To answer your question, you have two options:

  1. Wait until pandas releases version 1.5 and upgrade to it; read_sas should then work as expected. You have already waited a while since asking, so I suspect waiting a little longer will be fine.
  2. Use the Python sas7bdat library instead ( https://pypi.org/project/sas7bdat/ ) and convert the result to a pandas DataFrame:
    from sas7bdat import SAS7BDAT
    # The context manager closes the file once the DataFrame has been built
    with SAS7BDAT("./path/to/file.sas7bdat") as reader:
        df = reader.to_data_frame()

The sas7bdat approach worked for me, after facing the exact same error as you did.
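If you want both options in one place, they can be combined into a helper that uses read_sas when the installed pandas is at least 1.5 and falls back to sas7bdat otherwise. This is only a sketch: the names pandas_has_rle_fix and read_sas_file are invented for illustration, and it assumes the fix really does ship in 1.5, as the milestone on the pull request suggests.

```python
import re


def pandas_has_rle_fix(version: str) -> bool:
    """Return True if `version` is 1.5 or later.

    Assumption: the RLE fix ships with the 1.5 milestone mentioned above.
    Only the major and minor numbers are compared, so "1.5.0rc0" counts as 1.5.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 5)


def read_sas_file(path: str):
    """Read a .sas7bdat file, choosing the reader by pandas version."""
    # Imports are local so the version check above can be used on its own.
    import pandas as pd

    if pandas_has_rle_fix(pd.__version__):
        return pd.read_sas(path)

    # Fallback for older pandas: the sas7bdat package from option 2.
    from sas7bdat import SAS7BDAT

    with SAS7BDAT(path) as reader:
        return reader.to_data_frame()
```

Calling read_sas_file("./path/to/file.sas7bdat") should return a DataFrame either way. Be aware that the two readers may not produce identical dtypes or string encodings.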
