
ValueError when reading a SAS file with pandas

pandas.read_sas() prints traceback messages that I cannot suppress. The problem is that it prints a message for every row it reads, so when I try to read the whole file it appears to freeze because it is printing so much output.

I tried the following suggestions from other Stack Overflow answers:

import warnings
warnings.simplefilter(action='ignore')

And

warnings.filterwarnings('ignore')

And

from IPython.display import HTML
HTML('''<script>
code_show_err=false; 
function code_toggle_err() {
 if (code_show_err){
 $('div.output_stderr').hide();
 } else {
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
} 
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click <a 
href="javascript:code_toggle_err()">here</a>.''')

But nothing works.

The message it prints is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas\io\sas\sas.pyx in pandas.io.sas._sas.rle_decompress()

ValueError: Unexpected non-zero end_of_first_byte

Exception ignored in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data'
Traceback (most recent call last):
  File "pandas\io\sas\sas.pyx", line 29, in pandas.io.sas._sas.rle_decompress
ValueError: Unexpected non-zero end_of_first_byte

As highlighted in the traceback, the error is caused by a bug in pandas' implementation of RLE decompression (rle_decompress), which is used when the SAS dataset was exported with CHAR (RLE) compression.
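To illustrate what the error message refers to: pandas' Cython decoder splits each RLE control byte into a command (the high nibble, byte & 0xF0) and a small value it calls end_of_first_byte (the low nibble, byte & 0x0F). The sketch below assumes that layout. split_control_byte and check_copy_command are hypothetical helpers written for illustration, not pandas functions. It shows how a decoder that insists the low nibble must be zero for command 0x00 rejects a byte such as 0x05 with exactly this message, even though, per the answer above, such bytes occur in real CHAR-compressed files.

```python
from typing import Tuple


def split_control_byte(byte: int) -> Tuple[int, int]:
    """Split an RLE control byte into (command, end_of_first_byte).

    Assumption: the high nibble selects the command and the low nibble
    carries extra data, mirroring the layout used by pandas' decoder.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte (0-255)")
    return byte & 0xF0, byte & 0x0F


def check_copy_command(byte: int) -> None:
    """Hypothetical reproduction of the overly strict check behind the error.

    Rejects command 0x00 whenever its low nibble is non-zero.
    """
    command, end_of_first_byte = split_control_byte(byte)
    if command == 0x00 and end_of_first_byte != 0:
        raise ValueError("Unexpected non-zero end_of_first_byte")
```

For example, split_control_byte(0x05) returns (0x00, 5), so check_copy_command(0x05) raises the same ValueError seen in the traceback, while check_copy_command(0x00) passes.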

Note the pandas issue created for this topic: https://github.com/pandas-dev/pandas/issues/31243

The fix that pandas implemented for this bug in read_sas is contained in the following pull request, which is part of the version 1.5 milestone and had not yet been released at the time of answering: https://github.com/pandas-dev/pandas/pull/47113

To answer your question, you have two options:

  1. Wait until pandas releases version 1.5, upgrade to it, and read_sas should then work as expected. You've already been waiting a while since you asked, so I suspect this will be fine.
  2. Use the Python sas7bdat library instead (https://pypi.org/project/sas7bdat/), and then convert the result to a pandas DataFrame:
    from sas7bdat import SAS7BDAT
    df = SAS7BDAT("./path/to/file.sas7bdat").to_data_frame()
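The two options can be combined in a small loader, sketched below under stated assumptions. It assumes, as the answer says, that the RLE fix ships in pandas 1.5, and it assumes the sas7bdat package is installed when the fallback is needed. The function names pandas_has_rle_fix and read_sas_dataframe are hypothetical helpers, not part of either library.

```python
import re


def pandas_has_rle_fix(version: str) -> bool:
    """Return True if a pandas version string is 1.5 or newer.

    Assumption: the fix from pandas PR #47113 is included from 1.5 onward.
    Only the leading "major.minor" part is compared, so "1.5.0rc0" counts as 1.5.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 5)


def read_sas_dataframe(path: str):
    """Read a .sas7bdat file into a pandas DataFrame.

    Uses pandas.read_sas when the installed pandas should include the RLE fix,
    otherwise falls back to the third-party sas7bdat package.
    """
    import pandas as pd  # imported lazily so the version helper works without pandas

    if pandas_has_rle_fix(pd.__version__):
        return pd.read_sas(path)

    # Older pandas: fall back to sas7bdat (pip install sas7bdat).
    from sas7bdat import SAS7BDAT

    # The context manager closes the underlying file handle when done.
    with SAS7BDAT(path) as reader:
        return reader.to_data_frame()
```

Calling read_sas_dataframe("./path/to/file.sas7bdat") then picks the reader automatically, so the same script keeps working after you upgrade pandas.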

The sas7bdat approach worked for me after I ran into exactly the same error as you.
