Pandas read_sas error: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)

I am using Pandas 0.18 to open a sas7bdat dataset.

I simply use:

df = pd.read_sas("P:/myfile.sas7bdat")

and I get the following error:

    buf[0:text_block_size].rstrip(b"\x00 ").decode())

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)

If I use

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

I get

UnicodeDecodeError: 'utf8' codec can't decode byte 0xd8 in position 0: invalid continuation byte

Other sas7bdat files in my folder are handled just fine by Pandas.

When I open the file in SAS I see that the column names are very long and span several lines, but otherwise the files look just fine.

read_sas does not offer many options. What should I do? Many thanks!

You probably have to set the encoding to UTF-8. Something like this (according to the docs):

df = pd.read_sas("P:/myfile.sas7bdat", encoding='utf-8')
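
If UTF-8 fails as well, as in the second error in the question, the text in the file is probably stored in a single-byte encoding rather than UTF-8. That is an assumption, not something the traceback proves. The sketch below only illustrates the decoding behaviour: the bytes in `raw` are a hypothetical stand-in for a column name beginning with 0xd8, and `try_decode` is a helper made up for this example.

```python
# Hypothetical bytes standing in for a column name that starts with 0xd8.
raw = b"\xd8re"


def try_decode(data, encodings=("ascii", "utf-8", "latin-1")):
    """Return {encoding: decoded text, or None if decoding failed}."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            # ascii rejects any byte above 0x7f; utf-8 treats 0xd8 as the
            # start of a two-byte sequence and rejects the byte after it.
            results[enc] = None
    return results


results = try_decode(raw)
for enc, text in results.items():
    print(enc, "->", repr(text))
```

In latin-1 (and cp1252) the byte 0xd8 is "Ø" and 0xe4 is "ä", so errors on these bytes often point to Western European text. Passing encoding='latin-1' or encoding='cp1252' to read_sas may therefore be worth trying.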

I have the same problem, even though I already pass encoding='utf-8'.

I still get the error below:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-20-5deb45266124> in <module>
----> 1 df = pd.read_sas("/workspace/em_data1/dev/sas_data/bureau/data_validation/dnb/freq_202008/_freq_2138_201503_202009.sas7bdat",encoding='utf-8')

/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sasreader.py in read_sas(filepath_or_buffer, format, index, encoding, chunksize, iterator)
    121 
    122         reader = SAS7BDATReader(
--> 123             filepath_or_buffer, index=index, encoding=encoding, chunksize=chunksize
    124         )
    125     else:

/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in __init__(self, path_or_buf, index, convert_dates, blank_missing, chunksize, encoding, convert_text, convert_header_text)
    144 
    145         self._get_properties()
--> 146         self._parse_metadata()
    147 
    148     def column_data_lengths(self):

/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _parse_metadata(self)
    349                 self.close()
    350                 raise ValueError("Failed to read a meta data page from the SAS file.")
--> 351             done = self._process_page_meta()
    352 
    353     def _process_page_meta(self):

/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _process_page_meta(self)
    355         pt = [const.page_meta_type, const.page_amd_type] + const.page_mix_types
    356         if self._current_page_type in pt:
--> 357             self._process_page_metadata()
    358         is_data_page = self._current_page_type & const.page_data_type
    359         is_mix_page = self._current_page_type in const.page_mix_types

/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _process_page_metadata(self)
    390                 subheader_signature, pointer.compression, pointer.ptype
    391             )
--> 392             self._process_subheader(subheader_index, pointer)
    393 
    394     def _get_subheader_index(self, signature, compression, ptype):

/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _process_subheader(self, subheader_index, pointer)
    458             raise ValueError("unknown subheader index")
    459 
--> 460         processor(offset, length)
    461 
    462     def _process_rowsize_subheader(self, offset, length):

/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _process_columntext_subheader(self, offset, length)
    512         cname = cname_raw
    513         if self.convert_header_text:
--> 514             cname = cname.decode(self.encoding or self.default_encoding)
    515         self.column_names_strings.append(cname)
    516 

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 0: invalid continuation byte

In my Unix shell, the locale is:

echo $LANG
en_US.UTF-8
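
Note that LANG describes the shell's locale, not the bytes stored inside the .sas7bdat file, so a UTF-8 locale does not make the file UTF-8. One possible workaround is to try several encodings in turn. The following is a minimal sketch under that assumption: `read_sas_with_fallback`, its `reader` parameter, and the candidate list are all hypothetical and chosen for illustration; only `pd.read_sas(path, encoding=...)` comes from pandas.

```python
def read_sas_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1"),
                           reader=None):
    """Try each encoding in order; return (result, encoding_that_worked).

    `reader` defaults to pandas.read_sas. It is a parameter only so the
    helper can be exercised without a real SAS file.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    if reader is None:
        import pandas as pd  # imported lazily, only when actually needed
        reader = pd.read_sas

    last_error = None
    for enc in encodings:
        try:
            return reader(path, encoding=enc), enc
        except UnicodeDecodeError as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
    raise last_error


# Usage with a real file (path shortened here, adjust to your own):
# df, used = read_sas_with_fallback("myfile.sas7bdat")
# print(used, list(df.columns))
```

latin-1 maps every possible byte to a character, so it never raises and acts as a last resort. A successful decode does not prove the encoding is right, so check the column names for garbled characters afterwards.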
