I am using Pandas 0.18 to open a sas7bdat dataset.
I simply use:
df = pd.read_sas("P:/myfile.sas7bdat")
and I get the following error:
buf[0:text_block_size].rstrip(b"\x00 ").decode())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)
If I use:
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd8 in position 0: invalid continuation byte
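Both errors come from decoding the same leading byte, 0xd8, with different codecs. The switch from 'ascii' to 'utf8' in the message suggests that `sys.setdefaultencoding` only changed which codec was tried, not the bytes being decoded. A minimal Python 3 sketch reproducing this without pandas, assuming (the post does not say so) that the file's text is stored in a single-byte Western encoding such as Latin-1, where 0xd8 is 'Ø'. The sample bytes are hypothetical; only the leading 0xd8 comes from the error message:

```python
# Hypothetical header bytes: only the leading 0xd8 is taken from the error message.
raw = b"\xd8rsted"


def try_decode(data, codec):
    """Return the decoded text, or the codec's error reason if it rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


for codec in ("ascii", "utf-8", "latin-1"):
    print(codec, "->", try_decode(raw, codec))
```

ASCII rejects any byte above 0x7f, UTF-8 rejects 0xd8 because the byte after it is not a continuation byte, and Latin-1 accepts it (Latin-1 maps every byte to a character, so success there does not by itself prove it is the right encoding).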
Other sas7bdat files in my folder are handled just fine by Pandas.
When I open the file in SAS, I see that the column names are very long and span several lines, but otherwise the file looks fine.
There are not many options in read_sas... what should I do? Many thanks!
You probably have to set the encoding to UTF-8. Something like this (according to the docs):
df = pd.read_sas("P:/myfile.sas7bdat", encoding='utf-8')
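If UTF-8 is not the file's actual encoding, the same call will still fail. A minimal sketch that tries several candidate encodings in turn; the helper name `read_sas_any_encoding` and the candidate list are hypothetical, and only `pd.read_sas(path, encoding=...)` is the real pandas call. The `reader` parameter exists only so the helper can be exercised with a stub instead of a real file:

```python
def read_sas_any_encoding(path, encodings=("utf-8", "cp1252", "latin-1"), reader=None):
    """Try pd.read_sas with each candidate encoding until one decodes without error.

    Hypothetical helper. The candidate order is an assumption: latin-1 goes last
    because it maps every byte to a character and therefore never raises, even
    when it is the wrong guess.
    """
    if reader is None:
        import pandas as pd  # imported lazily so a stub reader can be used in tests
        reader = pd.read_sas
    errors = []
    for enc in encodings:
        try:
            return reader(path, encoding=enc), enc
        except UnicodeDecodeError as exc:
            errors.append(f"{enc}: {exc}")
    raise ValueError("no candidate encoding worked:\n" + "\n".join(errors))
```

Usage would look like `df, enc = read_sas_any_encoding("P:/myfile.sas7bdat")`. It is worth printing `enc` and eyeballing a few text columns, since a codec that decodes without error can still produce the wrong characters.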
I have the same problem. Even with encoding='utf-8', I still get the error below:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-20-5deb45266124> in <module>
----> 1 df = pd.read_sas("/workspace/em_data1/dev/sas_data/bureau/data_validation/dnb/freq_202008/_freq_2138_201503_202009.sas7bdat",encoding='utf-8')
/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sasreader.py in read_sas(filepath_or_buffer, format, index, encoding, chunksize, iterator)
121
122 reader = SAS7BDATReader(
--> 123 filepath_or_buffer, index=index, encoding=encoding, chunksize=chunksize
124 )
125 else:
/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in __init__(self, path_or_buf, index, convert_dates, blank_missing, chunksize, encoding, convert_text, convert_header_text)
144
145 self._get_properties()
--> 146 self._parse_metadata()
147
148 def column_data_lengths(self):
/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _parse_metadata(self)
349 self.close()
350 raise ValueError("Failed to read a meta data page from the SAS file.")
--> 351 done = self._process_page_meta()
352
353 def _process_page_meta(self):
/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _process_page_meta(self)
355 pt = [const.page_meta_type, const.page_amd_type] + const.page_mix_types
356 if self._current_page_type in pt:
--> 357 self._process_page_metadata()
358 is_data_page = self._current_page_type & const.page_data_type
359 is_mix_page = self._current_page_type in const.page_mix_types
/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _process_page_metadata(self)
390 subheader_signature, pointer.compression, pointer.ptype
391 )
--> 392 self._process_subheader(subheader_index, pointer)
393
394 def _get_subheader_index(self, signature, compression, ptype):
/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _process_subheader(self, subheader_index, pointer)
458 raise ValueError("unknown subheader index")
459
--> 460 processor(offset, length)
461
462 def _process_rowsize_subheader(self, offset, length):
/opt/Anaconda/2018.12/lib/python3.7/site-packages/pandas/io/sas/sas7bdat.py in _process_columntext_subheader(self, offset, length)
512 cname = cname_raw
513 if self.convert_header_text:
--> 514 cname = cname.decode(self.encoding or self.default_encoding)
515 self.column_names_strings.append(cname)
516
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 0: invalid continuation byte
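The last frame decodes each column name with `self.encoding or self.default_encoding`, so an explicit `encoding='utf-8'` replaces the fallback entirely. A minimal sketch of that expression, assuming (this is not shown in the traceback) that `default_encoding` is `"latin-1"` in this pandas version. The column-name bytes are hypothetical apart from the leading 0xe4, which is 'ä' in Latin-1:

```python
DEFAULT_ENCODING = "latin-1"  # assumption: the reader's fallback codec in this pandas version


def decode_column_name(cname_raw, encoding=None):
    # Mirrors `cname.decode(self.encoding or self.default_encoding)` from the traceback:
    # the fallback is used only when no encoding is passed.
    return cname_raw.decode(encoding or DEFAULT_ENCODING)


cname_raw = b"\xe4ndring"  # hypothetical column-name bytes starting with 0xe4

print(decode_column_name(cname_raw))  # no encoding given, so the latin-1 fallback is used
try:
    decode_column_name(cname_raw, encoding="utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # same kind of error as the last line of the traceback
```

If that assumption holds, omitting `encoding` (or passing the file's real encoding) would avoid this particular error. Note that the pandas docs say that with `encoding=None` text data are returned as raw bytes, so the values may then need decoding separately.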
From my Unix shell:
echo $LANG
en_US.UTF-8