Pandas read_fwf doesn't seem to respect the encoding parameter
I wanted to try the memory_map parameter to see whether it improved the load time of a file. (I don't really know what the parameter does, but thought I would give it a shot.)
When I try to load the file I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 295: invalid start byte. I tried setting the encoding parameter (see below), but it doesn't appear to work.
Here is the code:
import pandas as pd

fwf_widths = [6,2,6,2,14,1,40,1,10,1,10,1,1,3,3,1,1,1,2,1,5,1,10,
              1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,3,1,1,1,2,1,
              5,1,10,1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,1,1,1,
              1,1,1,1,2,1,1,1,1,1,1,1,]

pd.read_fwf("MOVEOUTA.ALL.OUT1.txt",
            usecols=range(0, 80, 2),
            widths=fwf_widths,
            encoding='windows-1252',
            memory_map=True)
Am I doing something wrong, or should I raise an issue with pandas (I have version 1.01)?
Edit:
I tried this as well, but I continue to receive the same error:
with open("MOVEOUTA.ALL.OUT1.txt", mode='r', encoding='windows-1252') as f:
    df = pd.read_fwf(f,
                     usecols=range(0, 80, 2),
                     widths=fwf_widths,
                     memory_map=True)
I don't know whether pandas.read_fwf accepts an encoding parameter directly; the documented signature only lists **kwds:
pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)
Read a table of fixed-width formatted lines into DataFrame.
Also supports optionally iterating or breaking of the file into chunks.
Additional help can be found in the online docs for IO Tools.
The following code snippet should do the job (pass an instance of StringIO to the filepath_or_buffer parameter):
import pandas as pd
from io import StringIO

with open("MOVEOUTA.ALL.OUT1.txt", mode='r', encoding='windows-1252') as f:
    content = f.read()

fwf_widths = [6,2,6,2,14,1,40,1,10,1,10,1,1,3,3,1,1,1,2,1,5,1,10,
              1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,3,1,1,1,2,1,
              5,1,10,1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,1,1,1,
              1,1,1,1,2,1,1,1,1,1,1,1,]

df = pd.read_fwf(StringIO(content),
                 usecols=range(0, 80, 2),  # this parameter was not tested
                 widths=fwf_widths,
                 memory_map=True)
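
To make the idea easier to check without the original file, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: the sample records, the two column widths (10 and 5), and the column names are hypothetical and not taken from MOVEOUTA.ALL.OUT1.txt. It writes a small windows-1252 file containing byte 0x92, shows that the bytes are not valid UTF-8, and then reads the file by decoding it explicitly and passing a StringIO to read_fwf. The last call passes encoding straight to read_fwf without memory_map; whether that avoids the error on the asker's pandas version is not confirmed by this thread.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical sample data: a 10-character name field and a 5-character code field.
# "\u2019" (right single quotation mark) is stored as byte 0x92 in windows-1252.
records = [("O\u2019Brien", 42), ("Smith", 7)]
rows = ["{:<10}{:05d}".format(name, code) for name, code in records]
raw = ("\n".join(rows) + "\n").encode("windows-1252")

# A lone 0x92 byte is not valid UTF-8, which matches the error in the question.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Write the bytes to a temporary file that stands in for the real input file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Decode explicitly while reading, then give pandas an in-memory text buffer.
    with open(path, mode="r", encoding="windows-1252") as f:
        content = f.read()
    df = pd.read_fwf(StringIO(content), widths=[10, 5], header=None,
                     names=["name", "code"], dtype=str)

    # Alternative: let pandas open the file itself, with encoding but no memory_map.
    df_direct = pd.read_fwf(path, widths=[10, 5], header=None,
                            names=["name", "code"], dtype=str,
                            encoding="windows-1252")
finally:
    os.remove(path)

print(utf8_ok)
print(df)
```

Because the text is already decoded before pandas sees it, the StringIO route does not depend on how read_fwf handles the encoding argument.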