
Pandas read_fwf doesn't seem to respect the encoding parameter

I wanted to try the memory_map parameter to see whether it improves the load time of a file. (I don't really know what the parameter does, but thought I would give it a shot.)

When I try to load the file, I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 295: invalid start byte. I tried setting the encoding parameter (see below), but it doesn't appear to have any effect.

Here is the code:

import pandas as pd
fwf_widths  = [6,2,6,2,14,1,40,1,10,1,10,1,1,3,3,1,1,1,2,1,5,1,10,
               1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,3,1,1,1,2,1,
               5,1,10,1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,1,1,1,
               1,1,1,1,2,1,1,1,1,1,1,1,]
pd.read_fwf("MOVEOUTA.ALL.OUT1.txt",
            usecols=range(0,80, 2), 
            widths=fwf_widths,
            encoding='windows-1252',
            memory_map=True)

Am I doing something wrong, or should I raise an issue with pandas (I have version 1.0.1)?

Edit:

I also tried this, but I still get the same error:

with open("MOVEOUTA.ALL.OUT1.txt", mode='r',encoding='windows-1252', ) as f:
    df = pd.read_fwf(f,
                     usecols=range(0,80, 2), 
                     widths=fwf_widths,
                     memory_map=True)
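As background on the error message itself, here is a minimal sketch of why byte 0x92 trips the UTF-8 decoder but is fine under windows-1252. The sample bytes are made up for illustration and are not taken from MOVEOUTA.ALL.OUT1.txt.

```python
# Hypothetical sample bytes; 0x92 is the byte named in the error message.
raw = b"Bob\x92s account"

# Strict UTF-8 decoding rejects 0x92 because it cannot start a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    utf8_error = str(exc)

# windows-1252 (alias cp1252) maps 0x92 to U+2019, the right single quotation mark.
decoded = raw.decode("windows-1252")
```

So an error that mentions the 'utf-8' codec suggests the file was decoded as UTF-8 at some point, regardless of the encoding that was requested.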

I don't know whether pandas.read_fwf accepts an encoding parameter; its documented signature only lists **kwds:

 pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)

Read a table of fixed-width formatted lines into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for IO Tools.
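For what it's worth, here is a small sketch suggesting that encoding does reach the reader through **kwds when memory_map is left out. The two-column layout, the widths and the temporary file are all made up for the example. Whether memory_map=True is what causes the failure in the question is an assumption here, not something this snippet proves.

```python
import os
import tempfile

import pandas as pd

# Hypothetical fixed-width data: a 6-char ID field and a 10-char NAME field.
# The byte 0x92 is valid in windows-1252 but not in UTF-8.
raw = b"ID    NAME      \n000001O\x92Brien   \n000002Smith     \n"

with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # encoding is passed along via **kwds; memory_map is deliberately omitted.
    df = pd.read_fwf(path, widths=[6, 10], encoding="windows-1252")
finally:
    os.remove(path)

names = df["NAME"].tolist()
```

If this runs cleanly on your pandas version but the same call with memory_map=True fails, that would point at the memory-mapped code path rather than at encoding being unsupported.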

The following code snippet should do the job: decode the file yourself, then pass a StringIO instance as the filepath_or_buffer parameter:

import pandas as pd
from io import StringIO

# Decode the file explicitly as windows-1252 and keep the text in memory
with open("MOVEOUTA.ALL.OUT1.txt", mode='r', encoding='windows-1252') as f:
    content = f.read()

fwf_widths = [6,2,6,2,14,1,40,1,10,1,10,1,1,3,3,1,1,1,2,1,5,1,10,
              1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,3,1,1,1,2,1,
              5,1,10,1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,1,1,1,
              1,1,1,1,2,1,1,1,1,1,1,1]

# memory_map is documented for file paths only, so it is omitted
# for the in-memory buffer
df = pd.read_fwf(StringIO(content),
                 usecols=range(0, 80, 2),  # this parameter was not tested
                 widths=fwf_widths)
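Since the usecols argument above was not tested, here is a small self-contained sketch of the same pattern on made-up data: data fields alternate with filler fields, and usecols keeps only the even-numbered ones. The three-field layout and the header=None and dtype=str choices are assumptions for the example and are not taken from the original file.

```python
from io import StringIO

import pandas as pd

# Hypothetical layout: a 6-char field, a 2-char filler, then a 10-char field.
content = (
    "000001  O\u2019Brien   \n"
    "000002  Smith     \n"
)
widths = [6, 2, 10]

df = pd.read_fwf(
    StringIO(content),
    widths=widths,
    usecols=list(range(0, len(widths), 2)),  # keep fields 0 and 2, skip the filler
    header=None,   # the sample has no header row
    dtype=str,     # keep leading zeros in the ID-like field
)

first_field = df.iloc[:, 0].tolist()
second_field = df.iloc[:, 1].tolist()
```

The text is already decoded by the time pandas sees it, so no encoding argument is needed on the read_fwf call.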
