Pandas read_fwf doesn't seem to respect the encoding parameter

Question

I wanted to try the memory_map parameter to see whether it improves the load time of a file. (I don't really know what the parameter does, but thought I would give it a shot.)

When I try to load the file I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 295: invalid start byte. I tried setting the encoding parameter (see below), but it doesn't appear to work.
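
For context on the error message: in windows-1252, byte 0x92 is the right single quotation mark (U+2019), while in UTF-8 a byte of that value can only be a continuation byte, never the start of a character. The traceback therefore suggests the file was being decoded as UTF-8 despite the encoding argument. A minimal check, using made-up sample bytes rather than the real file:

```python
# Hypothetical sample bytes containing 0x92 (not taken from the real file).
raw = b"don\x92t"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # Same kind of failure as in the traceback above.
    utf8_error = str(exc)

# The same bytes decode cleanly as windows-1252.
text = raw.decode("windows-1252")

print(utf8_error)
print(text)
```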

Here is the code:

import pandas as pd
fwf_widths  = [6,2,6,2,14,1,40,1,10,1,10,1,1,3,3,1,1,1,2,1,5,1,10,
               1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,3,1,1,1,2,1,
               5,1,10,1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,1,1,1,
               1,1,1,1,2,1,1,1,1,1,1,1,]
pd.read_fwf("MOVEOUTA.ALL.OUT1.txt",
            usecols=range(0,80, 2), 
            widths=fwf_widths,
            encoding='windows-1252',
            memory_map=True)

Am I doing something wrong, or should I raise an issue with pandas (I have version 1.01)?

Edit:

I tried this as well, but continue to receive the same error:

with open("MOVEOUTA.ALL.OUT1.txt", mode='r',encoding='windows-1252', ) as f:
    df = pd.read_fwf(f,
                     usecols=range(0,80, 2), 
                     widths=fwf_widths,
                     memory_map=True)

Answer 1

I'm not sure whether pandas.read_fwf honors the encoding parameter; it is not named in the documented signature and would have to be passed through **kwds:

 pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)
Read a table of fixed-width formatted lines into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for IO Tools.

The following code snippet should do the job (pass an instance of StringIO to the filepath_or_buffer parameter):

import pandas as pd
from io import StringIO

with open("MOVEOUTA.ALL.OUT1.txt", mode='r', encoding='windows-1252') as f:
    content = f.read()
 
fwf_widths  = [6,2,6,2,14,1,40,1,10,1,10,1,1,3,3,1,1,1,2,1,5,1,10,
               1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,3,1,1,1,2,1,
               5,1,10,1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,1,1,1,
               1,1,1,1,2,1,1,1,1,1,1,1,]
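# The file was already decoded by open() above, so pandas receives text, not bytes.
# memory_map is omitted: the data is already in memory, so there is no file to map.
df = pd.read_fwf(StringIO(content),
                 usecols=range(0, 80, 2),  # ??? this param not tested
                 widths=fwf_widths)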
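
To try the StringIO approach without the original file, here is a small self-contained sketch. The records, widths, and file name are invented for illustration and are not from MOVEOUTA.ALL.OUT1.txt. It writes a windows-1252 file containing byte 0x92, decodes it explicitly with open(), and hands the decoded text to read_fwf:

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical records: 8-char name, 2-char gap, 2-char number.
# U+2019 is stored as the single byte 0x92 in windows-1252.
records = [("O\u2019Brien", 42), ("Smith", 7)]
sample = "".join(f"{name:<8}  {num:>2}\n" for name, num in records)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_fwf.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(sample.encode("windows-1252"))

    # Decode explicitly so pandas only ever sees already-decoded text.
    with open(path, mode="r", encoding="windows-1252") as f:
        content = f.read()

# Keep columns 0 and 2, skipping the gap column, as usecols does in the question.
df = pd.read_fwf(StringIO(content), widths=[8, 2, 2], usecols=[0, 2], header=None)
print(df)
```

Because the whole file is read into a string first, this trades memory for simplicity; for very large files that cost may matter more than any load-time gain.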

Answer 1: 0 2021-05-26 18:03:13