Encoding with pandas.read_csv when file name has accents

I'm trying to load a CSV with pandas, but I run into a problem if the file name has accents. It's clearly an encoding problem, but although read_csv lets you set the encoding for the text within the file, I can't figure out how to encode the file name properly.

import pandas

input_file = r'C:\...\Datasets\%s\Provinces\Points\%s.csv' % (country, province)
self.locs = pandas.read_csv(input_file, sep=',', skipinitialspace=True)

The CSV file is Anzoátegui.csv. When I get the error, the path is:

input_file = 'C:\\...\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv'

Error message:

OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
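The bytes in that message are simply the UTF-8 encoding of the file name. One plausible reading, which is an assumption and not confirmed in this thread, is that the C parser in older pandas encoded the path to UTF-8 and handed those bytes to a Windows file API that interprets narrow strings in the ANSI code page, so it ended up looking for a differently named file. The stdlib-only sketch below reproduces both steps, assuming cp1252 as that code page:

```python
# File name from the question.
name = "Anzoátegui.csv"

# Encoding the str as UTF-8 gives exactly the bytes shown in the OSError.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)  # b'Anzo\xc3\xa1tegui.csv'

# Assumption: the bytes are then read back in the Windows ANSI code page
# (cp1252 here), which turns the two-byte "á" into two separate characters.
misread = utf8_bytes.decode("cp1252")
print(misread)  # AnzoÃ¡tegui.csv
```

A file with that misread name does not exist, which would match the "does not exist" error even though Anzoátegui.csv is on disk.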

So maybe it's converting my string to bytes? I also tried io.StringIO(input_file), which puts the correct file name as a column header on an empty DataFrame:

Empty DataFrame
Columns: [C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv]
Index: []
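That result is expected: io.StringIO does not open a file at that path, it wraps the string's own text as an in-memory file. read_csv therefore parses the path itself as CSV content, one header line and no data rows. A minimal sketch reproducing it with the path from the question:

```python
import io

import pandas as pd

path = r"C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv"

# StringIO(path) is a file whose *contents* are the path text, so the
# single line becomes the header row and there are no data rows.
df = pd.read_csv(io.StringIO(path), sep=",", skipinitialspace=True)

print(df.empty)                      # True
print(df.columns.tolist() == [path])  # True
```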

Any ideas on how to get this file to load? Unfortunately I can't just strip out the accents: I have to interface with software that requires the proper names, and I have a ton of files to format (not just this one). Thanks!

Edit: full traceback:

Traceback (most recent call last):
  File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
    result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
  File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
    result = eval(compiled, updated_globals, frame.f_locals)
  File "<string>", line 1, in <module>
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
    self._make_engine(self.engine)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
  File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist

OK folks, I got a little lost in dependency hell, but it turns out this issue was fixed in pandas 0.14.0. Install the updated version and files with accented names import correctly.
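To check whether an environment already has the fix, one option is to compare pandas.__version__ against 0.14.0. The sketch below uses a small hypothetical helper, version_tuple, instead of a third-party version parser; it only looks at the leading numeric components, so a pre-release such as 0.14.0rc1 is treated like 0.14.0:

```python
import re

import pandas as pd


def version_tuple(version):
    """Hypothetical helper: leading numeric parts of a version string.

    "0.14.0rc1" -> (0, 14, 0). Suffixes like "rc1" or "+dev" are
    ignored, which is good enough for a simple >= comparison.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


# Per this thread, accented file names load correctly from pandas 0.14.0 on.
HAS_FIX = version_tuple(pd.__version__) >= (0, 14, 0)
print(pd.__version__, HAS_FIX)
```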

Comments at GitHub.

Thanks for the input!
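For pandas versions older than 0.14.0 where upgrading is not an option, a common workaround (not from the original thread) is to open the file with Python's built-in open(), which accepts non-ASCII paths, and pass the resulting file object to read_csv instead of the path string. In the sketch below, read_csv_unicode_path is a hypothetical helper name, UTF-8 file contents are an assumption, and the demo CSV is made-up data written to a temporary directory:

```python
import os
import tempfile

import pandas as pd


def read_csv_unicode_path(path, encoding="utf-8", **kwargs):
    """Hypothetical helper: let Python open the (possibly accented) path,
    then give pandas a file object so it never has to handle the name.

    Assumes the file contents use `encoding` (UTF-8 by default); any other
    keyword arguments are forwarded to pandas.read_csv unchanged.
    """
    with open(path, encoding=encoding) as handle:
        return pd.read_csv(handle, **kwargs)


# Demo only: create a tiny CSV whose file name contains an accent.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "Anzoátegui.csv")
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("name, value\nfoo, 1\nbar, 2\n")

locs = read_csv_unicode_path(csv_path, sep=",", skipinitialspace=True)
print(locs)
```

The same call also works on newer pandas, so it can stay in place after an upgrade.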
