Errors importing CSV and Excel files with pandas

Question

I am trying to import a CSV file using Python pandas. Sample data from the file is shown below; the first row holds the comma-separated column names.

End Customer Organization ID,End Customer Organization Name,End Customer Top Parent Organization ID,End Customer Top Parent Organization Name,Reseller Top Parent ID,Reseller Top Parent Name,Business,Rev Sum Division,Rev Sum Category,Product Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales Date 
11027676,Baroda Western Uttar Pradesh Gramin Bankgfhgfnjgfnmjmhgmghmghmghmnghnmghnmhgnmghnghngh,4078446,Bank Of Barodadfhhgfjyjtkyukujkyujkuhykluiluilui;iooi';po'fserwefvegwegf,1809012,"Hcl Infosystems Ltd - Partnerdghftrutyhb frhywer5y5tyu6ui7iukluyj,lgjmfgnhfrgweffw",Server & CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmghmbhmgfngdfbndfhtgh,SQL Server & CALdfhtrhtrgbhrghrye5y45y45yu56juhydsgfaefwe,SQL CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfawrqwerwegtrhyjuytjhyj,SQL CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmjcbfuigkjasbcjkasbkdfhiwh,2005,Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasdbcvjkxsbhg,Open Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgfoisdhyguiserhguisrh,"Open Stddfm,vdnoghioerivnsdflierohgushdfovhsiodghuiohdbvgsjdhgouiwerho",125.85,1,FY07,12/28/2006
12835756,Uttam Strips Pvt Ltd,12835756,Uttam Strips Pvt Ltd,12565538,Redington C/O Fortis Financial Services Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer Enhanc. Def,0,0,FY09,9/15/2008
12233135,Bhagwan Singh Tondon,12233135,Bhagwan Singh Tondon,2652941,H B S Systems Pvt Ltd,Server & CAL,SQL Server & CAL,SQL CAL,SQL CAL,Non-specific,Open,Open L&SA,Deferred Open L&SA - New,0,0,FY09,9/15/2008
11602305,Maya Academy Of Advanced Cinematics,9750934,Maya Entertainment Ltd,336146,Embee Software Pvt Ltd,Server & CAL,Windows Server & CAL,Windows Server HPC,Windows Compute Cluster Server,Non-specific,Open,Open V/MYO - Rec,OLV Perpet L&SA Recur-Def,0,0,FY09,9/25/2008
13336009,Remiel Softech Solution Pvt Ltd,13336009,Remiel Softech Solution Pvt Ltd,13335482,Redington C/O Remiel Softech Solutions Pvt Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer Enhanc. Def,0,0,FY09,12/23/2008

I am importing it with the following code:

import pandas as pd

df=pd.read_csv('file path.csv',sep=',')

It gives the following error:

Traceback (most recent call last):
  File "<pyshell#25>", line 1, in <module>
    df=pd.read_csv(filename,sep=',')
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
    return parser.read()
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
    ret = self._engine.read(nrows)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas\parser.c:6745)
  File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:6964)
  File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas\parser.c:7780)
  File "parser.pyx", line 890, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:8793)
  File "parser.pyx", line 950, in pandas.parser.TextReader._convert_tokens (pandas\parser.c:9484)
  File "parser.pyx", line 1026, in pandas.parser.TextReader._convert_with_dtype (pandas\parser.c:10642)
  File "parser.pyx", line 1046, in pandas.parser.TextReader._string_convert (pandas\parser.c:10853)
  File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas\parser.c:15657)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 90: invalid start byte

Since it looked like a Unicode error, I changed the encoding and ran it again:

df=pd.read_csv(filename,encoding='utf-16',sep=',')

It gives the following error:

Traceback (most recent call last):
  File "<pyshell#26>", line 1, in <module>
    df=pd.read_csv(filename,encoding='utf-16',sep=',')
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 198, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 479, in __init__
    self._make_engine(self.engine)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 586, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 957, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 477, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4434)
  File "parser.pyx", line 592, in pandas.parser.TextReader._get_header (pandas\parser.c:5660)
  File "parser.pyx", line 768, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:7451)
  File "parser.pyx", line 1661, in pandas.parser.raise_parser_error (pandas\parser.c:18744)
pandas.parser.CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

I don't know why this happens. I even tried converting the CSV file to Excel with Text to Columns and reading it with pandas' read_excel function. That also gave an error (below):

Traceback (most recent call last):
  File "<pyshell#30>", line 1, in <module>
    df=pd.read_excel('J:\dmqp on 192.168.1.41\MS Sales Dump (FY09)xls','MS Sales Dump (FY09)')
  File "C:\Python33\lib\site-packages\pandas\io\excel.py", line 52, in read_excel
    return ExcelFile(path_or_buf,kind=kind).parse(sheetname=sheetname,
  File "C:\Python33\lib\site-packages\pandas\io\excel.py", line 68, in __init__
    import xlrd # throw an ImportError if we need to
ImportError: No module named 'xlrd'

Can someone help resolve the errors above, both for the CSV import and for the Excel import?

After changing the encoding, I tried this code:

df=pd.read_csv(filename,encoding='iso-8859-1',sep=',')

It did not raise an error, but it imported everything as a single column instead of splitting the data into separate columns.

>>>df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 263244 entries, 0 to 263243
Data columns (total 1 columns):
End Customer Organization ID,End Customer Organization Name,End Customer Top Parent Organization ID,End Customer Top Parent Organization Name,Reseller Top Parent ID,Reseller Top Parent Name,Business,Rev Sum Division,Rev Sum Category,Product Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales Date    263244  non-null values
dtypes: object(1)

To check the sample data above, I saved it to a text file and imported that. This is the output I got:

>>> df =pd.read_csv(r'J:\Data.txt')
>>> print(df)
   End Customer Organization ID  \
0                      11027676   
1                      12835756   
2                      12233135   
3                      11602305   
4                      13336009   

                      End Customer Organization Name  \
0  Baroda Western Uttar Pradesh Gramin Bankgfhgfn...   
1                               Uttam Strips Pvt Ltd   
2                               Bhagwan Singh Tondon   
3                Maya Academy Of Advanced Cinematics   
4                    Remiel Softech Solution Pvt Ltd   

   End Customer Top Parent Organization ID  \
0                                  4078446   
1                                 12835756   
2                                 12233135   
3                                  9750934   
4                                 13336009   

           End Customer Top Parent Organization Name  Reseller Top Parent ID  \
0  Bank Of Barodadfhhgfjyjtkyukujkyujkuhykluiluil...                 1809012   
1                               Uttam Strips Pvt Ltd                12565538   
2                               Bhagwan Singh Tondon                 2652941   
3                             Maya Entertainment Ltd                  336146   
4                    Remiel Softech Solution Pvt Ltd                13335482   

                            Reseller Top Parent Name  \
0  Hcl Infosystems Ltd - Partnerdghftrutyhb frhyw...   
1        Redington C/O Fortis Financial Services Ltd   
2                              H B S Systems Pvt Ltd   
3                             Embee Software Pvt Ltd   
4     Redington C/O Remiel Softech Solutions Pvt Ltd   

                                            Business  \
0  Server & CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmgh...   
1                                                MBS   
2                                       Server & CAL   
3                                       Server & CAL   
4                                                MBS   

                                    Rev Sum Division  \
0  SQL Server & CALdfhtrhtrgbhrghrye5y45y45yu56ju...   
1                                       Dynamics ERP   
2                                   SQL Server & CAL   
3                               Windows Server & CAL   
4                                       Dynamics ERP   

                                    Rev Sum Category  \
0  SQL CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfa...   
1                                       Dynamics NAV   
2                                            SQL CAL   
3                                 Windows Server HPC   
4                                       Dynamics NAV   

                                      Product Family       Version  \
0  SQL CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmj...          2005   
1                   Dynamics NAV Business Essentials  Non-specific   
2                                            SQL CAL  Non-specific   
3                     Windows Compute Cluster Server  Non-specific   
4                   Dynamics NAV Business Essentials  Non-specific   

                                       Pricing Level  \
0  Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasd...   
1                                              Other   
2                                               Open   
3                                               Open   
4                                              Other   

                               Summary Pricing Level  \
0  Open Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgf...   
1                                             MBS SA   
2                                          Open L&SA   
3                                   Open V/MYO - Rec   
4                                             MBS SA   

                                Detail Pricing Level  MS Sales Amount  \
0  Open Stddfm,vdnoghioerivnsdflierohgushdfovhsio...           125.85   
1                       MBS New Customer Enhanc. Def             0.00   
2                           Deferred Open L&SA - New             0.00   
3                          OLV Perpet L&SA Recur-Def             0.00   
4                       MBS New Customer Enhanc. Def             0.00   

   MS Sales Licenses Fiscal Year Sales Date   
0                  1        FY07  12/28/2006  
1                  0        FY09   9/15/2008  
2                  0        FY09   9/15/2008  
3                  0        FY09   9/25/2008  
4                  0        FY09  12/23/2008  
>>>

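This adds a '\' after each column, and the column names are not laid out side by side. Instead, each column seems to start on a new line after the previous one.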
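Note on the output above: the trailing backslash is the marker pandas prints when a frame is too wide for the display and has to be wrapped into several blocks of columns. It is not part of the data. The sketch below reproduces the effect on a small, hypothetical two-column frame (the sample text and variable names are illustrative, not taken from the real file), using the `line_width` argument of `DataFrame.to_string`.

```python
import io

import pandas as pd

# Hypothetical two-column sample with long header names, standing in for the real file.
csv_text = (
    "End Customer Organization Name,End Customer Top Parent Organization Name\n"
    "Uttam Strips Pvt Ltd,Uttam Strips Pvt Ltd\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Forcing a narrow line width reproduces the wrapped layout: the header line
# of each block of columns (except the last) ends with a backslash.
wrapped = df.to_string(line_width=40)

# With no line width limit, the same frame renders as one block of columns.
unwrapped = df.to_string()

print(df.shape)   # the frame still has two separate columns
print(wrapped)
print(unwrapped)
```

In an interactive session, the related display options are `pd.set_option("display.width", ...)` and `pd.set_option("display.max_columns", ...)`; checking `df.shape` or `df.columns` is a quick way to confirm that the columns were split correctly regardless of how the frame is printed.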

Answer 1

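I think your main problem is related to encoding. I have been through the pain of dealing with odd encodings in CSV files. What helped me in those cases was detecting the file's real encoding and then loading the file with pandas using that encoding.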
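As a small illustration of why the first attempt fails, the byte 0xc1 named in the traceback can never start a valid UTF-8 sequence, while ISO-8859-1 assigns a character to every byte value. The sketch below uses a made-up byte string (not the real file contents) and only the standard library.

```python
# Hypothetical sample bytes: 0xC1 stands in for the offending byte from the traceback.
raw = b"Gramin Bank \xc1"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # 0xC1 is not a legal UTF-8 start byte, so decoding stops here.
    utf8_error = exc

# ISO-8859-1 maps every byte value to a character, so decoding cannot fail.
# That only means the bytes are decodable, not that the file was really
# written in ISO-8859-1.
latin1_text = raw.decode("iso-8859-1")

print(utf8_error)
print(latin1_text)
```

This is why `encoding='iso-8859-1'` silences the error: it accepts any input, which makes it a useful fallback but a poor proof of the file's true encoding.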

Try the following code:

from chardet.universaldetector import UniversalDetector

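def test_encoding(file_name):
    detector = UniversalDetector()
    # Read raw bytes so that no decoding happens before detection
    with open(file_name, 'rb') as f:
        for line in f:
            detector.feed(line)
            # Stop early once the detector is confident enough
            if detector.done:
                break
        detector.close()
    r = detector.result
    return "Detected encoding %s with confidence %s" % (r['encoding'], r['confidence'])

# pass the file path to the function to see the result
# (raw string, so the backslashes are not treated as escape sequences)
test_encoding(r'C:\Users\..\file.csv')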

Output:

'Detected encoding UTF-16 with confidence 1.0'

This tries to infer the encoding of your file, after which you can load the file with pandas using the detected encoding. Hope this helps.
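If chardet is not available, a rough standard-library fallback is to look for a byte-order mark and then try a short list of candidate encodings. The sketch below is only an approximation of real detection: the helper name `pick_encoding`, the candidate list, and the sample file contents are all assumptions made for illustration.

```python
import codecs
import os
import tempfile

# Byte-order marks are checked first. UTF-32 must come before UTF-16 because
# the UTF-32-LE mark begins with the same bytes as the UTF-16-LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def pick_encoding(path, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first encoding that decodes the whole file without error."""
    with open(path, "rb") as f:
        data = f.read()
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # Not reachable with the default candidates, because iso-8859-1 accepts
    # any byte sequence; kept for custom candidate lists.
    raise ValueError("none of the candidate encodings decoded the file")


# Hypothetical sample file containing the 0xC1 byte from the traceback.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv")
    with open(sample, "wb") as f:
        f.write(b"name,city\nBank \xc1,Agra\n")
    detected = pick_encoding(sample)

print(detected)
```

The returned name can then be passed on, for example `pd.read_csv(filename, encoding=pick_encoding(filename))`. Because the last candidate accepts anything, a successful result means "decodable", not "verified", so it is worth eyeballing a few non-ASCII values after loading.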

Answer 1
3 2016-05-06 13:29:31
