
How do I read this row with pandas?

Data sample: the program fails on the second row because it contains 7 "," while normal rows have only 6.

7558,1488,1738539,,,,1
7559,1489,1702292,,"(segment \"Pesnya, ili Kak velikij Luarsab khor organizovyval\")",8,1
7560,1489,2146930,1975,,21,1

It is from the IMDB dataset's cast_info table. ([IMDB][2] is used in a database task called cardinality estimation.) The separator is ",", but when a separator appears inside a string, pandas does not treat it as part of the string. The error log:

  File "\pytorch\lib\site-packages\pandas\io\parsers\readers.py", line 488, in _read
return parser.read(nrows)
  File "\pytorch\lib\site-packages\pandas\io\parsers\readers.py", line 1047, in read
index, columns, col_dict = self._engine.read(nrows)
  File "\pytorch\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 223, in read
chunks = self._reader.read_low_memory(nrows)
  File "pandas\_libs\parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas\_libs\parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
  pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 7559, saw 8

How can I solve it?

[2]: https://www.imdb.com/interfaces/

Try this; I think it should work.

import pandas as pd
# data_path is the path to the cast_info CSV file
df = pd.read_csv(data_path, sep=",")
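
Note that sep="," is already the default for read_csv, so the call above may still raise the same ParserError. Looking at the failing row, the quotes inside the string are escaped with a backslash (\"), while pandas by default expects embedded quotes to be doubled (""). One option that may help is passing escapechar="\\". The following is only a sketch on the three sample rows from the question: header=None is an assumption (the question does not say whether the file has a header row), and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# The three sample rows from the question, kept verbatim.
# Raw strings preserve the backslashes in front of the inner quotes.
sample = "\n".join([
    r'7558,1488,1738539,,,,1',
    r'7559,1489,1702292,,"(segment \"Pesnya, ili Kak velikij Luarsab khor organizovyval\")",8,1',
    r'7560,1489,2146930,1975,,21,1',
])

# With default settings the backslash is an ordinary character, so the first
# inner quote ends the quoted field and the comma after "Pesnya" splits the row.
try:
    pd.read_csv(io.StringIO(sample), header=None)
    default_ok = True
except pd.errors.ParserError:
    default_ok = False

# Treating the backslash as an escape character keeps \" as a literal quote,
# so the comma stays inside the quoted field.
# header=None is an assumption; drop it if the real file has a header row.
df = pd.read_csv(io.StringIO(sample), header=None, escapechar="\\")
```

For the real file, replacing io.StringIO(sample) with data_path should be enough, assuming the rest of the file escapes quotes the same way.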


 