pandas.read_csv is ignoring quoting of strings

I am having trouble reading a csv file into a pandas DataFrame. The import does not skip commas that are enclosed in quotes, so the quoted fields get split.

I have tried different options for quotechar, but none of them made any difference.

import csv
import pandas

df = pandas.read_csv( 'test_quote.csv', header=None,sep=',', quotechar='\"', quoting=csv.QUOTE_MINIMAL, encoding='ascii', engine='python')
print(df)
Code output:
$ python3 test_quote.py 
        0     1              2       3                            4       5       6
0  201571  2080    "December 2   2022"    "November 1 - November 30   2022"  487.29
1  345741  5377    "December 3   2022"    "November 1 - November 30   2022"  729.35
2  995349  3672   "December 2    2022"   "November 1 - November 30    2022"  937.33
3  475601  3672   "December 2    2022"   "November 1 - November 30    2022"  790.17
4  228548  3672    "December 7   2022"    "November 1 - November 30   2022"  682.38

Expected output:
$ python3 test_quote.py 
        0     1                     2                                   3       4
0  201571  2080    "December 2, 2022"    "November 1 - November 30, 2022"  487.29
1  345741  5377    "December 3, 2022"    "November 1 - November 30, 2022"  729.35
2  995349  3672   "December 2 , 2022"   "November 1 - November 30 , 2022"  937.33
3  475601  3672   "December 2 , 2022"   "November 1 - November 30 , 2022"  790.17
4  228548  3672    "December 7, 2022"    "November 1 - November 30, 2022"  682.38

Input file (test_quote.csv):
201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29
345741, 5377, "December 3, 2022", "November 1 - November 30, 2022", 729.35
995349, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 937.33
475601, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 790.17
228548, 3672, "December 7, 2022", "November 1 - November 30, 2022", 682.38

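The extra spaces after the commas are causing the issue. A quote character only opens a quoted field when it is the first character of that field. Here every quoted field starts with a space, so the quotes are read as literal text and the commas inside them act as delimiters. Use the following, which skips the space after each delimiter, but note that most of your parameters are already the defaults.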

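import pandas

# skipinitialspace=True drops the space after each comma, so each
# quoted field starts with the quote character and is parsed as one field.
df = pandas.read_csv('test_quote.csv', header=None, skipinitialspace=True)
print(df)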

Output:

        0     1                  2                                3       4
0  201571  2080   December 2, 2022   November 1 - November 30, 2022  487.29
1  345741  5377   December 3, 2022   November 1 - November 30, 2022  729.35
2  995349  3672  December 2 , 2022  November 1 - November 30 , 2022  937.33
3  475601  3672  December 2 , 2022  November 1 - November 30 , 2022  790.17
4  228548  3672   December 7, 2022   November 1 - November 30, 2022  682.38
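
To see the effect without creating a file, here is a minimal sketch that feeds two rows from the question's test_quote.csv through io.StringIO and compares the column counts with and without skipinitialspace. The helper name read_rows and the constant CSV_TEXT are made up for this illustration; only pandas.read_csv and its header and skipinitialspace parameters come from the answer above.

```python
import io

import pandas as pd

# Two rows copied from test_quote.csv in the question (space after each comma).
CSV_TEXT = (
    '201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29\n'
    '995349, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 937.33\n'
)


def read_rows(skip_space):
    """Hypothetical helper: parse CSV_TEXT with or without skipinitialspace."""
    return pd.read_csv(
        io.StringIO(CSV_TEXT),
        header=None,
        skipinitialspace=skip_space,
    )


# Default behaviour: the leading space hides the quote, so the commas
# inside the quoted dates still split the fields.
split_df = read_rows(skip_space=False)

# With skipinitialspace=True the quote is the first character of the field,
# so each quoted date stays in a single column.
fixed_df = read_rows(skip_space=True)

print("columns without skipinitialspace:", split_df.shape[1])
print("columns with skipinitialspace:   ", fixed_df.shape[1])
print(fixed_df)
```

If rewriting the file is an option, removing the spaces after the commas should have the same effect, since the quote would then be the first character of each quoted field.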
