Pandas unable to open csv file overnight

Question

I am building a script that generates a csv file with pandas. If the file already exists in the designated path, the script should only append the new info to the existing file; if the file doesn't exist, it should create a new one.

My code right now is like this:

#Trying to find the file in the designated path and then appending the new info

try:
   pd.read_csv('file.csv', encoding = 'ANSI')
   info.to_csv('file.csv', index = False, header = None, sep = ';', encoding = 'ANSI', decimal = ',', mode = 'a')

#Creating a new file into the existing path if it doesn't exist

except IOError:
   info.to_csv('file.csv',  index = False, sep = ';', encoding = 'ANSI', decimal = ',', header = True)

This works fine during the day, but when I try to run the script the next day, pd.read_csv() raises the following error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 10, saw 2

I've read that one way to resolve this is to add the parameter error_bad_lines = False, but this makes the script much slower. The verbose output in this case shows that almost every line of the .csv file has some extra \n in it.

Is there some other way to tackle this file appending/creating problem?

I need to be able to open this file in Excel to check the info inside. Should I just create an .xlsx file instead of a .csv?

Answer 1

Use pathlib to check whether the file already exists. If you really want to load the whole file (only to check that it exists), you need to pass pd.read_csv the same parameters you used in the to_csv call. But loading the file is computationally very expensive for a simple existence check, so I would recommend pathlib.

The default separator in pd.read_csv is , (which is your decimal separator), so the error probably comes from reading the file with the wrong delimiter.
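To illustrate the delimiter mismatch, here is a minimal sketch. The sample data is made up for the demonstration and is not the asker's file: it only mimics something written with sep=';' and decimal=','. As far as I can tell, the C parser complains once a later row contains a comma that earlier rows did not, which would be consistent with the error showing up at line 10 rather than at the top of the file.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-separated, comma as decimal mark.
# Only the last row contains a comma.
raw = "name;price\nwidget;3\nbolt;7\ngadget;2,25\n"

# With the default sep=',' the first rows look like a single field each,
# so the row containing a decimal comma suddenly has one field too many.
try:
    pd.read_csv(io.StringIO(raw))
    default_error = None
except pd.errors.ParserError as exc:
    default_error = str(exc)

# Reading with the same settings that were used for writing parses cleanly.
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
```

Here default_error holds the tokenizing error message, while df has the two expected columns with price parsed as floats.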

Possible solution:

from pathlib import Path

if Path('file.csv').exists():
    # File is already there: append the rows without repeating the header
    info.to_csv('file.csv', index = False, header = False, sep = ';', encoding = 'ANSI', decimal = ',', mode = 'a')
else:
    # No file yet: create it and write the header once
    info.to_csv('file.csv', index = False, sep = ';', encoding = 'ANSI', decimal = ',', header = True)
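Since the two branches differ only in whether the header is written, they could also be folded into a single call. This is a sketch rather than a drop-in replacement: append_to_csv is a hypothetical helper name, and the encoding argument is left out here because, as far as I know, the 'ANSI' codec name is only available on Windows; pass whatever encoding your file needs.

```python
from pathlib import Path

import pandas as pd


def append_to_csv(info: pd.DataFrame, path: str) -> None:
    """Append info to path, writing the header only when the file is new."""
    target = Path(path)
    info.to_csv(
        target,
        mode="a",                    # append; creates the file if it is missing
        header=not target.exists(),  # header only on the first write
        index=False,
        sep=";",
        decimal=",",
    )
```

To check the result from pandas, read the file back with the same settings, for example pd.read_csv('file.csv', sep=';', decimal=',').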

Answer 1 (2021-02-25 14:08:38)