Can't open CSV file in pandas python
I ran the following script (https://github.com/FXCMAPI/FXCMTickData/blob/master/TickData34.py) and added the following lines at the end to download the files:
output_folder = '/Users/me/Documents/data/forex/'
target_folder = os.path.join(output_folder, symbol, year)
os.makedirs(target_folder, exist_ok=True)
with open(os.path.join(target_folder, str(i) + '.csv'), 'wb') as outfile:
    outfile.write(data)
Then I tried opening the file with pandas as follows:
x = pd.read_csv('/Users/me/Documents/data/forex/EURUSD/2015/29.csv')
However, this is what I got:
In [3]: x.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2415632 entries, 0 to 2415631
Data columns (total 3 columns):
D float64
Unnamed: 1 float64
Unnamed: 2 float64
dtypes: float64(3)
memory usage: 55.3 MB
In [4]: x.dropna()
Out[4]:
Empty DataFrame
Columns: [D, Unnamed: 1, Unnamed: 2]
Index: []
Why is the dataframe empty?
If I open the file in TextEdit, the first few lines actually look like this:
DateTime,Bid,Ask
07/19/2015 21:00:15.469,1.083,1.08332
07/19/2015 21:00:16.949,1.08311,1.08332
07/19/2015 21:00:16.955,1.08311,1.08338
Apparently, every character in your data is followed by the null character \x00. Get rid of them, and things will work:
outfile.write(data.replace(b'\x00',b''))
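To see what that replacement does, here is a minimal sketch on synthetic bytes. It assumes the interleaved nulls come from ASCII text encoded as UTF-16-LE, which the thread does not confirm; the names sample_text, cleaned and decoded are made up for illustration.

```python
import csv
import io

# Synthetic stand-in for the downloaded bytes: ASCII text with a null byte
# after every character. Encoding ASCII as UTF-16-LE produces exactly this
# pattern; that the real file is UTF-16-LE is an assumption.
sample_text = (
    "DateTime,Bid,Ask\n"
    "07/19/2015 21:00:15.469,1.083,1.08332\n"
    "07/19/2015 21:00:16.949,1.08311,1.08332\n"
)
data = sample_text.encode("utf-16-le")

# The fix from the answer: drop every null byte before writing or parsing.
cleaned = data.replace(b"\x00", b"")

# Possible alternative under the same assumption: decode instead of stripping.
decoded = data.decode("utf-16-le")

# Parse the cleaned bytes with the standard csv module to confirm the
# three columns are intact.
rows = list(csv.reader(io.StringIO(cleaned.decode("ascii"))))
print(rows[0])
```

Stripping bytes is only safe while the content is pure ASCII, as in the sample shown in the question; for non-ASCII text, decoding with the correct encoding would be the safer route.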
Thank you for providing a very concrete and reproducible problem.
I pasted your code and ran it on Windows, and it indeed just read in 55 MB of null values.
But I think the problem is that pandas is not parsing the CSV file correctly, not that it cannot open the file.
However, I tried all the encodings listed in this answer and none of them worked, so there might be something wrong with the file as well.
I eventually made it work by opening the file in Excel and saving it as a different file, after which pandas could parse it correctly.
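A quick way to check whether the file itself is the problem is to look at its raw bytes rather than at what a text editor renders. The sketch below is only an illustration: null_byte_ratio is a hypothetical helper, and the demo file is generated on the spot with UTF-16-LE purely to get ASCII text with interleaved null bytes.

```python
import os
import tempfile


def null_byte_ratio(path, n=64):
    """Return the fraction of null bytes among the first n bytes of a file."""
    with open(path, "rb") as f:
        head = f.read(n)
    return head.count(b"\x00") / len(head) if head else 0.0


# Demo on a throwaway file; UTF-16-LE is used only to produce ASCII text
# with a null byte after every character (an assumption about the real file).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "wb") as f:
        f.write("DateTime,Bid,Ask\n".encode("utf-16-le"))
    ratio = null_byte_ratio(path)

print(ratio)
```

A ratio close to one half on a file that should be plain ASCII suggests a two-byte encoding or interleaved nulls, which would explain why a parser expecting single-byte text sees no usable values.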