
Python pandas read_csv: how to speed up processing timestamps

(pandas 0.16.1, Python 2.7.8, Anaconda 2.1.0 (64-bit), Intel Xeon 3.07GHz, Win7 64-bit)

I have a CSV table of quote data, about 400k rows per day:

sym         time                    bid     ask     bsize asize
XCME@6EM4   2014.05.07T08:10:02.407 1.3927  1.3928  28    29
XCME@6EM4   2014.05.07T08:10:02.430 1.3927  1.3928  27    29
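To reproduce the timings in the answer below without the original file, the sample above can be rebuilt in memory. This is a sketch under assumptions: the answer never shows its `data` and `data2` strings, so here `data` is assumed to be the header line alone and `data2` the two sample rows (this gives exactly 400,000 rows, matching the `df.info()` output further down), and the columns are assumed to be whitespace-separated as displayed.

```python
from io import StringIO

import pandas as pd

# Assumption: `data` is only the header line and `data2` holds the two sample
# rows; the original answer does not show how these strings were defined.
data = "sym time bid ask bsize asize\n"
data2 = (
    "XCME@6EM4 2014.05.07T08:10:02.407 1.3927 1.3928 28 29\n"
    "XCME@6EM4 2014.05.07T08:10:02.430 1.3927 1.3928 27 29\n"
)

# Two rows repeated 200,000 times -> 400,000 rows, about one day of quotes.
text = data + data2 * 200000

# Whitespace-separated columns; the timestamp column stays as plain strings.
df = pd.read_csv(StringIO(text), sep=r"\s+")
print(df.shape)  # (400000, 6)
print(df["time"].iloc[0])  # 2014.05.07T08:10:02.407
```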

Reading this into Python with pandas using

pd.read_csv("quotes.csv", parse_dates = {'idx':[1]}, index_col = 'idx')

takes about 40 seconds.

Any idea whether this can be made faster? People have suggested Cython solutions in this post, but I wonder whether a pure Python/pandas solution exists.

By the way, the following does not parse the dates. Is this a bug?

pd.read_csv("quotes.csv", parse_dates = [1])

Here is a better option.

Taking your 2 lines and repeating them to make 400k rows, then reading them in without parsing the dates:

In [34]: %timeit pd.read_csv(StringIO(data + data2*200000), sep=r'\s+')
1 loops, best of 3: 328 ms per loop

In [35]: df = pd.read_csv(StringIO(data + data2*200000), sep=r'\s+')
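`%timeit` is an IPython magic, so it is not available in a plain Python script. A rough equivalent with the standard `timeit` module might look like the sketch below, which times the answer's two steps (reading without dates, then converting with an explicit format) separately. `time_two_steps`, `HEADER`, and `ROWS` are hypothetical names, the data is the two sample rows repeated as in the answer, and the sketch assumes Python 3. Absolute timings depend on the machine and pandas version, so no figures are claimed here.

```python
import timeit
from io import StringIO

import pandas as pd

FMT = "%Y.%m.%dT%H:%M:%S.%f"
HEADER = "sym time bid ask bsize asize\n"
ROWS = (
    "XCME@6EM4 2014.05.07T08:10:02.407 1.3927 1.3928 28 29\n"
    "XCME@6EM4 2014.05.07T08:10:02.430 1.3927 1.3928 27 29\n"
)


def time_two_steps(n_pairs=200000, repeat=3):
    """Return the best (read_csv, to_datetime) wall times in seconds."""
    text = HEADER + ROWS * n_pairs

    def read():
        # Step 1: parse the text, leaving the timestamps as strings.
        return pd.read_csv(StringIO(text), sep=r"\s+")

    df = read()

    def convert():
        # Step 2: one vectorized conversion with an explicit format.
        return pd.to_datetime(df["time"], format=FMT)

    t_read = min(timeit.repeat(read, number=1, repeat=repeat))
    t_parse = min(timeit.repeat(convert, number=1, repeat=repeat))
    return t_read, t_parse


if __name__ == "__main__":
    t_read, t_parse = time_two_steps()
    print(f"read_csv without dates: {t_read:.3f} s")
    print(f"to_datetime with format: {t_parse:.3f} s")
```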

To parse the dates, you need to specify a format: this is not an ISO 8601 format, so without one the strings are parsed in Python space.

In [36]: %timeit pd.to_datetime(df.time,format='%Y.%m.%dT%H:%M:%S.%f')
1 loops, best of 3: 2.43 s per loop
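Each directive in the format string used in `In [36]` matches one piece of the quote timestamps, and the `.%f` directive reads the three-digit fractional part as milliseconds. A small check on the two sample timestamps, as a sketch (the `FMT`, `stamps`, and `parsed` names are illustrative only):

```python
import pandas as pd

# How the format string lines up with "2014.05.07T08:10:02.407":
#   %Y.%m.%d -> year.month.day, separated by literal dots
#   T        -> the literal separator between date and time
#   %H:%M:%S -> hours:minutes:seconds
#   .%f      -> fractional seconds ("407" is read as 407 milliseconds)
FMT = "%Y.%m.%dT%H:%M:%S.%f"

stamps = pd.Series(["2014.05.07T08:10:02.407", "2014.05.07T08:10:02.430"])
parsed = pd.to_datetime(stamps, format=FMT)

print(parsed.iloc[0])  # 2014-05-07 08:10:02.407000
# Gap between the two quotes, expressed in milliseconds.
print((parsed.iloc[1] - parsed.iloc[0]) / pd.Timedelta(milliseconds=1))  # 23.0
```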

In [37]: df.time = pd.to_datetime(df.time,format='%Y.%m.%dT%H:%M:%S.%f')

In [38]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 400000 entries, 0 to 399999
Data columns (total 6 columns):
sym      400000 non-null object
time     400000 non-null datetime64[ns]
bid      400000 non-null float64
ask      400000 non-null float64
bsize    400000 non-null int64
asize    400000 non-null int64
dtypes: datetime64[ns](1), float64(2), int64(2), object(1)
memory usage: 21.4+ MB
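The question's original call also made the timestamps the index (`index_col = 'idx'`). Combining the answer's two steps with `set_index` gives something like the following sketch. `read_quotes` is a hypothetical helper, and the whitespace separator is an assumption based on how the sample is displayed; a genuinely comma-separated file would need the default `sep` instead.

```python
from io import StringIO

import pandas as pd


def read_quotes(source, fmt="%Y.%m.%dT%H:%M:%S.%f"):
    """Hypothetical helper: read quotes, convert timestamps, index by time."""
    # Read without date parsing so the timestamps arrive as plain strings.
    df = pd.read_csv(source, sep=r"\s+")
    # Convert the whole column in one vectorized call with an explicit format.
    df["time"] = pd.to_datetime(df["time"], format=fmt)
    # Use the parsed timestamps as the index, as index_col did in the question.
    return df.set_index("time")


sample = StringIO(
    "sym time bid ask bsize asize\n"
    "XCME@6EM4 2014.05.07T08:10:02.407 1.3927 1.3928 28 29\n"
    "XCME@6EM4 2014.05.07T08:10:02.430 1.3927 1.3928 27 29\n"
)
quotes = read_quotes(sample)
print(quotes.index[0])  # 2014-05-07 08:10:02.407000
print(quotes["bsize"].iloc[1])  # 27
```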
