
Improve performance in pandas (Python)

Here is an example of my data set, taken from some public transport services: data set

Date runs from 2018-06-01 to 2018-06-30,

Time is the operating hour, from 5 am to 24 (0) am,

People is the number of people for that specific date, time and trip,

from_to is where those people enter and leave (one type of trip),

and finally weekday.

What I need to do is create a timetable for each trip. For example, to create a table for trip "G1_G2", the code I use now is:

for i in [0,1,2,3,4,5,6]:
    for j in [0,1,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]:
        df['people'][(df['weekday'] == i)&(df['from_to'] == 'G1_G2') & (df['time'] == j)].mean()

Here "i" is the weekday and "j" is the operating hour. The result is a table like this: output table

The problem is that each table takes about 10 seconds to create, and there are around 11,000 types of trip, so the whole job would take about 30 hours.

Are there other ways to do this more efficiently in Python?

Thanks in advance!

You can probably do this with groupby and an aggregation.

import pandas as pd

I deliberately use a small data example here. If you have many smaller tables, as I understand from your description, you may want to concatenate them first.
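If the data really is spread over several smaller tables, combining them first might look like the minimal sketch below. The frames `part_a` and `part_b` and all their values are hypothetical stand-ins, not taken from the original data set.

```python
import pandas as pd

# Hypothetical partial tables; names and values are made up for illustration.
part_a = pd.DataFrame({
    'date': ['2018-06-01', '2018-06-01'],
    'time': [5, 6],
    'people': [3, 4],
    'from_to': ['G1_G2', 'G1_G2'],
    'weekday': [4, 4],
})
part_b = pd.DataFrame({
    'date': ['2018-06-02'],
    'time': [5],
    'people': [7],
    'from_to': ['G1_G2'],
    'weekday': [5],
})

# Stack the tables row-wise and renumber the index from 0.
df_all = pd.concat([part_a, part_b], ignore_index=True)
```

With everything in one frame, a single groupby can cover all trips at once.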

df = pd.DataFrame({
    'date': ['2018-06-01', '2018-06-01', '2018-06-01', '2018-06-02', '2018-06-02', '2018-06-02'],
    'time': [0, 0, 0, 1, 1, 1],
    'people': [0, 2, 2, 4, 5, 7],
    'from_to': ['BR13_BR13', 'BR13_BR13', 'BR13_BR13', 'BR13_BR13', 'BR13_BR13', 'BR13_BR13'],
    'weekday': [4, 4, 4, 5, 5, 5],
})

The following code produces a long format rather than the wide format of your output table, but it can be made wide if you want:

# Mean number of people for every (trip, hour, weekday) combination
df.groupby(['from_to', 'time', 'weekday'])['people'].mean()

This produces the following output:

from_to    time  weekday
BR13_BR13  0     4          1.333333
           1     5          5.333333
Name: people, dtype: float64
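To get from this long result to a wide per-trip table, one option is to unstack one of the index levels. The sketch below reuses the small example frame. The names `long_means`, `wide` and `trip_table` are made up for illustration, and putting hours in rows and weekdays in columns is an assumption, since the layout of the original output table is not shown here.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2018-06-01', '2018-06-01', '2018-06-01',
             '2018-06-02', '2018-06-02', '2018-06-02'],
    'time': [0, 0, 0, 1, 1, 1],
    'people': [0, 2, 2, 4, 5, 7],
    'from_to': ['BR13_BR13'] * 6,
    'weekday': [4, 4, 4, 5, 5, 5],
})

# Mean number of people per (trip, hour, weekday), in long format.
long_means = df.groupby(['from_to', 'time', 'weekday'])['people'].mean()

# Move the 'weekday' index level into columns: one row per (trip, hour).
wide = long_means.unstack('weekday')

# Table for a single trip: rows are hours, columns are weekdays (assumed layout).
trip_table = wide.loc['BR13_BR13']
```

Combinations that never occur in the data show up as NaN, and `trip_table.T` swaps rows and columns if the other orientation is wanted. Because the means for all trips come from one groupby pass, selecting a trip with `wide.loc[...]` avoids re-filtering the full frame for every trip, weekday and hour.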

