Best way to calculate over a large data frame?
I am trying to find the best way to handle a dataset of around 80 million rows. I need to make some calculations over this data. I am trying for loops, but they take forever.
I have data as below (individual taxi trips from one area to another, at a resolution of 15 minutes):
timestamp, origin_area, destination_area
2014-01-27 11:00:00, 28.0, 32.0
2014-01-27 11:00:00, 28.0, 32.0
2013-01-01 01:00:00, 28.0, 1.0
2013-01-01 01:15:00, 28.0, 2.0
I need to convert this data into columns like this:
timestamp, origin_area, destination_area, (number of trips for each distinct origin-destination pair in that timestamp), (number of all trips from the origin area in that timestamp)
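To make the two extra columns concrete, here is a minimal sketch in plain Python (standard library only) of the result I am after, using the four sample rows above. The names `rows`, `pair_counts`, `origin_counts` and `result` are only for illustration, and I am assuming one output row per distinct (timestamp, origin_area, destination_area) combination; this is not a claim about the fastest approach for 80 million rows.

```python
from collections import Counter

# Sample rows from above: (timestamp, origin_area, destination_area).
rows = [
    ("2014-01-27 11:00:00", 28.0, 32.0),
    ("2014-01-27 11:00:00", 28.0, 32.0),
    ("2013-01-01 01:00:00", 28.0, 1.0),
    ("2013-01-01 01:15:00", 28.0, 2.0),
]

# Pass 1: count trips per (timestamp, origin, destination)
# and per (timestamp, origin). Each is a single pass over the data.
pair_counts = Counter(rows)
origin_counts = Counter((ts, origin) for ts, origin, _ in rows)

# Pass 2: one output row per distinct (timestamp, origin, destination),
# with both counts attached (assumed output shape, see note above).
result = [
    (ts, origin, dest, n, origin_counts[(ts, origin)])
    for (ts, origin, dest), n in sorted(pair_counts.items())
]

for row in result:
    print(row)
```

In data frame terms, these would be two group-by counts: one over (timestamp, origin_area, destination_area) and one over (timestamp, origin_area).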
What are my options for handling these calculations quickly and creating the additional columns described above?
Thank you.