Python 3.6. Get average Y for all same X coordinates
I have a list of coordinates that looks like this:
my_list = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3]]
As we can see, the first three coordinates share the same X value with different Y values, and the same holds for the other two coordinates. I want to make a new list that looks like this:
new_list = [[1, 3], [2, 2]]
where y1 = 3 = (1+3+5)/3 and y2 = 2 = (1+3)/2.
I have written the code below, but it runs slowly.
I work with hundreds of thousands of coordinates, so the question is: how can I make this code run faster? Is there any optimization or open-source library that can speed it up?
Thank you in advance.
# mass is the input list of [x, y] points (my_list above)
x_mass = []
for m in mass:
    x_mass.append(m[0])
set_x_mass = set(x_mass)
list_x_mass = list(set_x_mass)
performance_points = []

def function(i):
    unique_x_mass = []
    for m in mass:
        if m[0] == i:
            unique_x_mass.append(m)
    summ_y = 0
    for m in unique_x_mass:
        summ_y += m[1]
    point = [float(m[0]), float(summ_y/len(unique_x_mass))]
    performance_points.append(point)
    return performance_points

for x in list_x_mass:
    function(x)
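For reference, the loop above rescans the whole of `mass` once per unique X, so the work grows roughly as (number of points) × (number of unique X values). A minimal sketch of a single-pass alternative, assuming the points are `[x, y]` pairs as in `my_list`; the name `average_y_by_x` is hypothetical and not from the original post:

```python
from collections import defaultdict


def average_y_by_x(points):
    """Return [[x, mean_y], ...] sorted by x, visiting each point once."""
    sums = defaultdict(float)   # x -> running sum of y
    counts = defaultdict(int)   # x -> number of points seen with that x
    for x, y in points:
        sums[x] += y
        counts[x] += 1
    # Build the output in the same [float(x), float(mean)] shape as the code above.
    return [[float(x), sums[x] / counts[x]] for x in sorted(sums)]


my_list = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3]]
print(average_y_by_x(my_list))  # [[1.0, 3.0], [2.0, 2.0]]
```

Because every point is touched only once, this should scale roughly linearly with the number of points, plus the cost of sorting the unique X values at the end.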
Create a DataFrame and aggregate with mean:
import pandas as pd

L = [[1, 1], [1, 3], [1, 5], [2, 1], [2, 3]]
# Group rows by column 0 (x) and average column 1 (y)
L1 = pd.DataFrame(L).groupby(0, as_index=False)[1].mean().values.tolist()
print(L1)
[[1, 3], [2, 2]]
The pandas solution offered by @jezrael is elegant but slow (like almost everything in pandas). I would suggest using the itertools and statistics modules:
from statistics import mean
from itertools import groupby

# Group consecutive points by their x value (L is already sorted by x)
grouper = groupby(L, key=lambda x: x[0])
# The next line is again more elegant, but slower (it needs `import operator`):
# grouper = groupby(L, key=operator.itemgetter(0))
[[x, mean(yi[1] for yi in y)] for x, y in grouper]
The result is, of course, the same. For the sample list, this runs about two orders of magnitude faster.
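One caveat worth noting: `itertools.groupby` only merges consecutive items that share a key, so it yields one group per X only when the list is already sorted by X, as the sample list is. A minimal sketch for unsorted input, where the sample data `points` is hypothetical (the same points as the sample, shuffled):

```python
from itertools import groupby
from statistics import mean

# Hypothetical unsorted input: the sample points in a shuffled order.
points = [[2, 1], [1, 5], [1, 1], [2, 3], [1, 3]]

# Sort by x first so that points with equal x become adjacent.
ordered = sorted(points, key=lambda p: p[0])

result = [
    [x, mean(p[1] for p in group)]
    for x, group in groupby(ordered, key=lambda p: p[0])
]
print(result)  # [[1, 3], [2, 2]]
```

Sorting adds an O(n log n) step, so for data that is already ordered by X it can be skipped.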