Calculating and Plotting the Average of every (X) items in a list of (Y) total

I searched for four days before posting this. I apologize in advance if it is too elementary and a waste of your time. I have successfully generated some basic plots with matplotlib's pyplot by following the tutorial examples, but I have not managed to do what I actually need.

Essentially:

  • I have a list of numbers stored in a single file.
  • Each line contains one number: the number of milliseconds it took to complete a certain repeated task.
  • There are over a million entries in this file, and it can grow beyond that.

Example of 20:

173
1685
1152
253
1623
390
84
40
319
86
54
991
1012
721
3074
4227
4927
181
4856
1415

Eventually, what I need to do is calculate the average of each consecutive group of entries (groups of equal size, spread evenly over the total number of entries) and then plot those averages using any of the plotting libraries for Python. I have considered pyplot for ease of use.

  • The X axis will correspond to the total number of tasks completed, and the Y axis will represent the number of milliseconds it takes to complete a task (for this example, the average time over every 5 tasks).

i.e.:

Entries 1-5 = (plottedTotalA)
Entries 6-10 = (plottedTotalB)
Entries 11-15 = (plottedTotalC)
Entries 16-20 = (plottedTotalD)

From what I can tell, I don't need to store the values indefinitely, only pass them to the plotter (in order) as they are processed. I have tried the following example to sum 5 entries from the above list of 20 (which works), but I don't know how to dynamically pass 5 at a time until completion, all the while retaining the calculated averages that will ultimately be passed to pyplot.

Example:

Python 2.7.3 (default, Jul 24 2012, 10:05:38) 
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> plottedTotalA = ['173', '1685', '1152', '253', '1623']
>>> sum(float(t) for t in plottedTotalA)
4886.0

Let's assume you have your n values in a list called x. Reshape x into an array A with 5 columns and calculate the mean of each row. Then you can simply plot the resulting vector.

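import numpy as np
import matplotlib.pyplot as plt

x = np.array(x, dtype=float)
n = x.size
# Drop the incomplete tail, then put each consecutive group of 5 on its own row.
A = x[:(n // 5) * 5].reshape(-1, 5)
y = A.mean(axis=1)  # one mean per row, i.e. per group of 5
plt.plot(y)
plt.show()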
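
As a quick check of the reshape approach, here is a minimal sketch that applies it to the 20 sample values from the question. The names `samples`, `group_size`, `usable`, and `means` are illustrative and not part of the original answer. Each row of the reshaped array holds one consecutive group of 5.

```python
import numpy as np

# The 20 sample timings (milliseconds) from the question.
samples = np.array([173, 1685, 1152, 253, 1623,
                    390, 84, 40, 319, 86,
                    54, 991, 1012, 721, 3074,
                    4227, 4927, 181, 4856, 1415], dtype=float)

group_size = 5
usable = (samples.size // group_size) * group_size  # drop any incomplete tail

# One row per consecutive group of 5, then one mean per row.
means = samples[:usable].reshape(-1, group_size).mean(axis=1)
print(means)
```

The four group means work out to 977.2, 183.8, 1170.4 and 3121.2, matching the sums 4886, 919, 5852 and 15606 divided by 5.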

EDIT: changed my code according to tacaswell's comment

However, you might run into memory problems if you actually have over a million entries. You could also reuse the name x instead of introducing A and y. That way you overwrite the initial data and save some memory.

I hope this helps.

I've taken the problem to be how to get 5 items at a time from a list that's generated from a file. As you said:

I don't know how to dynamically pass the 5 at a time until completion,

I've used /dev/random because it is never-ending and random, so it simulates your big file and shows how to process a big file without reading it into a list or otherwise slurping all the data.

################################################################################
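def bigfile():
    """Never ending stream of random numbers"""
    import struct
    # Binary mode: we unpack raw bytes, not text.
    with open('/dev/random', 'rb') as f:
        while True:
            # "H" = one unsigned 16-bit integer built from 2 bytes
            yield struct.unpack("H", f.read(2))[0]
################################################################################
def avg(l):
    """Noddy version"""
    # float() stops Python 2 integer division from truncating the average
    return float(sum(l)) / len(l)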
################################################################################

bigfile_i = bigfile()

import itertools
## Grouper recipe @ itertools
by_5  = itertools.imap(None, *[iter(bigfile_i)]*5)

# Only take 5, 10 times.
for x in range(10):
    l = by_5.next()
    a = avg(l)
    print l, a ## PLOT ?
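
The snippet above is Python 2 (`imap`, `.next()`, and the `print` statement). On Python 3 the built-in `zip` is already lazy, so the same grouper idiom might look like the sketch below. It assumes a file with one number per line. The names `chunk_averages`, `plot_averages`, and `fake_file` are hypothetical, and `io.StringIO` stands in for the real file.

```python
import io

def chunk_averages(lines, group_size=5):
    """Yield the mean of each complete group of `group_size` numeric lines."""
    numbers = (float(line) for line in lines if line.strip())
    # Grouper idiom: the same iterator repeated, so zip pulls consecutive items.
    for group in zip(*[numbers] * group_size):
        yield sum(group) / group_size

def plot_averages(averages, group_size=5):
    """Plot each average against the number of tasks completed so far."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    tasks_done = [group_size * (i + 1) for i in range(len(averages))]
    plt.plot(tasks_done, averages)
    plt.xlabel("tasks completed")
    plt.ylabel("average time per task (ms)")
    plt.show()

# Stand-in for the real file: the first 10 sample values, one per line.
fake_file = io.StringIO("173\n1685\n1152\n253\n1623\n390\n84\n40\n319\n86\n")
averages = list(chunk_averages(fake_file))
print(averages)
```

Only the averages are kept in memory; the raw values are consumed one group at a time. With a real file, passing the open file object to `chunk_averages` and the resulting list to `plot_averages` should give the plot described in the question. As with `zip` in general, an incomplete final group is silently dropped.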

EDIT

Detail of what happens to the remainder.

If we pretend the file has 11 lines and we take 5 each time:

In [591]: list(itertools.izip_longest(*[iter(range(11))]*5))
Out[591]: [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, None, None, None, None)]

In [592]: list(itertools.imap(None, *[iter(range(11))]*5))
Out[592]: [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]

In [593]: list(itertools.izip(*[iter(range(11))]*5))
Out[593]: [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]

In the first case izip_longest fills the remainder with None, whereas imap and izip truncate. I can imagine the OP may want to use itertools.izip_longest(*iterables[, fillvalue]) for its optional fill value, although None is a good sentinel for "no value".

I hope that makes it clear what happens to the remainder.
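
If the last, shorter group should still contribute an average rather than be dropped, one option is to pad with a sentinel and filter it out before averaging. The sketch below uses Python 3's `itertools.zip_longest` (the renamed `izip_longest`). The function name `chunk_averages_keep_tail` and the private sentinel are assumptions made for illustration, not something from the answer above.

```python
from itertools import zip_longest

def chunk_averages_keep_tail(values, group_size=5):
    """Yield the mean of each group; the last group may be shorter."""
    missing = object()  # private sentinel, never confused with real data
    iterator = iter(values)
    for group in zip_longest(*[iterator] * group_size, fillvalue=missing):
        present = [v for v in group if v is not missing]
        yield sum(present) / len(present)

print(list(chunk_averages_keep_tail(range(11))))  # [2.0, 7.0, 10.0]
```

With 11 items the final group contains only the value 10, so its average is 10.0. Whether a partial group belongs on the plot is a judgment call, since it averages fewer tasks than the others.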
