Average of groupby elements in Python
I have a list that looks like this:
58308.803701 132.227.127.170 50602 149.13.32.15 443 6 64
58308.815456 149.13.32.15 443 132.227.127.170 50602 6 60
58308.815524 132.227.127.170 50602 149.13.32.15 443 6 52
58308.817244 132.227.127.170 50602 149.13.32.15 443 6 57
58308.828987 149.13.32.15 443 132.227.127.170 50602 6 52
58308.829133 149.13.32.15 443 132.227.127.170 50602 6 57
58308.829169 132.227.127.170 50602 149.13.32.15 443 6 52
58308.912361 132.227.127.170 50603 86.4.136.93 443 6 64
58308.912497 132.227.127.170 50599 94.31.112.216 443 6 95
58308.912568 132.227.127.170 50599 94.31.112.216 443 6 96
58308.912977 132.227.127.170 50599 94.31.112.216 443 6 847
58308.913411 132.227.127.170 50599 94.31.112.216 443 6 154
58308.913484 132.227.127.170 50599 94.31.112.216 443 6 233
....
....
....
I want to group each run of similar lines (those with the same five middle columns) and output the minimum of the first column together with statistics of the last column (min, max, mean, median, and any other available metrics), like the following:
58308.803701 132.227.127.170 50602 149.13.32.15 443 6 64
58308.815456 149.13.32.15 443 132.227.127.170 50602 6 60
min of(58308.815524,58308.817244) 132.227.127.170 50602 149.13.32.15 443 6 min/max/avg/...of(52,57)
min of(58308.828987,58308.829133) 149.13.32.15 443 132.227.127.170 50602 6 min/max/avg/...of(52,57)
58308.829169 132.227.127.170 50602 149.13.32.15 443 6 52
58308.912361 132.227.127.170 50603 86.4.136.93 443 6 64
min of(58308.912497,..,58308.913484) 132.227.127.170 50599 94.31.112.216 443 6 min/max/avg/...of(95,96,847,154,233)
....
....
....
Here is the code I have written so far and am trying to get working:
from itertools import groupby
import re
import numpy as np
tstFile=open("output","w+")
with open('dataInput','r') as d:
    f1 = ([x for x in line.split()] for line in d)
    for a,b in groupby(f1,key=lambda x:x[1:6]):
        tstFile.write("%s\t%s\t%s\t%s\t%s\t%s\t%s\n" %(min(x[0] for x in b),min(x[6] for x in b),max(x[6] for x in b),np.average(x[6] for x in b),np.mean(x[6] for x in b),np.median(x[6] for x in b),np.std(x[6] for x in b)))
tstFile.close()
But nothing really seems to work. It only works for min and max, and to get each result I have to use a single argument, like this:
tstFile=open("output","w+")
with open('dataInput','r') as d:
    f1 = ([x for x in line.split()] for line in d)
    for a,b in groupby(f1,key=lambda x:x[1:6]):
        tstFile.write("%s\n" %(min(x[6] for x in b)))
tstFile.close()
Any help, please!
When dealing with CSV files, it's generally advisable to use the csv module. I've included sample code below that demonstrates how you could solve this problem. If your input file is tab-delimited, change to delimiter='\t' and remove skipinitialspace=True in csv.reader. The tabs weren't present in the sample input, but they may have disappeared during copy/paste.
import csv
from itertools import groupby
import numpy as np

# Python 3: open the output in text mode with newline='' (Python 2 used 'wb').
with open('data.csv') as in_file, open('out.csv', 'w', newline='') as out_file:
    reader = csv.reader(in_file, delimiter=' ', skipinitialspace=True)
    writer = csv.writer(out_file, delimiter='\t')
    # groupby only merges consecutive rows that share columns 1-5.
    for key, group in groupby(reader, key=lambda r: r[1:6]):
        # Materialize the group once, then pull out columns 0 and 6 as floats.
        col0, col6 = np.array(list(group))[:, [0, 6]].transpose().astype(float)
        writer.writerow([min(col0)] + key + [int(min(col6)), int(max(col6)),
                                             np.mean(col6)])
Output (I added some tabs to increase readability):
58308.803701 132.227.127.170 50602 149.13.32.15 443 6 64 64 64.0
58308.815456 149.13.32.15 443 132.227.127.170 50602 6 60 60 60.0
58308.815524 132.227.127.170 50602 149.13.32.15 443 6 52 57 54.5
58308.828987 149.13.32.15 443 132.227.127.170 50602 6 52 57 54.5
58308.829169 132.227.127.170 50602 149.13.32.15 443 6 52 52 52.0
58308.912361 132.227.127.170 50603 86.4.136.93 443 6 64 64 64.0
58308.912497 132.227.127.170 50599 94.31.112.216 443 6 95 847 285.0
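For context on why the code in the question only produced the first statistic: each group yielded by groupby is a one-shot iterator, so the first min(...) consumes it and every later pass sees nothing. The following is a minimal sketch of the same grouping that materializes each group once and also computes the median and standard deviation the question asked about. It uses the standard-library statistics module rather than NumPy; the function name summarize and the dictionary keys are hypothetical, chosen only for illustration.

```python
from itertools import groupby
import statistics


def summarize(lines):
    """Group consecutive rows sharing columns 1-5; summarize columns 0 and 6."""
    rows = (line.split() for line in lines if line.strip())
    results = []
    for key, group in groupby(rows, key=lambda r: r[1:6]):
        # A groupby group is a one-shot iterator: materialize it once,
        # otherwise every pass after the first sees an empty sequence.
        members = list(group)
        times = [float(r[0]) for r in members]
        sizes = [float(r[6]) for r in members]
        results.append({
            "first_seen": min(times),
            "key": key,
            "min": min(sizes),
            "max": max(sizes),
            "mean": statistics.mean(sizes),
            "median": statistics.median(sizes),
            # Population standard deviation, matching np.std's default (ddof=0).
            "std": statistics.pstdev(sizes),
        })
    return results


# Three rows taken from the sample input in the question.
sample = """\
58308.815524 132.227.127.170 50602 149.13.32.15 443 6 52
58308.817244 132.227.127.170 50602 149.13.32.15 443 6 57
58308.828987 149.13.32.15 443 132.227.127.170 50602 6 52
""".splitlines()

for row in summarize(sample):
    print(row)
```

Returning dictionaries keeps the statistics labeled; writing them out with csv.writer as in the answer above would work the same way.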