
How do I Sum Dictionary in Python and Sort by Key Value

I have a log file of net-flow data that I am trying to sort by IP address and timestamp while adding up the bytes. The result should list identical IP addresses together, in descending order by byte count.

A line of the file reads:

                 Min       Source IP                     Bytes

./R2snd/2014/02/02/02/25.flows:100.000.000.000|101.101.101.101|0|4|3|2|96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0

For some reason I can only get it to display the minute, but I need the whole date and time formatted. The minute is the last number after a / in the path, which I labeled "Min" above. Then I need to take every IP address in the file and sort by IP, so that repeated IPs appear together, and add up the bytes sent for each IP. I have tried to do this below with a dictionary, but I can't get it to work. Finally, I need to sort the dictionary in descending order by bytes: since the bytes are added up for each IP entry, the top entry for each IP will be the total bytes sent by that IP.

import operator
with open('/home/username/Documents/log') as f:
    for line in f:
        #save the data into an array
        firstsplitforminute = line.split('/')
        secondsplitforminute = firstsplitforminute[6].split('.')
        firstsplitforsourceip = line.split('|')
        secondsplitforsourceip = firstsplitforsourceip[0].split(':')
        minute = secondsplitforminute[0]
        sourceip = secondsplitforsourceip[1]
        bytes = line.split('|')[6]
        protocol = line.split('|')[12]

        if protocol == '6':
            entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
            sum(item['BYTES'] for item in entries)
            def sortbykey():
                sortedbykeydict = sorted(entries.items(), key = lambda t: t[1])
                print sortedbykeydict
            sortbykey()
        else:
            pass

However, I get the following error when I run this code:

File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1844, in <module>
    debugger.run(setup['file'], None, None)
  File "/home/grant/.eclipse/org.eclipse.platform_3.8_155965261/plugins/org.python.pydev_3.4.1.201403181715/pysrc/pydevd.py", line 1372, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <module>
    sum(item['BYTES'] for item in entries)
  File "/home/grant/workspace/Learning/LogfileExtractor.py", line 16, in <genexpr>
    sum(item['BYTES'] for item in entries)
TypeError: string indices must be integers, not str

Try converting the value to an integer: 'BYTES': int(bytes)

(As far as I understand your code, that should work.)
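As a rough sketch of how the integer conversion fits into the overall goal (total bytes per source IP, largest first, with the full date and time recovered from the path), something like the following might work. It is written for Python 3, unlike the Python 2 code in the question. The helper names parse_flow_line and total_bytes_by_ip are made up for illustration, the field positions (6 for bytes, 12 for protocol) are taken from the question's code, and the reading of the path as year/month/day/hour/minute is an assumption. Only the first sample line comes from the question; the others are invented test data.

```python
from collections import defaultdict
from datetime import datetime


def parse_flow_line(line):
    """Return (timestamp, source_ip, byte_count, protocol) for one log line."""
    fields = line.strip().split('|')
    # The first field looks like './R2snd/2014/02/02/02/25.flows:100.000.000.000'.
    path, source_ip = fields[0].rsplit(':', 1)
    # Assumption: the last five path components are year/month/day/hour/minute.
    year, month, day, hour, minute_file = path.split('/')[-5:]
    minute = minute_file.split('.')[0]
    timestamp = datetime(int(year), int(month), int(day), int(hour), int(minute))
    # Convert the byte count to int so it can be added up later.
    return timestamp, source_ip, int(fields[6]), fields[12]


def total_bytes_by_ip(lines, protocol='6'):
    """Sum bytes per source IP for one protocol, largest total first."""
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, source_ip, byte_count, proto = parse_flow_line(line)
        if proto == protocol:
            totals[source_ip] += byte_count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# First line is from the question; the remaining lines are invented for the demo.
sample = [
    './R2snd/2014/02/02/02/25.flows:100.000.000.000|101.101.101.101|0|4|3|2|96|1391336665|1391336668|3361|445|2|6|0|0|0|0|0',
    './R2snd/2014/02/02/02/25.flows:100.000.000.001|101.101.101.101|0|4|3|2|500|1391336665|1391336668|3361|445|2|6|0|0|0|0|0',
    './R2snd/2014/02/02/02/26.flows:100.000.000.000|101.101.101.101|0|4|3|2|4|1391336665|1391336668|3361|445|2|6|0|0|0|0|0',
    './R2snd/2014/02/02/02/26.flows:100.000.000.000|101.101.101.101|0|4|3|2|1000|1391336665|1391336668|3361|445|2|17|0|0|0|0|0',
]

for ip, total in total_bytes_by_ip(sample):
    print(ip, total)

first_timestamp = parse_flow_line(sample[0])[0]
print(first_timestamp.strftime('%Y-%m-%d %H:%M'))
```

To run it against the real log, the open file object could be passed directly, for example total_bytes_by_ip(f) inside the with block from the question.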

@BartoszKP is correct. Python is looping through entries, and iterating over a dictionary yields its keys, which are strings here, so item['BYTES'] tries to index a string with a string:

entries = {'IP':sourceip, 'BYTES':bytes, 'MIN':minute}
sum(item['BYTES'] for item in entries)

Instead, you should iterate over the dictionary's items:

sum(v for k,v in entries.items())

This means that during one iteration 'IP' is stored in k and sourceip is stored in v; during another, 'BYTES' is stored in k and bytes is stored in v; and so on. Note that calling sum() over all of these values would still fail, because sourceip and minute are not numbers; only the byte count, converted with int(), should be added.
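The difference between iterating over a dictionary and over its items can be seen in a small Python 3 sketch. The literal values below stand in for sourceip, bytes and minute from the question and are taken from its sample line.

```python
entries = {'IP': '100.000.000.000', 'BYTES': '96', 'MIN': '25'}

# Iterating over a dict yields its keys, which are strings here.
keys = [item for item in entries]

# Indexing one of those key strings with 'BYTES' raises the TypeError
# reported in the question.
try:
    keys[0]['BYTES']
    indexing_failed = False
except TypeError:
    indexing_failed = True

# .items() yields (key, value) pairs instead.
pairs = list(entries.items())

# Only the byte count is numeric, so convert that one value before adding it.
byte_count = int(entries['BYTES'])

print(keys)
print(pairs)
print(indexing_failed, byte_count)
```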
