Grouping data by value in first column

I'm trying to group data from a two-column object based on the value of the first column. I need this data in a list so I can sort it afterwards. I am fetching interface data with SNMP on a large number of machines. In the example I have 2 interfaces. I need the data grouped by interface, preferably in a list.

The data I get is in the object item:

for i in item:
    print(i.oid, i.val)

ifDescr lo
ifDescr eth0
ifAdminStatus 1
ifAdminStatus 1
ifOperStatus 1
ifOperStatus 0

I would like to get this data in a list, grouped by the value in the first column, so it looks like this:

list=[[lo,1,1], [eth0,1,0]]

The solution I have is so dirty and long that I'm embarrassed to post it here, so any help is appreciated.

Here is my solution, so you get a better picture of what I'm talking about. What I did is put each interface's data in a separate list based on item.oid, then iterate through the cpu list and compare it to memory and name based on item.iid. In the end I have all the data in the cpu list, where each interface is an element of the list. This solution works, but it is too slow for my needs.

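from operator import itemgetter  # needed for the sort at the end

cpu=[]
memory=[]
name=[]

# First pass: split the items into one list per oid, keyed by item.iid
for item in process:
    if item.oid=='ifDescr':
        # ifDescr is a name such as 'lo', so it is not converted with int()
        cpu.append([item.iid, item.val])
    if item.oid=='ifAdminStatus':
        memory.append([item.iid, int(item.val)])
    if item.oid=='ifOperStatus':
        name.append([item.iid, item.val])


# Second pass: nested loops match entries on iid (this is the slow part)
for c in cpu:
    for m in memory:
        if m[0]==c[0]:
            c.append(m[1])
    for n in name:
        if n[0]==c[0]:
            c.append(n[1])
cpu=sorted(cpu,key=itemgetter(1),reverse=True) #sorting is easy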

Is there a Pythonic, shorter and faster way of doing this? The limiting factor is that I get the data in a two-column object with key=data values.

I'm not sure I follow your sorting, as I don't see any order, but to group you can use a dict keyed by oid, with a defaultdict to handle the repeating keys:

data = """ifDescr lo
ifDescr eth0
ifAdminStatus 1
ifAdminStatus 1
ifOperStatus 1
ifOperStatus 0"""

from collections import defaultdict

d = defaultdict(list)
for line in data.splitlines():
    a, b = line.split()
    d[a].append(b)
print(list(d.items()))
# On Python 3.7+ the dict keeps insertion order, so this prints:
# [('ifDescr', ['lo', 'eth0']), ('ifAdminStatus', ['1', '1']), ('ifOperStatus', ['1', '0'])]
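
To get from this per-oid dict to the per-interface list the question asks for, one option is to transpose the value lists with zip. This is only a sketch: it assumes every oid reports its values in the same interface order and with no gaps, which the sample output suggests but the question does not guarantee. The `columns` list is an illustrative name, not something from the original post.

```python
from collections import defaultdict

data = """ifDescr lo
ifDescr eth0
ifAdminStatus 1
ifAdminStatus 1
ifOperStatus 1
ifOperStatus 0"""

# Group values by oid, as in the answer above.
d = defaultdict(list)
for line in data.splitlines():
    oid, val = line.split()
    d[oid].append(val)

# Column order for each output row, chosen to match the question's example.
columns = ["ifDescr", "ifAdminStatus", "ifOperStatus"]

# zip pairs up the n-th value of every column, giving one row per interface.
rows = [list(row) for row in zip(*(d[c] for c in columns))]
print(rows)  # [['lo', '1', '1'], ['eth0', '1', '0']]
```

Note that zip stops at the shortest column, so a missing value on one interface would silently drop rows rather than raise an error.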

With your code, just use the attributes:

for i in item:
   d[i.oid].append(i.val)
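
If the items also carry an interface index (the question's own attempt uses item.iid), grouping by that index avoids relying on ordering and replaces the nested loops with a single pass. A minimal sketch, assuming each item has oid, iid and val attributes; the `Varbind` namedtuple and the sample iid values are stand-ins invented for illustration and are not part of any SNMP library.

```python
from collections import defaultdict, namedtuple
from operator import itemgetter

# Hypothetical stand-in for the SNMP result objects in the question.
Varbind = namedtuple("Varbind", "oid iid val")

items = [
    Varbind("ifDescr", "1", "lo"),
    Varbind("ifDescr", "2", "eth0"),
    Varbind("ifAdminStatus", "1", "1"),
    Varbind("ifAdminStatus", "2", "1"),
    Varbind("ifOperStatus", "1", "1"),
    Varbind("ifOperStatus", "2", "0"),
]

# Column order for each output row, chosen to match the question's example.
columns = ["ifDescr", "ifAdminStatus", "ifOperStatus"]

# One dict per interface index: {iid: {oid: val}}.
by_iid = defaultdict(dict)
for item in items:
    by_iid[item.iid][item.oid] = item.val

# Build one row per interface; a missing oid becomes None instead of failing.
rows = [[fields.get(c) for c in columns] for fields in by_iid.values()]
print(rows)  # [['lo', '1', '1'], ['eth0', '1', '0']]

# Sort afterwards, here by ifOperStatus (column index 2), highest first.
rows_sorted = sorted(rows, key=itemgetter(2), reverse=True)
```

Each item is touched once, so the work grows linearly with the number of items instead of with the product of the list lengths.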

Pandas is a great way to work with data. Here is a quick example. Check out the official website for more info.

# Python script using Pandas and Numpy
from pandas import DataFrame
from numpy import random

# Data with the dictionary keys defining the columns
data_dictionary = {'a': random.random(5), 
                   'b': random.random(5)}
# Make a data frame 
data_frame = DataFrame(data_dictionary)
print(data_frame)

# Return a new data frame sorted by the first column
# (sort_values replaces the removed sort_index(by=...) form)
data_frame_sorted = data_frame.sort_values(by='a')
print(data_frame_sorted)

This should run if you have numpy and pandas installed. If you have no idea how to install pandas, get the "Anaconda Python distribution."
