Group data by the value in the first column

Question

I'm trying to group data from a 2-column object based on the value in the first column. I need this data in a list so I can sort it afterwards. I am fetching interface data with SNMP on a large number of machines. In the example I have 2 interfaces. I need the data grouped by interface, preferably in a list.

The data I get is in the object item:

for i in item:
   print(i.oid, i.val)

ifDescr lo
ifDescr eth0
ifAdminStatus 1
ifAdminStatus 1
ifOperStatus 1
ifOperStatus 0

~~I would like to get this data sorted in a list by value in the first column, like this:~~

I would like to get this data in a list, so it looks like this:

list=[[lo,1,1], [eth0,1,0]]

~~The solution I have is oh so dirty and long and I'm embarrassed to post it here, so any help is appreciated.~~

Here is my solution so you get a better picture of what I'm talking about. What I did is put each interface's data in a separate list based on item.oid, and then iterated through the cpu list and compared it to memory and name based on item.iid. In the end I have all the data in the cpu list, where each interface is an element of the list. This solution works, but is too slow for my needs.

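from operator import itemgetter  # needed for the sort at the end

# Note: cpu holds ifDescr, memory holds ifAdminStatus, name holds ifOperStatus.
cpu=[]
memory=[]
name=[]

for item in process:
    if item.oid=='ifDescr':
        cpu.append([item.iid, item.val])  # description is a string such as 'lo'
    if item.oid=='ifAdminStatus':
        memory.append([item.iid, int(item.val)])
    if item.oid=='ifOperStatus':
        name.append([item.iid, int(item.val)])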


for c in cpu:
    for m in memory:
        if m[0]==c[0]:
            c.append(m[1])
    for n in name:
        if n[0]==c[0]:
            c.append(n[1])
cpu=sorted(cpu,key=itemgetter(1),reverse=True) #sorting is easy

Is there a Pythonic, shorter and faster way of doing this? The limiting factor is that I get the data in a 2-column object with key=data values.

Answer 1

I'm not sure I follow your sorting, as I don't see any order, but to group you can use a dict keyed by oid, with a defaultdict handling the repeated keys:

data = """ifDescr lo
ifDescr eth0
ifAdminStatus 1
ifAdminStatus 1
ifOperStatus 1
ifOperStatus 0"""

from collections import defaultdict

d = defaultdict(list)
for line in data.splitlines():
    a, b = line.split()
    d[a].append(b)
print(list(d.items()))
[('ifDescr', ['lo', 'eth0']), ('ifAdminStatus', ['1', '1']), ('ifOperStatus', ['1', '0'])]
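To get from this grouping to the one-row-per-interface list the question asks for, one option is to transpose the grouped values with zip. This is only a minimal sketch: it assumes every oid reports the interfaces in the same order (as in the sample output), and the names `columns` and `rows` are made up for illustration.

```python
from collections import defaultdict

data = """ifDescr lo
ifDescr eth0
ifAdminStatus 1
ifAdminStatus 1
ifOperStatus 1
ifOperStatus 0"""

# Group the values by oid, as in the answer.
d = defaultdict(list)
for line in data.splitlines():
    oid, val = line.split()
    d[oid].append(val)

# Columns in the order wanted inside each row (illustrative choice).
columns = ["ifDescr", "ifAdminStatus", "ifOperStatus"]

# zip(*...) pairs up the n-th value of every column: one tuple per interface.
rows = [
    [descr, int(admin), int(oper)]
    for descr, admin, oper in zip(*(d[c] for c in columns))
]
print(rows)  # [['lo', 1, 1], ['eth0', 1, 0]]
```

Because zip pairs values purely by position, this breaks down if one oid is missing for some interface or arrives in a different order.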

Using your code, just use the attributes:

for i in item:
   d[i.oid].append(i.val)
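If the values can arrive in a different order for each oid, grouping on the interface index may be safer than relying on position. The sketch below assumes each item also carries an `iid` attribute identifying the interface (the question's own code uses `item.iid`); the `Item` namedtuple and the sample values are hypothetical stand-ins for the real SNMP result objects.

```python
from collections import defaultdict, namedtuple

# Stand-in for the SNMP result objects; only oid, iid and val are assumed.
Item = namedtuple("Item", "oid iid val")

items = [
    Item("ifDescr", "1", "lo"),
    Item("ifDescr", "2", "eth0"),
    Item("ifAdminStatus", "1", "1"),
    Item("ifAdminStatus", "2", "1"),
    Item("ifOperStatus", "2", "0"),  # deliberately out of order
    Item("ifOperStatus", "1", "1"),
]

# One dict per interface, filled in a single pass: by_iid[iid][oid] = val
by_iid = defaultdict(dict)
for it in items:
    by_iid[it.iid][it.oid] = it.val

# Build one row per interface.
rows = [
    [v["ifDescr"], int(v["ifAdminStatus"]), int(v["ifOperStatus"])]
    for v in by_iid.values()
]

# Sort, for example, by operational status, highest first.
rows.sort(key=lambda r: r[2], reverse=True)
print(rows)  # [['lo', 1, 1], ['eth0', 1, 0]]
```

This touches each item once, which avoids the nested loops in the question.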

Answer 2

Pandas is a great way to work with data. Here is a quick code example. Check out the official website for more info.

# Python script using Pandas and Numpy
from pandas import DataFrame
from numpy import random

# Data with the dictionary keys defining the columns
data_dictionary = {'a': random.random(5), 
                   'b': random.random(5)}
# Make a data frame 
data_frame = DataFrame(data_dictionary)
print(data_frame)

# Return a new data frame sorted by the values in column 'a'
# (sort_values replaces the older sort_index(by=...) call)
data_frame_sorted = data_frame.sort_values(by='a')
print(data_frame_sorted)

This should run if you have numpy and pandas installed. If you don't have any clue about installing pandas, go get the "Anaconda Python distribution."

Answer 1
2 Accepted 2015-06-30 22:47:20

Answer 2
1 2015-07-01 00:16:26
