
Python: group values in a 2D list from CSV

I have the following CSV:

BBCP1,Grey,2140,805EC0FFFFE2,0000000066
BBCP1,Test,2150,805EC0FFFFE2,0000000066
BBCP1,Test,2151,805EC0FFFFE1,0000000066
BBCP1,Centre,2141,805EC0FFFFE3,000000077
BBCP1,Yellow,2142,805EC0FFFFE3,000000077
BBCP1,Purple,2143,805EC0FFFFE3,000000077
BBCP1,Green,2144,805EC0FFFFE3,000000077
BBCP1,Pink,2145,805EC0FFFFE3,000000077

I'm reading this data in using:

data = list(csv.reader(open(csvFile)))

I want to turn this data into a 2D array or equivalent, grouped by the value in the 4th column (the MAC address), preserving the order the rows had in the original list. So it would look like:

[(BBCP1,Grey,2140,805EC0FFFFE2,0000000066),(BBCP1,Test,2150,805EC0FFFFE2,0000000066)],
[(BBCP1,Test,2151,805EC0FFFFE1,0000000066)],
[(BBCP1,Centre,2141,805EC0FFFFE3,000000077),
(BBCP1,Yellow,2142,805EC0FFFFE3,000000077),
(BBCP1,Purple,2143,805EC0FFFFE3,000000077),
(BBCP1,Green,2144,805EC0FFFFE3,000000077),
(BBCP1,Pink,2145,805EC0FFFFE3,000000077)]

Hopefully I've displayed the array correctly and it makes sense.

I then need to loop over the arrays to output the data to a file, which I'm pretty sure I can do with a nested for loop.

Thanks in advance for any help.

Use defaultdict to group the data (itertools.groupby only groups consecutive rows, so it would require sorting first, which is inefficient and destroys the original order), then print the sorted dictionary values (sorting isn't really necessary, it's just to stabilize the output):

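import csv
import collections

d = collections.defaultdict(list)

# txt is any iterable of CSV lines, e.g. an open file object
for row in csv.reader(txt):
    mac_address = row[3]  # the 4th column holds the MAC address
    d[mac_address].append(row)

print(sorted(d.values()))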

resulting in:

[[['BBCP1', 'Centre', '2141', '805EC0FFFFE3', '000000077'],
  ['BBCP1', 'Yellow', '2142', '805EC0FFFFE3', '000000077'],
  ['BBCP1', 'Purple', '2143', '805EC0FFFFE3', '000000077'],
  ['BBCP1', 'Green', '2144', '805EC0FFFFE3', '000000077'],
  ['BBCP1', 'Pink', '2145', '805EC0FFFFE3', '000000077']],
 [['BBCP1', 'Grey', '2140', '805EC0FFFFE2', '0000000066'],
  ['BBCP1', 'Test', '2150', '805EC0FFFFE2', '0000000066']],
 [['BBCP1', 'Test', '2151', '805EC0FFFFE1', '0000000066']]]

Sorting according to the key (the MAC address):

values = [v for _,v in sorted(d.items())]

yields:

[[['BBCP1', 'Test', '2151', '805EC0FFFFE1', '0000000066']],
 [['BBCP1', 'Grey', '2140', '805EC0FFFFE2', '0000000066'],
  ['BBCP1', 'Test', '2150', '805EC0FFFFE2', '0000000066']],
 [['BBCP1', 'Centre', '2141', '805EC0FFFFE3', '000000077'],
  ['BBCP1', 'Yellow', '2142', '805EC0FFFFE3', '000000077'],
  ['BBCP1', 'Purple', '2143', '805EC0FFFFE3', '000000077'],
  ['BBCP1', 'Green', '2144', '805EC0FFFFE3', '000000077'],
  ['BBCP1', 'Pink', '2145', '805EC0FFFFE3', '000000077']]]
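
Since the question asks for the groups in their original order, here is a minimal sketch of the same defaultdict approach without any sorting. It assumes Python 3.7 or later, where dictionaries keep insertion order. The helper name group_by_column and the in-memory SAMPLE string are illustrative only and not part of the original answer.

```python
import csv
import io
from collections import defaultdict

# A few rows from the question, held in memory so the sketch is self-contained.
SAMPLE = """\
BBCP1,Grey,2140,805EC0FFFFE2,0000000066
BBCP1,Test,2150,805EC0FFFFE2,0000000066
BBCP1,Test,2151,805EC0FFFFE1,0000000066
BBCP1,Centre,2141,805EC0FFFFE3,000000077
BBCP1,Yellow,2142,805EC0FFFFE3,000000077
"""


def group_by_column(lines, column=3):
    """Group CSV rows by one column (hypothetical helper).

    Groups come back in order of first appearance, relying on
    dict insertion order (Python 3.7+).
    """
    groups = defaultdict(list)
    for row in csv.reader(lines):
        groups[row[column]].append(row)
    # No sorting: the dict already remembers which key was seen first.
    return list(groups.values())


groups = group_by_column(io.StringIO(SAMPLE))
for group in groups:
    # Print the MAC address and the names (2nd column) in each group.
    print(group[0][3], [row[1] for row in group])
```

With a real file, the same function should work on an open file object, for example inside a with open(csvFile, newline='') block.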

Hi, I used pandas and groupby to solve the problem. Hope this helps!

import pandas as pd

data = pd.read_csv('data.txt', header=None)
data.columns = ['A', 'B', 'C', 'D', 'E']  # arbitrary column names

def check(group):
    # collect each row of the group as a plain list
    data_item = []
    for index, item in group.iterrows():
        data_item.append(item.tolist())
    return data_item

# sort=False keeps the groups in order of first appearance
grouped_data = data.groupby('D', sort=False).apply(check)

for rows in grouped_data:
    print(rows)

Output (the original order is preserved):

[['BBCP1', 'Grey', 2140, '805EC0FFFFE2', 66], ['BBCP1', 'Test', 2150, '805EC0FFFFE2', 66]]
[['BBCP1', 'Test', 2151, '805EC0FFFFE1', 66]]
[['BBCP1', 'Centre', 2141, '805EC0FFFFE3', 77], ['BBCP1', 'Yellow', 2142, '805EC0FFFFE3', 77], ['BBCP1', 'Purple', 2143, '805EC0FFFFE3', 77], ['BBCP1', 'Green', 2144, '805EC0FFFFE3', 77], ['BBCP1', 'Pink', 2145, '805EC0FFFFE3', 77]]
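
For the last step mentioned in the question, writing the grouped rows out with a nested for loop, a rough sketch could look like the following. The write_groups helper, the blank line between groups, and the in-memory buffer are all assumptions made for illustration; the question does not say what the output file should look like.

```python
import csv
import io

# Two already-grouped lists of rows, as strings the way csv.reader yields them
# (a hypothetical sample taken from the question's data).
groups = [
    [["BBCP1", "Grey", "2140", "805EC0FFFFE2", "0000000066"],
     ["BBCP1", "Test", "2150", "805EC0FFFFE2", "0000000066"]],
    [["BBCP1", "Test", "2151", "805EC0FFFFE1", "0000000066"]],
]


def write_groups(groups, out):
    """Write every row of every group to `out` (hypothetical helper)."""
    writer = csv.writer(out, lineterminator="\n")
    for i, group in enumerate(groups):      # outer loop: one MAC address per group
        if i:
            out.write("\n")                 # blank line between groups (arbitrary choice)
        for row in group:                   # inner loop: rows sharing that MAC address
            writer.writerow(row)


buffer = io.StringIO()  # stands in for a real file opened for writing
write_groups(groups, buffer)
print(buffer.getvalue())
```

Keeping the values as strings, as csv.reader does, also keeps the leading zeros in the last column, which the pandas output above loses because that column is parsed as integers.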
