
Python csv count items in one column based on the name of another column

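I'm new to Python programming. I have a large CSV file (~5k items), and I need to count the data using 2 of its columns. The best way to explain what I need is to show you a few rows of the csv: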

Name column         OPTIONALDATA5 column
Collaborative Desk  Broward
Collaborative Desk  Broward
Academic Desk       Broward
Academic Desk       Broward
Academic Desk       Broward
Academic Desk       Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Broward             Broward
Alachua             Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua

In the example above, I just want results like this:

Broward:
Collaborative Desk - 6
Academic Desk - 4
Broward - 1

Alachua:
Collaborative Desk - 5
Alachua - 1

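Perhaps with a total as well, and then move on to the next library in the spreadsheet.

I started writing code, but I'm wondering whether there is a better way to do this.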

Assuming the data is tab-delimited, here is one way to get what you need:

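import csv
from collections import defaultdict, Counter

# Map each OPTIONALDATA5 value (column 1) to the list of Name values (column 0).
data = defaultdict(list)
with open('data', newline='') as input_file:
    csv_reader = csv.reader(input_file, delimiter='\t')
    # If the file starts with a header row, skip it first with next(csv_reader).
    for row in csv_reader:
        data[row[1]].append(row[0])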

data will now contain:

{'Alachua': ['Alachua', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk'], 
 'Broward': ['Collaborative Desk', 'Collaborative Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Broward']}

You can loop over each key's list of values and tally the counts yourself, or use the Counter class from Python's collections module, like this:

for k, v in data.items():
    print(k)
    print(Counter(v))

This prints (on Python 3.7+, where dicts keep insertion order):

Broward
Counter({'Collaborative Desk': 6, 'Academic Desk': 4, 'Broward': 1})
Alachua
Counter({'Collaborative Desk': 5, 'Alachua': 1})
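
The question also asks for a "Name - count" layout with a total per library. A minimal sketch of that formatting step follows, assuming a dict shaped like data above (library name mapped to a list of Name values). The helper name format_counts and the "Total" line label are illustrative choices, not part of the original answer.

```python
from collections import Counter

def format_counts(data):
    """Return report lines in the 'Library:' / 'Name - count' layout."""
    lines = []
    for library, names in data.items():
        counts = Counter(names)
        lines.append(f"{library}:")
        # most_common() yields (name, count) pairs, highest count first
        for name, count in counts.most_common():
            lines.append(f"{name} - {count}")
        # Per-library total (the "Total" label is an assumption)
        lines.append(f"Total - {sum(counts.values())}")
        lines.append("")
    return lines

# Hypothetical sample mirroring the Alachua rows from the question
sample = {"Alachua": ["Alachua"] + ["Collaborative Desk"] * 5}
print("\n".join(format_counts(sample)))
```

Returning a list of lines rather than printing directly makes it easy to write the same report to a file instead of the console.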

This also works (assuming your file is \t-delimited):

import csv
import collections

# results[library][name] holds the number of rows with that pair
results = collections.defaultdict(lambda: collections.defaultdict(int))

with open('sample.csv', newline='') as f_in:
    rdr = csv.reader(f_in, delimiter='\t')
    next(rdr)  # skip the header row
    for row in rdr:
        results[row[1]][row[0]] += 1

for k, v in results.items():
    print(k)
    for k2, v2 in v.items():
        print("    %s - %s" % (k2, v2))

Output (on Python 3.7+, where dicts keep insertion order):

Broward
    Collaborative Desk - 6
    Academic Desk - 4
    Broward - 1
Alachua
    Alachua - 1
    Collaborative Desk - 5
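
Both answers index columns by position (row[0], row[1]), which breaks if the column order in the file changes. Here is a sketch that looks the columns up by header name with csv.DictReader instead. It assumes the header cells are literally "Name" and "OPTIONALDATA5" and that the file is tab-delimited; the function name count_by_column and the in-memory sample are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

def count_by_column(f, group_col="OPTIONALDATA5", item_col="Name", delimiter="\t"):
    """Count item_col values per group_col value, reading columns by header name."""
    counts = defaultdict(Counter)
    # DictReader consumes the first row as the header and yields one dict per row
    for row in csv.DictReader(f, delimiter=delimiter):
        counts[row[group_col]][row[item_col]] += 1
    return counts

# Hypothetical in-memory sample standing in for the real file
sample = io.StringIO(
    "Name\tOPTIONALDATA5\n"
    "Collaborative Desk\tBroward\n"
    "Academic Desk\tBroward\n"
    "Collaborative Desk\tBroward\n"
    "Alachua\tAlachua\n"
)
result = count_by_column(sample)
for library, counter in result.items():
    print(library, dict(counter))
```

For a real file, pass an open file object instead, for example open('sample.csv', newline='').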

