Python CSV: need to sum the values in one column, grouped by the value in another column

Question

I have data in a CSV file that I need to parse. It looks like this:

Date, Name, Subject, SId, Mark
2/2/2013, Andy Cole, History, 216351, 98
2/2/2013, Andy Cole, Maths, 216351, 87
2/2/2013, Andy Cole, Science, 217387, 21
2/2/2013, Bryan Carr, Maths, 216757, 89
2/2/2013, Carl Jon, Botany, 218382, 78
2/2/2013, Bryan Carr, Biology, 216757, 27

I need to use SId as the key and sum all the values in the Mark column for each key. The output would look something like this:

Sid     Mark
216351  185
217387   21
216757  116
218382   78

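I don't need to write the output to a file. I only need to see it when I run the Python file. Here is a similar question. How should its solution be changed to skip the columns in between?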

Answer 1

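This is the histogram concept. Use defaultdict(int) from collections and iterate over all the lines. Use the "SId" value as the dictionary key and add the "Mark" value to the current value stored under that key.

A defaultdict of type int ensures that if a key does not exist yet, its value is initialized to 0.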

from collections import defaultdict

d = defaultdict(int)

with open("data.txt") as f:
    for line in f:
        tokens = [t.strip() for t in line.split(",")]
        try:
            sid = int(tokens[3])
            mark = int(tokens[4])
        except ValueError:
            continue
        d[sid] += mark

print(d)

Output (Python 3):

defaultdict(<class 'int'>, {216351: 185, 217387: 21, 216757: 116, 218382: 78})

You can swap the parsing part for anything else (for example, a csv reader, or extra validation). The key point here is to use defaultdict(int) and update it like this:

d[sid] += mark
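
As a rough sketch of the csv-reader variant mentioned above (not part of the original answer): the helper name sum_marks_by_sid is made up for illustration, and io.StringIO stands in for the real file so the snippet runs on its own. skipinitialspace=True is there because the sample data has a space after every comma.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question; io.StringIO stands in for an open file.
SAMPLE = """Date, Name, Subject, SId, Mark
2/2/2013, Andy Cole, History, 216351, 98
2/2/2013, Andy Cole, Maths, 216351, 87
2/2/2013, Andy Cole, Science, 217387, 21
2/2/2013, Bryan Carr, Maths, 216757, 89
2/2/2013, Carl Jon, Botany, 218382, 78
2/2/2013, Bryan Carr, Biology, 216757, 27
"""


def sum_marks_by_sid(lines):
    """Hypothetical helper: return {SId: total Mark} for an iterable of CSV lines."""
    totals = defaultdict(int)
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # columns 3 and 4 are SId and Mark; the first three are simply not used
        totals[int(row[3])] += int(row[4])
    return totals


totals = sum_marks_by_sid(io.StringIO(SAMPLE))
print("Sid\tMark")
for sid, total in totals.items():
    print("{}\t{}".format(sid, total))
```

With a real file you would pass the open file object instead of the StringIO wrapper.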

Answer 2

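If you want to adapt the solution from the link you provided, you can modify the line that does the unpacking.

Here's an idea (adapted from samplebias's solution in the OP's link):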

import csv
from collections import defaultdict

# a dictionary whose value defaults to a list.
data = defaultdict(list)
# open the csv file and iterate over its rows. the enumerate()
# function gives us an incrementing row number
with open('data.csv', newline='') as f:
    for i, row in enumerate(csv.reader(f)):
        # skip the header line and any empty rows
        # we take advantage of the first row being indexed at 0
        # i=0 which evaluates as false, as does an empty row
        if not i or not row:
            continue
        # unpack the columns into local variables
        _, _, _, SID, mark = row  # <--- HERE, change what you unpack
        # for each SID, add the mark to the list
        # (strip() removes the space that follows each comma in the sample)
        data[SID.strip()].append(float(mark))

# loop over each SID and its list of marks and calculate the sum
for SID, marks in data.items():
    print(SID, sum(marks))
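
If the number of leading columns might change, one possible alternative (Python 3 only, and not part of the original answer) is extended unpacking, which keeps only the last two fields and discards whatever comes before them. This is just a sketch: the rows list below is a stand-in for what csv.reader would yield, using a few records from the question.

```python
from collections import defaultdict

# Stand-in for rows produced by csv.reader (header already skipped).
rows = [
    ["2/2/2013", "Andy Cole", "History", "216351", "98"],
    ["2/2/2013", "Andy Cole", "Maths", "216351", "87"],
    ["2/2/2013", "Bryan Carr", "Maths", "216757", "89"],
]

totals = defaultdict(float)
for row in rows:
    # *_ swallows every column before the last two, however many there are
    *_, sid, mark = row
    totals[sid] += float(mark)

for sid, total in totals.items():
    print(sid, total)
```

This only works when SId and Mark really are the last two columns; otherwise, indexing by position or using csv.DictReader is the safer choice.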

Answer 3

First, to parse the CSV, use the csv module:

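import csv
import itertools
import operator

with open('data.csv', newline='') as f:
    # skipinitialspace drops the blank after each comma, so the key is 'SId'
    data = csv.DictReader(f, skipinitialspace=True)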

Now you want to group the rows by SId. You can do this by sorting first and then using groupby. (If equal values are always contiguous, you don't need to sort.)

    siddata = sorted(data, key=operator.itemgetter('SId'))
    sidgroups = itertools.groupby(siddata, operator.itemgetter('SId'))

Now you want to sum the values in each group:

    for key, group in sidgroups:
        print('{}\t{}'.format(key, sum(int(value['Mark']) for value in group)))
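
Putting the sort, group and sum steps together, a self-contained sketch might look like the following. It is not the answerer's exact code: io.StringIO replaces the file so it can be run as-is, and the names get_sid and sums are illustrative.

```python
import csv
import io
import itertools
import operator

# Sample data copied from the question.
SAMPLE = """Date, Name, Subject, SId, Mark
2/2/2013, Andy Cole, History, 216351, 98
2/2/2013, Andy Cole, Maths, 216351, 87
2/2/2013, Andy Cole, Science, 217387, 21
2/2/2013, Bryan Carr, Maths, 216757, 89
2/2/2013, Carl Jon, Botany, 218382, 78
2/2/2013, Bryan Carr, Biology, 216757, 27
"""

reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
get_sid = operator.itemgetter('SId')

# groupby only merges adjacent rows, so sort by the same key first
rows = sorted(reader, key=get_sid)

# one entry per SId, holding the sum of that group's marks
sums = {
    sid: sum(int(r['Mark']) for r in group)
    for sid, group in itertools.groupby(rows, key=get_sid)
}

for sid, total in sums.items():
    print('{}\t{}'.format(sid, total))
```

Note that the SId keys stay strings here, so the sort is lexicographic. That is fine for grouping, but convert with int() if numeric ordering matters.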

Alternatively, you could write everything into a database and let SQLite figure out how to do it for you:

import csv
import sqlite3

with open('data.csv', newline='') as f, sqlite3.connect(':memory:') as db:
    db.execute('CREATE TABLE data (SId, Mark)')
    reader = csv.DictReader(f, skipinitialspace=True)
    db.executemany('INSERT INTO data VALUES (:SId, :Mark)', reader)
    cursor = db.execute('SELECT SId, SUM(Mark) AS Mark FROM data GROUP BY SId')
    for row in cursor:
        # each row is a (SId, total) tuple, so unpack it into the two fields
        print('{}\t{}'.format(*row))
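
For experimenting with the SQLite route without a file on disk, here is a minimal sketch under a few assumptions of my own: the data comes from an in-memory string, the Mark column is declared INTEGER so that the CSV strings are stored as integers, and an ORDER BY is added to make the row order predictable.

```python
import csv
import io
import sqlite3

# Sample data copied from the question.
SAMPLE = """Date, Name, Subject, SId, Mark
2/2/2013, Andy Cole, History, 216351, 98
2/2/2013, Andy Cole, Maths, 216351, 87
2/2/2013, Andy Cole, Science, 217387, 21
2/2/2013, Bryan Carr, Maths, 216757, 89
2/2/2013, Carl Jon, Botany, 218382, 78
2/2/2013, Bryan Carr, Biology, 216757, 27
"""

db = sqlite3.connect(':memory:')
# INTEGER affinity converts numeric-looking text such as '98' on insert
db.execute('CREATE TABLE data (SId TEXT, Mark INTEGER)')

# each DictReader row is a mapping; only the :SId and :Mark keys are used
reader = csv.DictReader(io.StringIO(SAMPLE), skipinitialspace=True)
db.executemany('INSERT INTO data VALUES (:SId, :Mark)', reader)

result = db.execute(
    'SELECT SId, SUM(Mark) FROM data GROUP BY SId ORDER BY SId'
).fetchall()

for sid, total in result:
    print('{}\t{}'.format(sid, total))

db.close()
```

This is heavier than the defaultdict approach for six rows, but it may pay off once you need several different aggregates over the same data.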

Answer 1: score 2, accepted, 2013-07-18 00:14:58

Answer 2: score 0, 2013-07-18 00:14:42

Answer 3: score -1, 2013-07-18 00:15:55
