Get the sum of a CSV column in Python without pandas

Question

I have a CSV file passed into a function as a string:

csv_input = """
            quiz_date,location,size
            2022-01-01,london_uk,134
            2022-01-02,edingburgh_uk,65
            2022-01-01,madrid_es,124
            2022-01-02,london_uk,125
            2022-01-01,edinburgh_uk,89
            2022-01-02,madric_es,143
            2022-01-02,london_uk,352
            2022-01-01,edinburgh_uk,125
            2022-01-01,madrid_es,431
            2022-01-02,london_uk,151"""

I want to print the sum of how many people were surveyed in each city by date, so something like:

Date.         City.       Pop-Surveyed
2022-01-01.   London.     134
2022-01-01.   Edinburgh.  214
2022-01-01.   Madrid.     555
2022-01-02.   London.     628
2022-01-02.   Edinburgh.  65
2022-01-02.   Madrid.     143

As I can't import pandas on my machine (can't install without internet access), I thought I could use a defaultdict to store the value of each city by date:

from collections import defaultdict

survery_data = csv_input.split()[1:]
survery_data = [survey.split(',') for survey in survery_data]

survey_sum = defaultdict(dict)

for survey in survery_data:
    date = survey[0]
    city = survey[1].split("_")[0]
    quantity = survey[-1]

    survey_sum[date][city] += quantity

print(survey_sum)

But doing this raises a KeyError:

KeyError: 'london'

I was hoping to end up with a defaultdict like:

{'2022-01-01': {'london': 134, 'edinburgh': 214, 'madrid': 555},
 '2022-01-02': {'london': 628, 'edinburgh': 65, 'madrid': 143}}

Is there a way to create a defaultdict with this structure, so that I can then iterate over it and print each column as above?

Answer 1

Try:

csv_input = """\
            quiz_date,location,size
            2022-01-01,london_uk,134
            2022-01-02,edingburgh_uk,65
            2022-01-01,madrid_es,124
            2022-01-02,london_uk,125
            2022-01-01,edinburgh_uk,89
            2022-01-02,madric_es,143
            2022-01-02,london_uk,352
            2022-01-01,edinburgh_uk,125
            2022-01-01,madrid_es,431
            2022-01-02,london_uk,151"""


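# Strip each line and each field; the first tuple is the header, the rest are rows.
header, *rows = (
    tuple(map(str.strip, line.split(",")))
    for line in map(str.strip, csv_input.splitlines())
)

# Sum sizes per (date, city) pair; the country suffix (e.g. "_uk") is dropped.
tmp = {}
for date, city, size in rows:
    key = (date, city.split("_")[0])
    tmp[key] = tmp.get(key, 0) + int(size)

# Regroup the totals by date.
out = {}
for (date, city), size in tmp.items():
    out.setdefault(date, []).append({city: size})

print(out)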

Prints:

{
    "2022-01-01": [{"london": 134}, {"madrid": 555}, {"edinburgh": 214}],
    "2022-01-02": [{"edingburgh": 65}, {"london": 628}, {"madric": 143}],
}
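
If the goal is the table from the question rather than a dict, the same aggregation could be sketched with the standard-library csv module and a Counter. This is only a sketch under a few assumptions: the function name summarize, the shortened sample input, and the column widths are made up for illustration, and city names are taken as they appear in the data, so a misspelled name would be counted as a separate city.

```python
import csv
import io
from collections import Counter


def summarize(csv_text):
    """Sum `size` per (date, city) from CSV text that has a header row."""
    # Strip the indentation from each line so DictReader sees clean field names.
    lines = [line.strip() for line in csv_text.strip().splitlines()]
    totals = Counter()
    for row in csv.DictReader(io.StringIO("\n".join(lines))):
        city = row["location"].split("_")[0]  # drop the country suffix
        totals[(row["quiz_date"], city)] += int(row["size"])
    return totals


# Shortened, hypothetical sample in the same layout as the question's input.
sample = """
    quiz_date,location,size
    2022-01-01,london_uk,134
    2022-01-01,madrid_es,124
    2022-01-02,london_uk,125
    2022-01-01,madrid_es,431
    2022-01-02,london_uk,352
"""

totals = summarize(sample)
print(f"{'Date':<12}{'City':<12}{'Pop-Surveyed'}")
for (date, city), size in sorted(totals.items()):
    print(f"{date:<12}{city.capitalize():<12}{size}")
```

Sorting the (date, city) keys groups the rows by date and then alphabetically by city; a different order would need an explicit sort key.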

Answer 2

Changing

survey_sum = defaultdict(dict)

to

survey_sum = defaultdict(lambda: defaultdict(int))

gives the following, provided quantity is also converted with int(survey[-1]) (the CSV fields are strings):

defaultdict(<function survey_sum.<locals>.<lambda> at 0x100edd8b0>, {'2022-01-01': defaultdict(<class 'int'>, {'london': 134, 'madrid': 555, 'edinburgh': 214}), '2022-01-02': defaultdict(<class 'int'>, {'edingburgh': 65, 'london': 628, 'madric': 143})})

The result can then be iterated over to build the list.
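
Put together, the change might look like the sketch below. The original KeyError comes from defaultdict(dict) creating a plain inner dict, on which += fails for a city that has not been seen yet. Two things here are assumptions beyond the answer: the input is shortened to a few rows, and the size field is converted with int(), because the inner defaultdict(int) starts at 0 and adding a str to it would raise a TypeError.

```python
from collections import defaultdict

# Shortened sample in the same layout as the question's input.
csv_input = """
    quiz_date,location,size
    2022-01-01,london_uk,134
    2022-01-01,edinburgh_uk,89
    2022-01-02,london_uk,125
    2022-01-01,edinburgh_uk,125
    2022-01-02,london_uk,352
"""

# split() breaks on any whitespace, so each CSV line becomes one item;
# [1:] drops the header line.
survey_data = [line.split(",") for line in csv_input.split()[1:]]

# The outer level creates a new inner defaultdict per date; the inner level
# starts every unseen city at 0, so += works on first use.
survey_sum = defaultdict(lambda: defaultdict(int))

for date, location, size in survey_data:
    city = location.split("_")[0]
    survey_sum[date][city] += int(size)  # CSV fields are strings

# Walk both levels to print one line per date and city.
for date, cities in survey_sum.items():
    for city, total in cities.items():
        print(date, city.capitalize(), total)
```

Both levels keep insertion order, so the lines come out in the order each date and city first appears in the input.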
