
Getting the sum of a CSV column without pandas in Python

I have a CSV file that is passed into a function as a string:

csv_input = """
            quiz_date,location,size
            2022-01-01,london_uk,134
            2022-01-02,edingburgh_uk,65
            2022-01-01,madrid_es,124
            2022-01-02,london_uk,125
            2022-01-01,edinburgh_uk,89
            2022-01-02,madric_es,143
            2022-01-02,london_uk,352
            2022-01-01,edinburgh_uk,125
            2022-01-01,madrid_es,431
            2022-01-02,london_uk,151"""

I want to print the total number of people surveyed in each city on each date, something like:

Date        City       Pop-Surveyed
2022-01-01  London     134
2022-01-01  Edinburgh  214
2022-01-01  Madrid     555
2022-01-02  London     628
2022-01-02  Edinburgh  65
2022-01-02  Madrid     143

As I can't import pandas on my machine (I can't install it without internet access), I thought I could use a defaultdict to store the total for each city by date:

from collections import defaultdict

survery_data = csv_input.split()[1:]
survery_data = [survey.split(',') for survey in survery_data]

survey_sum = defaultdict(dict)

for survey in survery_data:
    date = survey[0]
    city = survey[1].split("_")[0]
    quantity = survey[-1]

    survey_sum[date][city] += quantity

print(survey_sum)

But this raises a KeyError:

KeyError: 'london'
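The error comes from the inner level: defaultdict(dict) only fills in a missing *date* with an ordinary, empty dict, and that ordinary dict has no default for a missing *city*. Since `x[k] += v` first reads `x[k]`, the read fails. A minimal reproduction of just that step:

```python
from collections import defaultdict

survey_sum = defaultdict(dict)

# Accessing a missing date creates an empty, ordinary dict for it.
inner = survey_sum["2022-01-01"]
print(type(inner))

try:
    # `inner["london"] += 134` first reads inner["london"], which is missing.
    inner["london"] += 134
except KeyError as exc:
    print("KeyError:", exc)
```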

I was hoping to get a defaultdict like this instead:

{'2022-01-01': {'london': 134, 'edinburgh': 214, 'madrid': 555},
 '2022-01-02': {'london': 628, 'edinburgh': 65, 'madrid': 143}}

Is there a way to create a defaultdict with this nested structure, so that I can then iterate over it and print each column as above?

Try:

csv_input = """\
            quiz_date,location,size
            2022-01-01,london_uk,134
            2022-01-02,edingburgh_uk,65
            2022-01-01,madrid_es,124
            2022-01-02,london_uk,125
            2022-01-01,edinburgh_uk,89
            2022-01-02,madric_es,143
            2022-01-02,london_uk,352
            2022-01-01,edinburgh_uk,125
            2022-01-01,madrid_es,431
            2022-01-02,london_uk,151"""


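# Strip each line and each field, then separate the header from the data rows.
header, *rows = (
    tuple(map(str.strip, line.split(",")))
    for line in map(str.strip, csv_input.splitlines())
)

# Sum the sizes per (date, city) pair; the city is the part before "_".
tmp = {}
for date, city, size in rows:
    key = (date, city.split("_")[0])
    tmp[key] = tmp.get(key, 0) + int(size)

# Regroup by date into a list of {city: total} dicts.
out = {}
for (date, city), size in tmp.items():
    out.setdefault(date, []).append({city: size})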

print(out)

Prints:

{
    "2022-01-01": [{"london": 134}, {"madrid": 555}, {"edinburgh": 214}],
    "2022-01-02": [{"edingburgh": 65}, {"london": 628}, {"madric": 143}],
}
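As a variation, the standard-library csv module can do the parsing; it ships with Python, so nothing needs to be installed. This is a sketch under a few assumptions: the same indented csv_input string, the first non-blank line being the header, and a flat defaultdict(int) keyed by (date, city) for the totals. It then prints the table from the question, sorted by date and city. Note that the misspelled edingburgh_uk and madric_es rows in the data are counted as separate cities.

```python
import csv
from collections import defaultdict

csv_input = """\
            quiz_date,location,size
            2022-01-01,london_uk,134
            2022-01-02,edingburgh_uk,65
            2022-01-01,madrid_es,124
            2022-01-02,london_uk,125
            2022-01-01,edinburgh_uk,89
            2022-01-02,madric_es,143
            2022-01-02,london_uk,352
            2022-01-01,edinburgh_uk,125
            2022-01-01,madrid_es,431
            2022-01-02,london_uk,151"""

# Remove the indentation and skip blank lines so csv sees clean rows.
lines = [line.strip() for line in csv_input.splitlines() if line.strip()]

# DictReader accepts any iterable of strings and uses the first one as the header.
reader = csv.DictReader(lines)

# Flat running totals keyed by (date, city); missing keys start at 0.
totals = defaultdict(int)
for row in reader:
    city = row["location"].split("_")[0]
    totals[(row["quiz_date"], city)] += int(row["size"])

# Print one row per (date, city), sorted by date and then by city name.
print(f"{'Date':<12}{'City':<12}Pop-Surveyed")
for (date, city), size in sorted(totals.items()):
    print(f"{date:<12}{city.capitalize():<12}{size}")
```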

Changing

survey_sum = defaultdict(dict)

to

survey_sum = defaultdict(lambda: defaultdict(int))

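and converting the size with quantity = int(survey[-1]) (adding a str to the int default of 0 would otherwise raise a TypeError) returns: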

defaultdict(<function survey_sum.<locals>.<lambda> at 0x100edd8b0>, {'2022-01-01': defaultdict(<class 'int'>, {'london': 134, 'madrid': 555, 'edinburgh': 214}), '2022-01-02': defaultdict(<class 'int'>, {'edingburgh': 65, 'london': 628, 'madric': 143})})

This can then be iterated over to build the rows of the table.
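Putting this together with the question's loop, a minimal sketch: the nested defaultdict(lambda: defaultdict(int)) supplies a 0 for any new city under any new date, each size is converted with int() before it is added, and two nested loops print the table. The variable name survey_data (instead of survey_data's misspelled original), the column widths and the capitalize() call are illustrative choices, not part of the original answer.

```python
from collections import defaultdict

csv_input = """
            quiz_date,location,size
            2022-01-01,london_uk,134
            2022-01-02,edingburgh_uk,65
            2022-01-01,madrid_es,124
            2022-01-02,london_uk,125
            2022-01-01,edinburgh_uk,89
            2022-01-02,madric_es,143
            2022-01-02,london_uk,352
            2022-01-01,edinburgh_uk,125
            2022-01-01,madrid_es,431
            2022-01-02,london_uk,151"""

# Same parsing as in the question: split on whitespace and drop the header row.
survey_data = [row.split(",") for row in csv_input.split()[1:]]

# Outer level: date -> inner defaultdict; inner level: city -> running int total.
survey_sum = defaultdict(lambda: defaultdict(int))

for date, location, size in survey_data:
    city = location.split("_")[0]
    # Convert to int; adding a str to the int default 0 would raise TypeError.
    survey_sum[date][city] += int(size)

# Print the dates in order; cities appear in first-seen order within each date.
print(f"{'Date':<12}{'City':<12}Pop-Surveyed")
for date in sorted(survey_sum):
    for city, total in survey_sum[date].items():
        print(f"{date:<12}{city.capitalize():<12}{total}")
```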
