Comparing two people's data in one column of a CSV file in Python

Question

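I'm trying to compare two people's data in a CSV file, but I can't use pandas. I want to get the total number of units each of the two people sold, summed over all years, and then compare who sold more based on those totals. I also want to find the year in which each of them sold the least.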

For example, my.csv is set up like this:
John Smith, 343, 2020
John Smith, 522, 2019
John Smith, 248, 2018
Xuanwei Cooper, 412, 2020
Xuanwei Cooper, 367, 2019
Xuanwei Cooper, 97, 2018
Dorothy Lee, 612, 2020
Dorothy Lee, 687, 2019
Dorothy Lee, 591, 2018
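Each line has a space after every comma, which matters when parsing. A minimal sketch, assuming the file has no header row and the columns are name, units, year, of reading it with the standard-library csv module; `skipinitialspace=True` drops the space after each delimiter. To keep it self-contained, the sample is read from an in-memory string with io.StringIO instead of from my.csv.

```python
import csv
import io

# Sample rows from the question; in practice use open("my.csv", newline="")
SAMPLE = """John Smith, 343, 2020
John Smith, 522, 2019
John Smith, 248, 2018
Xuanwei Cooper, 412, 2020
Xuanwei Cooper, 367, 2019
Xuanwei Cooper, 97, 2018
Dorothy Lee, 612, 2020
Dorothy Lee, 687, 2019
Dorothy Lee, 591, 2018
"""

# skipinitialspace=True strips the blank that follows each comma
reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)

# Turn each row into (name, units, year) with the numeric fields as int
rows = [(name, int(units), int(year)) for name, units, year in reader]

print(rows[0])  # ('John Smith', 343, 2020)
```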

I want to compare John's and Dorothy's unit sales and see who sold more. So the output should be:
Dorothy Lee sold more units than John Smith. A total of 1890 to 1113.
Dorothy Lee sold the least in 2018, only 591 units.
John Smith sold the least in 2018, only 248 units.
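The expected numbers follow directly from the sample rows: each total is the sum of three yearly figures, and the lowest year is the one with the smallest value. A small arithmetic check, with each person's per-year figures typed in by hand as a dictionary (an illustrative layout, not something the question defines):

```python
# Units per year, copied from the sample rows above
john = {2020: 343, 2019: 522, 2018: 248}
dorothy = {2020: 612, 2019: 687, 2018: 591}

john_total = sum(john.values())        # 343 + 522 + 248
dorothy_total = sum(dorothy.values())  # 612 + 687 + 591

# min() with key=dict.get returns the key (year) whose value is smallest
john_worst = min(john, key=john.get)
dorothy_worst = min(dorothy, key=dorothy.get)

print(john_total, dorothy_total)  # 1113 1890
print(john_worst, dorothy_worst)  # 2018 2018
```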

My code so far is:

import csv

def compare1(employee1):

    with open("employeedata.csv") as file:
        rows = list(csv.DictReader(file, fieldnames=['c1', 'c2', 'c3']))

    res = {}

    for row in rows:
        if row['c1'] == employee1:
            # int() ignores the leading space left by ", " in the file
            res[employee1] = res.get(employee1, 0) + int(row['c2'])

    print(res)

def compare2(employee2):

    with open("employeedata.csv") as file:
        rows = list(csv.DictReader(file, fieldnames=['c1', 'c2', 'c3']))

    res = {}

    for row in rows:
        if row['c1'] == employee2:
            res[employee2] = res.get(employee2, 0) + int(row['c2'])

    print(res)

employee1 = input("Enter the first name: ")
employee2 = input("Enter the second name: ")


compare1(employee1)
compare2(employee2)

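I don't know how to do the rest; I'm stuck. I'm a beginner and can't use pandas. The output I need should look like this:

Dorothy Lee sold more units than John Smith. A total of 1890 to 1113.
Dorothy Lee sold the least in 2018, only 591 units.
John Smith sold the least in 2018, only 248 units.
Right now the output I get is:
{'John Smith': 1113}
{'Dorothy Lee': 1890}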
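The main gap in the code above is that each function prints a dict instead of returning a number, so the two results can never be compared. A minimal sketch, assuming the same header-less name, units, year layout; `total_units` is a hypothetical name, and the sample is read from io.StringIO so it runs on its own. One function replaces both `compare1` and `compare2` and returns the total:

```python
import csv
import io


def total_units(f, employee):
    """Sum the units column for one employee in a header-less CSV file object."""
    total = 0
    for row in csv.reader(f, skipinitialspace=True):
        if row and row[0] == employee:  # skip blank lines, match the name column
            total += int(row[1])
    return total


SAMPLE = (
    "John Smith, 343, 2020\nJohn Smith, 522, 2019\nJohn Smith, 248, 2018\n"
    "Dorothy Lee, 612, 2020\nDorothy Lee, 687, 2019\nDorothy Lee, 591, 2018\n"
)

employee1, employee2 = "John Smith", "Dorothy Lee"  # stand-ins for input()
# Each call gets a fresh file object, because reading one consumes it
total1 = total_units(io.StringIO(SAMPLE), employee1)
total2 = total_units(io.StringIO(SAMPLE), employee2)

winner, loser = (employee1, employee2) if total1 >= total2 else (employee2, employee1)
print(f"{winner} sold more units than {loser}. "
      f"A total of {max(total1, total2)} to {min(total1, total2)}.")
# Dorothy Lee sold more units than John Smith. A total of 1890 to 1113.
```

In the real script each call would open the file itself, e.g. `with open("employeedata.csv", newline="") as f: total1 = total_units(f, employee1)`. Ties are reported with the first name as the "winner".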

Answer 1

Assuming my.csv has a header row with the columns name, sales, year:

import pandas as pd

# skipinitialspace=True drops the blank after each comma in the file
emp_df = pd.read_csv("my.csv", skipinitialspace=True)

# Total sales per person; reset_index() returns a new DataFrame
# (reset_index(inplace=True) would return None)
emp_gp = emp_df.groupby("name")["sales"].sum().reset_index()


def compare(saler1, saler2):
    names = set(emp_gp["name"])
    if saler1 in names and saler2 in names:
        # .iloc[0] pulls the single number out of the one-row selection
        saler1_tol = int(emp_gp.loc[emp_gp["name"] == saler1, "sales"].iloc[0])
        saler2_tol = int(emp_gp.loc[emp_gp["name"] == saler2, "sales"].iloc[0])
        if saler1_tol > saler2_tol:
            print(f"{saler1} sold more units than {saler2}. A total of {saler1_tol} to {saler2_tol}.")
        else:
            print(f"{saler2} sold more units than {saler1}. A total of {saler2_tol} to {saler1_tol}.")
        # For each person, keep the row (and therefore the year) with the lowest sales
        min_rows = emp_df.loc[emp_df.groupby("name")["sales"].idxmin()].set_index("name")
        for saler in (saler1, saler2):
            year = min_rows.loc[saler, "year"]
            sales = min_rows.loc[saler, "sales"]
            print(f"{saler} sold the least in {year}, only {sales} units.")
    else:
        print("One or both names are not in the table.")

Answer 2

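Instead of writing a function for every result you want, first build a database (a dictionary will do) that aggregates the total units sold for each name and each year. That makes it much easier to answer any kind of comparison without duplicating code. You could start with something like this: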

import csv
from collections import defaultdict

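# db[name][year] -> total units; assumes teste.csv has a header row: name,year,units
db = defaultdict(lambda: defaultdict(int))

with open('teste.csv', newline='') as csvfile:
    # skipinitialspace=True also strips the blank after commas in the header row
    reader = csv.DictReader(csvfile, skipinitialspace=True)
    for row in reader:
        db[row['name']][int(row['year'])] += int(row['units'])

print(db['Dorothy Lee'][2019])  # Units sold by Dorothy Lee in 2019
print(sum(db['Dorothy Lee'].values()))  # Total units sold by Dorothy Lee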
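From `db`, the output asked for in the question takes only a few more lines. A sketch under assumptions: `db` has the shape built above (`db[name][year]` maps to units), here filled by hand from the question's sample rather than read from teste.csv, and `report` is a hypothetical helper name.

```python
from collections import defaultdict

# Same structure as above: db[name][year] -> units sold that year
db = defaultdict(lambda: defaultdict(int))
for name, units, year in [
    ("John Smith", 343, 2020), ("John Smith", 522, 2019), ("John Smith", 248, 2018),
    ("Dorothy Lee", 612, 2020), ("Dorothy Lee", 687, 2019), ("Dorothy Lee", 591, 2018),
]:
    db[name][year] += units


def report(a, b):
    """Return the comparison line plus each person's lowest-selling year."""
    totals = {n: sum(db[n].values()) for n in (a, b)}
    # Order the two names by total units, biggest seller first
    top, other = sorted((a, b), key=totals.get, reverse=True)
    lines = [f"{top} sold more units than {other}. A total of {totals[top]} to {totals[other]}."]
    for n in (top, other):
        year = min(db[n], key=db[n].get)  # year with the smallest unit count
        lines.append(f"{n} sold the least in {year}, only {db[n][year]} units.")
    return lines


print("\n".join(report("John Smith", "Dorothy Lee")))
```

An unknown name is silently added to `db` as an empty entry (a side effect of defaultdict), and `min()` then raises ValueError, so check `name in db` first if the names come from input().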

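Don't be afraid of `defaultdict` (from the `collections` module). Check the documentation; it's really handy in this case. A defaultdict is a dictionary that supplies a default value for every missing key. Here, the default value of the outer defaultdict is another defaultdict, whose own default is 0 (the result of calling int()), because we are summing units sold, which are integers. With this approach you don't need to check whether a key already exists; defaultdict handles it for you.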
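To make that concrete, a small comparison between the `dict.get` pattern used in the question's code and `defaultdict(int)` (the names and numbers are just sample data):

```python
from collections import defaultdict

# A plain dict needs .get() (as in the question's code) to avoid KeyError
plain = {}
plain["Dorothy Lee"] = plain.get("Dorothy Lee", 0) + 612

# defaultdict(int) calls int() -> 0 for any missing key, so += just works
totals = defaultdict(int)
totals["Dorothy Lee"] += 612

print(plain["Dorothy Lee"], totals["Dorothy Lee"])  # 612 612

# Caveat: merely reading a missing key inserts it with the default value
print(totals["Nobody"])    # 0
print("Nobody" in totals)  # True
```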

PS: The lambda in the outer defaultdict is needed to nest the second defaultdict. If you're not familiar with lambda either, check this.
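The reason is that `defaultdict` takes a callable (something it can call with no arguments to build each default), not a default value itself. A short sketch showing what goes wrong without the lambda, plus `functools.partial` as an equivalent alternative that the answer does not mention:

```python
from collections import defaultdict
from functools import partial

# A defaultdict *instance* is not callable, so passing one directly fails
try:
    defaultdict(defaultdict(int))
except TypeError as exc:
    print("TypeError:", exc)

# The lambda is a zero-argument function that builds a fresh inner defaultdict
db = defaultdict(lambda: defaultdict(int))
db["John Smith"][2020] += 343

# Same effect without lambda: partial(defaultdict, int)() == defaultdict(int)
db2 = defaultdict(partial(defaultdict, int))
db2["John Smith"][2020] += 343

print(db["John Smith"][2020], db2["John Smith"][2020])  # 343 343
```

Because the factory is called once per missing key, every name gets its own independent inner dictionary.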


Answer 1
0 2021-01-29 10:00:36

Answer 2
0 2021-01-29 10:00:44
