Comparing two values in one column of a CSV file in Python

Question

I am trying to compare two people's data from a single CSV file, and I cannot use pandas. I want the total units sold by each of the two people, summed over all years, and then to compare who sold more based on those totals. I also want the lowest number of units each of them sold in a single year, together with that year.

For example, my.csv is set up like this:
John Smith, 343, 2020
John Smith, 522, 2019
John Smith, 248, 2018
Sherwin Cooper, 412, 2020
Sherwin Cooper, 367, 2019
Sherwin Cooper, 97, 2018
Dorothy Lee, 612, 2020
Dorothy Lee, 687, 2019
Dorothy Lee, 591, 2018

I want to compare the units sold by John and Dorothy and find out who sold more. So the output should be:
Dorothy Lee sold more units than John Smith. A total of 1890 to 1113.
Dorothy Lee sold less in 2018, for only 591.
John Smith sold less in 2018, for only 248.

My code so far is:

import csv

def compare1(employee1):

    with open("employeedata.csv") as file:
        rows = list(csv.DictReader(file, fieldnames = ['c1', 'c2', 'c3']))

    res = {}

    for row in rows:
        if row['c1'] == employee1:
            res[employee1] = res.get(employee1, 0) + int(row['c2'])

    print(res)
        
def compare2(employee2):

   with open("employee2.csv") as file:
      rows = list(csv.DictReader(file, fieldnames = ['c1', 'c2', 'c3']))

   res = {}

   for row in rows:
      if row['c1'] == employee2:
         res[employee2] = res.get(employee2, 0) + int(row['c2'])
        
   print(res)

employee1 = input("Enter the first name: ")
employee2 = input("Enter the first name: ")


compare1(employee1)
compare2(employee2)

I don't know how to do the rest and I am stuck. I am a beginner and I can't use pandas. The output I need should look like this:

Dorothy Lee sold more units than John Smith. A total of 1890 to 1113.
Dorothy Lee sold less in 2018, for only 591.
John Smith sold less in 2018, for only 248.

Right now I get this output:
{'John Smith': 1113}
{'Dorothy Lee': 1890}

Answer 1

Suppose my.csv has a header row with the columns name, sales, year:

import pandas as pd

# Assumes my.csv starts with a header row: name, sales, year.
# skipinitialspace drops the blank that follows each comma.
emp_df = pd.read_csv("my.csv", skipinitialspace=True)

# Total sales per name, as a Series indexed by name
emp_tot = emp_df.groupby("name")["sales"].sum()


def compare(saler1, saler2):
    if saler1 in emp_tot.index and saler2 in emp_tot.index:
        saler1_tol = emp_tot[saler1]
        saler2_tol = emp_tot[saler2]
        if saler1_tol > saler2_tol:
            print(f"{saler1} sold more units than {saler2}. A total of {saler1_tol} to {saler2_tol}.")
        else:
            print(f"{saler2} sold more units than {saler1}. A total of {saler2_tol} to {saler1_tol}.")
        # For each name, keep the row of the year with the lowest sales
        emp_min = emp_df.loc[emp_df.groupby("name")["sales"].idxmin()].set_index("name")
        for saler in (saler1, saler2):
            print(f"{saler} sold less in {emp_min.loc[saler, 'year']}, for only {emp_min.loc[saler, 'sales']}.")
    else:
        print("names of salers are not in the table")

Answer 2

Instead of creating a function for each result you want, first build a small database (a dict is fine) that aggregates the units sold for each name and each year. It is then easier to answer all kinds of comparisons without repeating code. You can start with something like this:

import csv
from collections import defaultdict

db=defaultdict(lambda: defaultdict(int))

with open('teste.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)     
    for row in reader:
        db[row['name']][int(row['year'])]+=int(row['units'])

print(db['Dorothy Lee'][2019]) #Units sold by Dorothy Lee in 2019
print(sum(db['Dorothy Lee'].values())) #Total Units sold by Dorothy Lee

Don't be afraid of defaultdict (from the collections module). Check the docs; it is really handy in this kind of scenario. A defaultdict is a dictionary that supplies a default value for every missing key. In this case, the default value of the outer defaultdict is another defaultdict, whose own default value is 0 (the result of calling int()), since we want to compute a sum of units sold (therefore an integer). With this approach you don't need to check whether a key already exists; defaultdict takes care of that for you.
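To make the missing-key behaviour concrete, here is a minimal sketch using the same nested structure. The names and numbers are just sample values taken from the question.

```python
from collections import defaultdict

# Outer keys are names, inner keys are years; missing entries start at 0.
db = defaultdict(lambda: defaultdict(int))

# Neither "Dorothy Lee" nor 2019 exists yet, but += works without any check.
db["Dorothy Lee"][2019] += 687

# Reading a missing key returns the default (0) and also inserts the key.
missing = db["John Smith"][2020]

print(db["Dorothy Lee"][2019])
print(missing)
print("John Smith" in db)
```

Note the side effect in the last line: simply looking up a missing key adds it to the dictionary, so use `name in db` when you only want to test whether a name is present.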

PS: the lambda in the first defaultdict is needed to nest a second defaultdict. If you are not familiar with lambda either, it is worth reading up on it.
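Building on the same idea, the following is one possible sketch of the full comparison the question asks for, without pandas. It assumes the file layout shown in the question (no header row, columns in the order name, units, year, and a space after each comma). The function names load_sales and compare are hypothetical, and the sample rows are embedded as a string so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question (no header row, a space after each comma).
SAMPLE = """\
John Smith, 343, 2020
John Smith, 522, 2019
John Smith, 248, 2018
Sherwin Cooper, 412, 2020
Sherwin Cooper, 367, 2019
Sherwin Cooper, 97, 2018
Dorothy Lee, 612, 2020
Dorothy Lee, 687, 2019
Dorothy Lee, 591, 2018
"""


def load_sales(csvfile):
    """Build {name: {year: units}} from rows of name, units, year."""
    db = defaultdict(lambda: defaultdict(int))
    for row in csv.reader(csvfile, skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        name, units, year = row
        db[name][int(year)] += int(units)
    return db


def compare(db, first, second):
    """Return the report lines comparing two sellers by total units."""
    for name in (first, second):
        if name not in db:
            raise KeyError(f"{name} is not in the table")
    totals = {name: sum(db[name].values()) for name in (first, second)}
    winner, loser = sorted(totals, key=totals.get, reverse=True)
    lines = [f"{winner} sold more units than {loser}. "
             f"A total of {totals[winner]} to {totals[loser]}."]
    for name in (winner, loser):
        # Year with the smallest number of units sold for this seller.
        year = min(db[name], key=db[name].get)
        lines.append(f"{name} sold less in {year}, for only {db[name][year]}.")
    return lines


db = load_sales(io.StringIO(SAMPLE))
report = compare(db, "John Smith", "Dorothy Lee")
print("\n".join(report))
```

To read the real file instead of the embedded sample, pass an open file object, for example `with open("my.csv", newline="") as f: db = load_sales(f)`. This sketch does not treat a tie specially: if both totals are equal, the first name given is still reported as having sold more.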
