Compare two people's data in one column of a single CSV file in Python

I am trying to compare two people's data in one CSV file, and I cannot use pandas. What I want is the total units sold by each of the two people, summed over all the years, and then a comparison of who sold more based on those totals. I also want the lowest number of units each of them sold and the year in which it happened.

For example, my.csv is set up like this:
John Smith, 343, 2020
John Smith, 522, 2019
John Smith, 248, 2018
Sherwin Cooper, 412, 2020
Sherwin Cooper, 367, 2019
Sherwin Cooper, 97, 2018
Dorothy Lee, 612, 2020
Dorothy Lee, 687, 2019
Dorothy Lee, 591, 2018

I want to compare the units sold by John and Dorothy and see who sold more. So the output should be:
Dorothy Lee sold more units than John Smith. A total of 1890 to 1113.
Dorothy Lee sold less in 2018, for only 591.
John Smith sold less in 2018, for only 248.

My code so far is:

import csv

def compare1(employee1):

    with open("employeedata.csv") as file:
        rows = list(csv.DictReader(file, fieldnames=['c1', 'c2', 'c3']))

    res = {}

    # Sum the units (second column) of every row that matches the name
    for row in rows:
        if row['c1'] == employee1:
            res[employee1] = res.get(employee1, 0) + int(row['c2'])

    print(res)

def compare2(employee2):

    with open("employee2.csv") as file:
        rows = list(csv.DictReader(file, fieldnames=['c1', 'c2', 'c3']))

    res = {}

    # Same logic as compare1, repeated for the second name
    for row in rows:
        if row['c1'] == employee2:
            res[employee2] = res.get(employee2, 0) + int(row['c2'])

    print(res)

employee1 = input("Enter the first name: ")
employee2 = input("Enter the first name: ")


compare1(employee1)
compare2(employee2)

I don't know how to do the rest and I am stuck. I am a beginner and I can't use pandas. The output I need is the one shown above. Right now I get this output:
{'John Smith': 1113}
{'Dorothy Lee': 1890}

Suppose my.csv has a header row with the columns name, sales, year:

import pandas as pd

# skipinitialspace drops the blank that follows each comma in the file
emp_df = pd.read_csv("my.csv", skipinitialspace=True)

# Total units sold per person, as a Series indexed by name
emp_gp = emp_df.groupby("name")["sales"].sum()


def compare(saler1, saler2):
    if saler1 in emp_gp.index and saler2 in emp_gp.index:
        saler1_tol = emp_gp[saler1]
        saler2_tol = emp_gp[saler2]
        if saler1_tol > saler2_tol:
            print(f"{saler1} sold more units than {saler2}. A total of {saler1_tol} to {saler2_tol}.")
        else:
            print(f"{saler2} sold more units than {saler1}. A total of {saler2_tol} to {saler1_tol}.")
        # For each person, keep the full row (with its year) where sales are lowest
        emp_min = emp_df.loc[emp_df.groupby("name")["sales"].idxmin()].set_index("name")
        for saler in (saler1, saler2):
            print(f"{saler} sold less in {emp_min.loc[saler, 'year']}, for only {emp_min.loc[saler, 'sales']}.")
    else:
        print("names of salers are not in the table")

Instead of creating a function for each result you want, first build a small database (a dict is fine) that aggregates the units sold for each name and each year. It is then easier to answer all kinds of comparisons without repeating code. You can start with something like this:

import csv
from collections import defaultdict

# db[name][year] -> units sold; missing entries start at 0
db = defaultdict(lambda: defaultdict(int))

# DictReader takes the column names from the first row,
# so the file needs a header row: name,units,year
with open('teste.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        db[row['name']][int(row['year'])] += int(row['units'])

print(db['Dorothy Lee'][2019])  # Units sold by Dorothy Lee in 2019
print(sum(db['Dorothy Lee'].values()))  # Total units sold by Dorothy Lee

Don't be afraid of defaultdict (a class in the collections module). Check the docs; it is really handy in this kind of scenario. A defaultdict is a dictionary that supplies a default value for every missing key. In this case, the default value of the outer defaultdict is another defaultdict, whose own default value is 0 (the result of calling int()), since we want to compute a sum of units sold (an integer). With this approach you don't need to check whether a key already exists; defaultdict takes care of that for you.
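To make that behaviour concrete, here is a minimal sketch that fills the nested defaultdict by hand with John Smith's rows from the question instead of reading a file. The name "Nobody" is made up purely to show what happens with a missing key.

```python
from collections import defaultdict

# Outer default: a fresh inner defaultdict; inner default: int() == 0.
db = defaultdict(lambda: defaultdict(int))

# No key checks needed: missing names and years are created on first use.
db["John Smith"][2020] += 343
db["John Smith"][2019] += 522
db["John Smith"][2018] += 248

print(db["John Smith"][2019])          # 522
print(sum(db["John Smith"].values()))  # 1113

# Reading a missing key returns the default (0) and also creates the key.
print(db["Nobody"][1999])              # 0
print("Nobody" in db)                  # True
```

Because a plain lookup creates the key, use `name in db` when you only want to test whether a name is present.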

PS: the lambda in the first defaultdict is needed to nest the second defaultdict, because defaultdict expects a callable that returns the default value. If you are not familiar with lambda either, check this
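Building on that idea, the following is one possible sketch of the full comparison asked for in the question, not a definitive solution. It assumes the file has no header row and that the columns are in the order name, units, year, as in the sample data. The helper names `load_sales` and `compare` are hypothetical, and the sample rows are read from a string so the example runs without a file on disk.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the question; assumed layout: name, units, year (no header).
SAMPLE = """John Smith, 343, 2020
John Smith, 522, 2019
John Smith, 248, 2018
Sherwin Cooper, 412, 2020
Sherwin Cooper, 367, 2019
Sherwin Cooper, 97, 2018
Dorothy Lee, 612, 2020
Dorothy Lee, 687, 2019
Dorothy Lee, 591, 2018
"""

def load_sales(csvfile):
    """Return {name: {year: units}} built from rows of name, units, year."""
    db = defaultdict(lambda: defaultdict(int))
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(csvfile, skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        name, units, year = row
        db[name.strip()][int(year)] += int(units)
    return db

def compare(db, first, second):
    """Return the report lines comparing two sellers."""
    for name in (first, second):
        if name not in db:  # 'in' does not create the key
            return [f"{name} is not in the file."]
    # Put the seller with the larger total first.
    ranked = sorted((first, second), key=lambda n: sum(db[n].values()), reverse=True)
    high, low = ranked
    lines = [
        f"{high} sold more units than {low}. "
        f"A total of {sum(db[high].values())} to {sum(db[low].values())}."
    ]
    for name in ranked:
        # Year whose unit count is the smallest for this seller.
        year = min(db[name], key=db[name].get)
        lines.append(f"{name} sold less in {year}, for only {db[name][year]}.")
    return lines

db = load_sales(io.StringIO(SAMPLE))
report = compare(db, "John Smith", "Dorothy Lee")
print("\n".join(report))
```

To read a real file, pass `open("my.csv", newline="")` to `load_sales` in place of the `io.StringIO` object. This sketch does not treat a tie specially: if both totals are equal, the first name given is still reported as having sold more.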

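Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.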
