Python CSV Comparison with multiplication

I want to double-check my logic on how to put this together in Python, so examples are appreciated.

I need to compare two CSV files (exactly the same format, with 2 rows and 6 columns) and report the difference.

I need to read both files in, multiply row 2, columns 2-6 by specific values (5), total each file separately, and then compare the totals with each other (CSV2 total / CSV1 total), presenting the result as a percentage.

import csv and its reader seem like the way to go, but the tricky part for me has been pulling the data into a list I can multiply against different values (or should I use a collection?), and then comparing the two in the most concise and efficient manner.

Code update (based on the 2nd answer, which was great, thanks! But I am now getting an error when treating my row values as integers):

import csv
file1 = open('csv1.csv', 'rb')
csv1 = csv.DictReader(file1)

file2 = open('csv2.csv', 'rb')
csv2 = csv.DictReader(file2)


myList = csv2.fieldnames
myList.append('Difference')

outFile = open('outFilename.csv', 'wb')
outCsv = csv.DictWriter(outFile, myList)

file1Dict = dict()
file2Dict = dict()

for row in file1:
    file1Dict[row['key value']]['Total1'] = {'Total1':(int(row[1]) * .75 + int(row[2]) * 2.25 + int(row[3]) * 3.5 + int(row[4]) * 5 + int(row[5]) * 25)}

for row in file2:
    file2Dict[row['key value']]['Total2'] = {'Total2':(int(row[1]) * .75, int(row[2]) * 2.25, int(row[3]) * 3.5, int(row[4]) * 5, int(row[5]) * 25)}

outFile.writeheader()

for stuff in file1Dict:
    file1Dict[stuff]['Difference'] = str(int(int(file1Dict[stuff]['Total2']) / int(file1Dict[stuff]['Total1'])) * 100) + '\%'
    outFile.writerow(file1Dict[stuff])

print 'difference'

I think you should use Python Pandas and its built-in read_csv function. It is very efficient and loads the data into a rectangular form in which any kind of math operation is trivial to apply and easy to compare across two different imported data sets.

Note that after importing pandas, there is a top-level read_csv, called simply as pandas.read_csv("/path/to/file.csv"), so there is no need to go through io.parsers as in the linked doc page.

There's nothing wrong with doing this through standard modules; I just think that if you plan to do aggregate or broadcasted math operations, the rectangular array provided by Pandas, which uses the efficient NumPy math operations, is the best choice.
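A minimal sketch of that approach, assuming each file has a header row followed by one data row, and assuming the five multipliers shown in the question's updated code. The sample data, the column names and the helper name weighted_total are made up for illustration; in practice you would pass a file path to pandas.read_csv instead of io.StringIO.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for csv1.csv and csv2.csv.
CSV1 = "name,a,b,c,d,e\nx,1,2,3,4,5\n"
CSV2 = "name,a,b,c,d,e\nx,2,4,6,8,10\n"

# Assumed per-column multipliers, copied from the question's updated code.
WEIGHTS = [0.75, 2.25, 3.5, 5, 25]


def weighted_total(source):
    """Read a CSV and return the weighted sum of columns 2-6 of its first data row."""
    df = pd.read_csv(source)
    values = df.iloc[0, 1:6].astype(float)
    # Element-wise multiply by the weights, then add everything up.
    return float((values.to_numpy() * WEIGHTS).sum())


total1 = weighted_total(io.StringIO(CSV1))
total2 = weighted_total(io.StringIO(CSV2))
percent = total2 / total1 * 100
print("{:.1f}%".format(percent))
```

Because the multiplication is element-wise, changing the weights or adding more rows only means adjusting WEIGHTS or the iloc selection.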

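import csv

# Python 3: open CSV files in text mode with newline=''.
file1 = open('filename1.csv', newline='')
csv1 = csv.DictReader(file1)

file2 = open('filename2.csv', newline='')
csv2 = csv.DictReader(file2)

# Output columns: the input columns plus the three computed ones.
myList = list(csv2.fieldnames)
myList.extend(['Total1', 'Total2', 'Difference'])

# DictReader rows are keyed by column name, so look up the name of column 2.
valueCol = csv2.fieldnames[1]

outFile = open('outFilename.csv', 'w', newline='')
outCsv = csv.DictWriter(outFile, myList)

file1Dict = dict()

# Iterate over the readers (not the raw file objects) to get parsed rows.
# 'key value' is a placeholder for the column that identifies each row.
for row in csv1:
    file1Dict[row['key value']] = dict(row, Total1=int(row[valueCol]) * 5)

for row in csv2:
    file1Dict[row['key value']]['Total2'] = int(row[valueCol]) * 5

outCsv.writeheader()

for stuff in file1Dict:
    # Divide as floats before scaling, so the ratio is not truncated to 0 or 1.
    ratio = float(file1Dict[stuff]['Total2']) / file1Dict[stuff]['Total1']
    file1Dict[stuff]['Difference'] = '%d%%' % round(ratio * 100)
    outCsv.writerow(file1Dict[stuff])

file1.close()
file2.close()
outFile.close()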

This is just a quick sketch of what you described, using only standard-library modules.
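For the per-column multipliers in the question's updated code, a plain csv.reader may be simpler than DictReader: it yields each row as a list, so positional indexes such as row[1] work, whereas DictReader rows are keyed by column name. The following is only a sketch, assuming a header row followed by one data row. The sample data and the helper name weighted_total are hypothetical; real code would pass open file objects instead of io.StringIO.

```python
import csv
import io

# Per-column multipliers copied from the question's updated code.
WEIGHTS = [0.75, 2.25, 3.5, 5, 25]

# Hypothetical sample data standing in for csv1.csv and csv2.csv.
CSV1 = "name,a,b,c,d,e\nx,1,2,3,4,5\n"
CSV2 = "name,a,b,c,d,e\nx,2,4,6,8,10\n"


def weighted_total(csv_file):
    """Return the weighted sum of columns 2-6 of the first data row."""
    reader = csv.reader(csv_file)
    next(reader)        # skip the header row
    row = next(reader)  # row 2 of the file: the data row
    # csv always yields strings, so convert each cell before multiplying.
    return sum(float(cell) * w for cell, w in zip(row[1:6], WEIGHTS))


total1 = weighted_total(io.StringIO(CSV1))
total2 = weighted_total(io.StringIO(CSV2))
percent = total2 / total1 * 100
print("{:.1f}%".format(percent))
```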
