Python - loop through a csv file row values

I have a CSV file (test.csv) with two columns, like the following:

338,800
338,550
339,670
340,600 
327,500
301,430
299,350
284,339
284,338
283,335
283,330
283,310
282,310
282,300
282,300
283,290

From column 1, I want to read the current row and compare it with the value of the previous row. If it is greater than or equal to the previous value, keep comparing; if the value of the current cell is smaller than the previous row's, then I want to take the value of the second column in the same row.

Next, I want to divide the larger value from column 1 (the previous row's value) by that value from column 2. Let me make it clear.

For example, in the table above, the first smaller value in column 1 is 327 (because 327 is smaller than the previous value, 340), so we take 500, the corresponding value in column 2. Finally, we divide 340 by 500 and get 0.68. My Python script should exit right after printing the value to the console.

Currently, I am using the following awk script from bash, and it works fine:

awk -F, '$1<p && $2!=0{ 
val=$2/p    
if(val>=0.8 && val<=0.9)
    {
        print "A"
    }
else if(val==0.7)
    {
        print "B"
    }
else if(val>=0.5 && val <0.7)
    {
        print "C" 

    }
else if(val==0.5)
    {
        print "E"
    }
else
    {
        print "D" 
    }
exit
}
{ 
    p=$1 
}' test.csv

But I want to do it with Python, and I would appreciate any help. Here is my approach:

import csv

f = open("test.csv", "r+")
ff = csv.reader(f)

previous_line = ff.next()
while(True):
    try:
        current_line = ff.next()
        if previous_line <= current_line:
            print "smaller value"
    except StopIteration:
        break

I recommend you use csv.reader's built-in iteration rather than calling .next() directly. Your code also should not test ordinary floats for equality; that applies in any language, not just Python. Also, a calculated value of 0.79 will result in D, which may not be what you intend.

from __future__ import division
import csv

def category(val):
    if 0.8 < val <= 0.9:
        return "A"
    #Note: don't test val == 0.7: you should never test floats for equality
    if abs(val - 0.7) < 1e-10:
        return "B"
    if 0.5 < val < 0.7:
        return "C"
    if abs(val - 0.5) < 1e-10:
        return "E"
    return "D"

with open(r"E:\...\test.csv", "r") as csvfile:
    ff = csv.reader(csvfile)

    previous_value = 0
    for col1, col2 in ff:
        if not col1.isdigit():
            continue
        value = int(col1)
        if value >= previous_value:
            previous_value = value
            continue
        else:
            result = previous_value / int(col2)
            print(category(result))
            break
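
To illustrate the float-equality point, here is a minimal sketch (not part of the original answer) showing why == is unreliable for computed floats and how a tolerance comparison behaves. It assumes Python 3.5+ for math.isclose; the 1e-10 tolerance simply mirrors the one used in category above, and the variable names are illustrative.

```python
import math

# 0.1 + 0.2 is not stored as exactly 0.3 in binary floating point.
total = 0.1 + 0.2

# Direct equality fails even though the values are "the same" on paper.
exact_match = (total == 0.3)

# Comparing within an absolute tolerance succeeds.
close_match = math.isclose(total, 0.3, abs_tol=1e-10)

# The hand-written form used in category() is equivalent for this purpose.
manual_match = abs(total - 0.3) < 1e-10

print(exact_match, close_match, manual_match)
```

Either the abs(...) < tolerance form or math.isclose works; the choice of tolerance depends on how precise the input data is.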

Edit in response to a change to the OP's request:

from __future__ import division
import csv

def category(val):
    if 0.8 < val <= 0.9:
        return "A"
    #Note: don't test val == 0.7: you should never test floats for equality
    if abs(val - 0.7) < 1e-10:
        return "B"
    if 0.5 < val < 0.7:
        return "C"
    if abs(val - 0.5) < 1e-10:
        return "E"
    return "D"

with open(r"E:\...\test.csv", "r") as csvfile:
    ff = csv.reader(csvfile)

    results = []
    previous_value = 0
    for col1, col2 in ff:
        if not col1.isdigit():
            continue
        value = int(col1)
        if value >= previous_value:
            previous_value = value
            continue
        else:
            result = previous_value / int(col2)
            results.append(result)
            print(category(result))
            previous_value = value
    print (results)
    print (sum(results))
    print (category(sum(results) / len(results)))

I've had to guess at the logic you want for resetting the previous value, because your original broke out of the loop at the first result. I also have no idea how you want end-of-file handled. This revision produces the following output:

C
D
A
A
A
D
[0.68, 0.7604651162790698, 0.86, 0.8820058997050148, 0.8477611940298507, 0.9129032258064517]
4.94313543582
A

As you can see, there are definitely more than two values in results.
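
For testing without a file on disk, the loop above can be wrapped in a function that accepts any iterable of CSV lines. This is only a sketch: the name drop_ratios and the inline SAMPLE data are illustrative assumptions, it is written for Python 3, and it follows the same guessed reset logic as the revision above.

```python
import csv
import io

# The question's data, inlined so the example runs without test.csv.
SAMPLE = """338,800
338,550
339,670
340,600
327,500
301,430
299,350
284,339
284,338
283,335
283,330
283,310
282,310
282,300
282,300
283,290
"""

def drop_ratios(lines):
    """Return previous_col1 / current_col2 for every row where column 1 drops."""
    results = []
    previous_value = 0
    for row in csv.reader(lines):
        # Skip blank or non-numeric rows (for example, a header line).
        if len(row) < 2 or not row[0].strip().isdigit():
            continue
        value = int(row[0])
        if value < previous_value:
            results.append(previous_value / int(row[1]))
        # Assumption: the previous value is reset on every row, drop or not.
        previous_value = value
    return results

ratios = drop_ratios(io.StringIO(SAMPLE))
print(len(ratios), sum(ratios))
```

Passing an open file object instead of io.StringIO(SAMPLE) should work the same way, since csv.reader only needs an iterable of lines.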

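col_1 = []
col_2 = []
with open("test.csv", "r") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        first, second = line.split(",")[:2]
        col_1.append(float(first))
        col_2.append(float(second))

# Walk adjacent pairs and stop at the first drop in column 1.
# Bounding the loop avoids an IndexError when column 1 never decreases.
for i in range(len(col_1) - 1):
    if col_1[i + 1] < col_1[i]:
        print(col_1[i] / col_2[i + 1])
        break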

If it is a .csv file, working with pandas could give you more control.

import numpy as np
import pandas as pd

pd.read_csv("filename.csv", header=None) # to read a .csv file into a dataframe; header=None because the file has no header row

However, for this example I am not using the pd.read_csv() function. Instead, I am creating a dataframe from a 2D numpy array, like so:

dataframe = pd.DataFrame(np.array([[338,800],
    [338,550],
    [339,670],
    [340,600], 
    [327,500],
    [301,430],
    [299,350],
    [284,339],
    [284,338],
    [283,335],
    [283,330],
    [283,310],
    [282,310],
    [282,300],
    [282,300],
    [283,290]]))

Now that I have a dataframe object, I can manipulate it just like other object types in Python. I can call pandas-specific functions on the dataframe to get the results I want.

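def compare_and_divide(df):
    for i in range(len(df) - 1):
        # Column 0 holds the values to compare; i and i + 1 are adjacent rows.
        if df.iloc[i + 1, 0] < df.iloc[i, 0]:
            # Carry the larger previous value forward. A single .iloc[row, col]
            # assignment avoids the chained-assignment pitfall of df[0].iloc[...] = ...
            # Note: this modifies df in place.
            df.iloc[i + 1, 0] = df.iloc[i, 0]

    return df[0].div(df[1]) # .div() function to divide values in col 0 by col 1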

compare_and_divide(dataframe)   

0     0.422500
1     0.614545
2     0.505970
3     0.566667
4     0.680000 # 340/500 value mentioned in the question
5     0.790698
6     0.971429
7     1.002950
8     1.005917
9     1.014925
10    1.030303
11    1.096774
12    1.096774
13    1.133333
14    1.133333
15    1.172414
dtype: float64
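
Note that the loop in compare_and_divide effectively replaces column 0 with its running maximum before dividing (in pandas this corresponds to Series.cummax()). As a sanity check on the output above, the same arithmetic can be sketched in plain Python; the variable names here are illustrative and not from the original answer.

```python
from itertools import accumulate

# The two columns from the question's test.csv.
col_0 = [338, 338, 339, 340, 327, 301, 299, 284,
         284, 283, 283, 283, 282, 282, 282, 283]
col_1 = [800, 550, 670, 600, 500, 430, 350, 339,
         338, 335, 330, 310, 310, 300, 300, 290]

# Running maximum of column 0: once 340 is reached, every later row keeps 340.
running_max = list(accumulate(col_0, max))

# Row-wise division, matching df[0].div(df[1]) after the loop has run.
ratios = [m / d for m, d in zip(running_max, col_1)]

# Row 4 is the 340/500 case mentioned in the question.
print(ratios[4])
```

This also makes it visible that every row after index 3 is divided into 340, which is why the later ratios climb above 1.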


 