How to loop through a .csv file and extract certain values in Python?

I'm trying to loop through the 11th column of a CSV file and search for a term, for example "abc". For every "abc" it finds, I want it to return the value in the first column of the same row, unless that cell is empty. If it is empty, I want it to go up the first column row by row until it finds a cell that isn't empty, and return the value of that cell.

I've already imported the CSV file I need. Here's my attempt at the above:

for row in csvReader:
    if row[10] == 'abc':
        colAVal = row
        while colAVal[0] == '' and colAVal != 0:
            colAVal -= 1
        print(colAVal[0])

My question is: does this code do what it's supposed to do?

And for the second part of what I'm trying to do: I want to be able to manipulate the values it returns. Is there a way of storing these values so that I can write code that does something with every colAVal[0] the first part returned?

What you have there won't quite do what you want. Invoking

colAVal -= 1

does not give you the previous row. colAVal is the current row itself, a list of strings, so subtracting 1 from it raises a TypeError as soon as the first column is empty (and colAVal != 0 is always true, because a list never equals 0). In languages with a more traditional, index-based for loop you could step backwards from the current index until you found what you wanted, but in Python this is not the recommended approach. Python's for loop is really a for-each loop: once you've moved from one row to the next, the previous row is inaccessible unless you saved it yourself or access it directly by row number on the input data object. Mixing these two kinds of access is strongly discouraged and gets confusing fast.
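For comparison, this is roughly what a purely index-based version looks like, without mixing it with for-each iteration: read every row into a list first, then step back up the first column by index. It is a sketch, not the answer's recommended approach; it assumes the whole file fits in memory, uses the sample data from this answer held in a string instead of example.csv, and the helper name first_col_for_matches is hypothetical.

```python
import csv
import io

# The sample data used in this answer, held in memory instead of in example.csv.
DATA = """col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12
0,0,0,0,0,0,0,0,0,0,abc,0
1,1,1,1,1,1,1,1,1,1,1,1
2,2,2,2,2,2,2,2,2,2,2,2
3,3,3,3,3,3,3,3,3,3,3,3
4,4,4,4,4,4,4,4,4,4,4,4
,5,5,5,5,5,5,5,5,5,abc,5
,6,6,6,6,6,6,6,6,6,abc,6
7,7,7,7,7,7,7,7,7,7,7,7
"""


def first_col_for_matches(rows, term='abc', search_col=10):
    """Hypothetical helper: rows must be a list. For each row whose
    search_col equals term, return the nearest non-empty first-column
    value at or above that row."""
    found = []
    for i, row in enumerate(rows):
        if len(row) > search_col and row[search_col] == term:
            j = i
            # Step back up the first column while the cell is empty (or the line is blank).
            while j >= 0 and (not rows[j] or rows[j][0] == ''):
                j -= 1
            if j >= 0:  # skip matches with no non-empty value above them
                found.append(rows[j][0])
    return found


# Read every row into a list so earlier rows can be reached by index.
rows = list(csv.reader(io.StringIO(DATA)))[1:]  # [1:] drops the header row
print(first_col_for_matches(rows))  # ['0', '4', '4']
```

It works, but it has to hold every row in memory, which is one reason remembering the last non-empty value as you go is usually preferred.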

You've also asked two questions above, and I'll do my best to answer both.

Given a dataset that looks like the following:

col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12
0,0,0,0,0,0,0,0,0,0,abc,0
1,1,1,1,1,1,1,1,1,1,1,1
2,2,2,2,2,2,2,2,2,2,2,2
3,3,3,3,3,3,3,3,3,3,3,3
4,4,4,4,4,4,4,4,4,4,4,4
,5,5,5,5,5,5,5,5,5,abc,5
,6,6,6,6,6,6,6,6,6,abc,6
7,7,7,7,7,7,7,7,7,7,7,7

If I'm understanding your question correctly, you would expect the answers to be 0, 4, and 4. You could get those values, and save them for later use, with something like the following:

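#! /usr/bin/env python

import csv

results = []
lastValidFirstColumn = None  # no non-empty first-column value seen yet

with open('example.csv', newline='') as file_handler:
    reader = csv.reader(file_handler)
    next(reader)  # skip the header row

    for row in reader:
        if not row:  # skip blank lines
            continue
        # Remember the most recent non-empty value in the first column.
        if row[0] != '':
            lastValidFirstColumn = row[0]
        # For every 'abc' in the 11th column, store the remembered value.
        if row[10] == 'abc' and lastValidFirstColumn is not None:
            results.append(lastValidFirstColumn)

print(results)
# prints ['0', '4', '4']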

If I understood correctly, the data you want is now stored in the results variable. It's not too difficult to write it to a file or manipulate it in other ways; I'd recommend looking those up yourself, as it'll be a better learning experience.
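As a starting point for that second question: results is an ordinary list, so "doing something for every value" is just another loop. A minimal sketch, assuming the string values from the answer above; the int conversion and writing one value per row are illustrative choices, and io.StringIO stands in for a real output file.

```python
import csv
import io

results = ['0', '4', '4']  # the list built by the loop above

# Do something with every stored value, e.g. convert each one to an int.
as_ints = [int(value) for value in results]
print(as_ints)  # [0, 4, 4]

# Write the values out one per row. io.StringIO stands in for a file here;
# use open('matches.csv', 'w', newline='') instead to write to disk.
out = io.StringIO()
writer = csv.writer(out)
for value in results:
    writer.writerow([value])
print(out.getvalue())
```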

You can do this pretty easily in pandas:

import pandas as pd
import numpy as np
df = pd.read_csv('my.csv', header=None)

Using a made-up CSV, we have these values:

    0       1   2   3   4   5   6   7   8   9   10
0   20.0    b   a   b   a   b   a   b   a   b   abc
1   NaN     c   d   c   d   c   d   c   d   c   def
2   10.0    d   e   d   e   d   e   d   e   d   ghi
3   NaN     e   f   e   f   e   f   e   f   e   abc

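# Forward-fill column 0 so each blank takes the nearest value above it,
# then keep that value only on rows where column 10 is 'abc'.
df['has_abc'] = np.where(df[10]=='abc', df.ffill()[0], np.nan)
# Drop every row that did not match.
df.dropna(subset=['has_abc'], inplace=True)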

Output

    0       1   2   3   4   5   6   7   8   9   10  has_abc
0   20.0    b   a   b   a   b   a   b   a   b   abc 20.0
3   NaN     e   f   e   f   e   f   e   f   e   abc 10.0
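If you only need the matched values rather than an extra column, the same forward-fill idea can be written as a boolean mask over the forward-filled first column. A minimal sketch that rebuilds the made-up CSV above in memory with io.StringIO (the inline text is a reconstruction of that table, not a real file):

```python
import io

import pandas as pd

# Rebuild the made-up CSV shown above; empty first cells are read as NaN.
csv_text = io.StringIO(
    "20,b,a,b,a,b,a,b,a,b,abc\n"
    ",c,d,c,d,c,d,c,d,c,def\n"
    "10,d,e,d,e,d,e,d,e,d,ghi\n"
    ",e,f,e,f,e,f,e,f,e,abc\n"
)
df = pd.read_csv(csv_text, header=None)

# Forward-fill only column 0 so each blank takes the nearest value above it,
# then keep the rows whose column 10 equals 'abc'.
values = df[0].ffill()[df[10] == 'abc'].tolist()
print(values)  # [20.0, 10.0]
```

Filling just df[0] avoids forward-filling every column, which df.ffill() does before column 0 is selected. The values come back as floats because the NaN cells force column 0 to a float dtype.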
