In Python, how can I find the location of information in a CSV file?

I have three very long CSV files, and I need some advice on how to write the code that processes them. Basically, I want the program to be general enough that I can add any conditions (filters) and it will still work.

For example, if I set the code to find where column 1 == x and column 2 == y, I want the same code to also work if I want column 1 != r and column 2 ...

import csv

file = input('csv files: ').split(',')
filters = input('Enter the filters: ').split(',')
for csv_file in file:
    with open(csv_file.strip(), newline='') as f:
        p = csv.reader(f)
        header_eliminator = next(p, [])  # skip the header row

I run into issues with the "file" part: if I choose to use only one file rather than the three I want to use now, it won't work. The same goes for the filters. The filters could look like this: 4==10,5>=4

This means that column 4 of the file(s) would equal 10 and column 5 would be greater than or equal to 4. However, I might also want the filters to look like this: 1==4.333, 5=="6/1/2014 0:00:00", 6<=60.0, 7!=6

So I want to be able to use it for other things too! I'm having a lot of trouble with this. Do you have any advice on how to get started? Thanks!

Pandas is excellent for dealing with CSV files. I'd recommend installing it: pip install pandas

Then, if you want to open the three CSV files and run checks on their columns, you'll just need to familiarize yourself with indexing in pandas. The only method you need to know for now is .iloc, since it seems you are indexing by the integer position of the columns.

import pandas as pd

files = input('Enter the csv files: ').split(',')
data = []
# Keeping a list of the files allows us to input a different number of files.
# We use pandas to read each file into a DataFrame, which is then stored in an
# element of the list, so the length of the list is the number of files.
for name in files:
    data.append(pd.read_csv(name.strip()))

# You can then perform checks like this one, which tests whether every value in
# column 2 (0-based position, i.e. the third column) equals 3 in all files.
# .all() reduces each boolean Series to a single True/False for all().
print(all((df.iloc[:, 2] == 3).all() for df in data))
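If you also want to know where the matching rows are, not only whether every row matches, a boolean mask built from .iloc comparisons can be turned into row positions. This is a minimal sketch: the conditions (column 1 >= 10 and column 2 == 3, both 0-based positions), the in-memory CSV text and the helper name matching_positions are made up for illustration.

```python
import io

import pandas as pd


def matching_positions(df):
    """Return the 0-based row positions where column 1 >= 10 and column 2 == 3."""
    # Element-wise conditions are combined with & (and), | (or) and ~ (not);
    # each comparison needs its own parentheses because & binds tighter than >=.
    mask = (df.iloc[:, 1] >= 10) & (df.iloc[:, 2] == 3)
    return [position for position, hit in enumerate(mask) if hit]


# Two small in-memory CSV "files" stand in for the real ones (made-up data).
csv_texts = [
    "h0,h1,h2\n1,10,3\n2,20,3\n",
    "h0,h1,h2\n3,5,3\n4,30,1\n",
]
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]
for file_number, df in enumerate(frames):
    print(file_number, matching_positions(df))
```

Running it prints 0 [0, 1] and then 1 []: both data rows of the first file match, and none of the second. The positions count data rows only, since read_csv consumes the header.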

You can write a generator that takes a bunch of filenames, outputs their lines one by one, and feeds them into csv.reader. The tricky part is the filter. If you let the filter be a single line of Python code, then you can use eval for that part. As an example:

import csv

#filenames = input('csv files: ').split(',')
#filters = input('Enter the filters: ').split(',')

# todo: for debug
# in this implementation, filters is a single python expression that can
# reference the 'col' variable which is a list of the current columns
filenames = 'a.csv,b.csv,c.csv'
filters = '"a" in col[0] and "2" in col[2]'

# todo: debug generate test files
for name in 'abc':
    with open('{}.csv'.format(name), 'w') as fp:
        fp.write('the header row\n')
        for row in range(3):
            fp.write(','.join('{}{}{}'.format(name, row, col) for col in range(3)) + '\n')

def header_squash(filenames):
    """Iterate multiple files line by line after squashing header line
    and any empty lines.
    """
    for filename in filenames:
        with open(filename, newline='') as fp:
            next(fp, None)  # skip the header line (no error if the file is empty)
            for line in fp:
                if line.strip():
                    yield line

for col in csv.reader(header_squash(filenames.split(','))):
    # An empty __builtins__ hides open(), __import__() and friends from the
    # expression, but eval is still not safe for untrusted input.
    if eval(filters, {'__builtins__': {}, 'col': col}):
        # passed the filter, do the work
        print(col)
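If the filters come from the user in the form shown in the question (for example 4==10,5>=4), they can be parsed into comparisons with the standard-library operator module instead of being passed to eval. The sketch below also reports where each match is (file name and line number), which is what the title asks for. It is a minimal sketch under stated assumptions: parse_filters, row_matches and matching_rows are hypothetical names; column numbers are treated as 0-based like Python list indices (the question does not say which); values that parse as numbers are compared numerically and quoted values as strings; and because conditions are split on commas, a value cannot itself contain a comma.

```python
import ast
import csv
import operator
import re

# Comparison tokens mapped to the matching functions in the operator module.
OPS = {
    '==': operator.eq, '!=': operator.ne,
    '<=': operator.le, '>=': operator.ge,
    '<': operator.lt, '>': operator.gt,
}
# Two-character operators come first so '<=' is not read as '<' followed by '='.
CONDITION = re.compile(r'^\s*(\d+)\s*(==|!=|<=|>=|<|>)\s*(.+?)\s*$')


def parse_filters(spec):
    """Turn a string like '4==10,5>=4' into a list of (column, op, value) tuples."""
    conditions = []
    # Assumption: conditions are comma-separated, so a value cannot contain a comma.
    for part in spec.split(','):
        match = CONDITION.match(part)
        if not match:
            raise ValueError('cannot parse filter: {!r}'.format(part))
        column, op, raw = match.groups()
        try:
            value = ast.literal_eval(raw)  # 10 -> int, 4.333 -> float, "x" -> str
        except (ValueError, SyntaxError):
            value = raw                    # fall back to the bare text
        if not isinstance(value, (int, float, str)):
            value = raw                    # e.g. None or a tuple: keep the text
        conditions.append((int(column), OPS[op], value))
    return conditions


def row_matches(row, conditions):
    """Return True if a CSV row (a list of strings) satisfies every condition."""
    for column, op, value in conditions:
        if column >= len(row):
            return False                   # row too short for this column
        cell = row[column]
        if isinstance(value, (int, float)):
            try:
                cell = float(cell)         # compare numbers numerically
            except ValueError:
                return False
        if not op(cell, value):
            return False
    return True


def matching_rows(filenames, conditions):
    """Yield (filename, line_number, row) for every data row that passes the filters.

    Each file's header row is skipped and blank rows are ignored; line_number is
    csv.reader's line_num, so the header counts as line 1.
    """
    for filename in filenames:
        with open(filename, newline='') as fp:
            reader = csv.reader(fp)
            next(reader, None)             # skip the header, if there is one
            for row in reader:
                if row and row_matches(row, conditions):
                    yield filename, reader.line_num, row
```

With the a.csv, b.csv and c.csv files generated by the script above, list(matching_rows(['a.csv', 'b.csv', 'c.csv'], parse_filters('0=="a10"'))) gives [('a.csv', 3, ['a10', 'a11', 'a12'])]. Keeping the filter as data rather than code means a typo raises a clear ValueError instead of executing arbitrary Python.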

Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.