
In Python, how can I find the location of information in a CSV file?

I have three very long CSV files, and I need some advice/help with the code. Basically, I want the program to be general enough that I can add any conditions and it will still work.

For example, if I want to set the code to find where column 1==x and column 2 ==y, I want the code to also work if I want column 1!=r and column 2

import csv
file = input('csv files: ').split(',')
filters = input('Enter the filters: ').split(',')
f = open(csv_file,'r')
p=csv.reader(f)
header_eliminator = next(p,[])

I run into issues with the "file" part: if I choose to use only one file rather than the three I want to use now, it won't work. The same goes for the filters. The filters could look like 4==10,5>=4

This means that column 4 of the file(s) would equal 10 and column 5 of the file(s) would be greater than or equal to 4. However, I might also want the filters to look like this: 1==4.333, 5=="6/1/2014 0:00:00", 6<=60.0, 7!=6

So I want to be able to use it for other things! I'm having so much trouble with this, do you have any advice on how to get started? Thanks!

Pandas is excellent for dealing with CSV files. I'd recommend installing it:

pip install pandas

Then you can read in the three CSV files and run checks on their columns. You'll just need to familiarize yourself with indexing in pandas. The only method you need to know for now is .iloc, since it seems you are indexing by the integer position of the columns.

import pandas as pd

files = input('Enter the csv files: ').split(',')
data = []
# Keeping the frames in a list lets us accept any number of files.
# Each file is read into a pandas DataFrame and appended to the list,
# so the length of the list is the number of files.
for name in files:
    data.append(pd.read_csv(name.strip()))

# You can then run checks like this one, which tests whether every value
# in the column at position 2 equals 3 in all of the files.
print(all((frame.iloc[:, 2] == 3).all() for frame in data))
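To make the conditions interchangeable, as the question asks, one option is to keep each condition as data and build a boolean mask from it. The following is only a sketch under some assumptions: the helper `select_rows`, the `OPS` table, and the (position, symbol, value) tuple layout are invented for illustration, column positions are zero-based to match .iloc, and the two in-memory tables stand in for real CSV files.

```python
import io
import operator

import pandas as pd

# Comparison symbols from the question, mapped to functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def select_rows(frame, conditions):
    """Return the rows of frame that satisfy every condition.

    conditions is a list of (column_position, symbol, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    mask = pd.Series(True, index=frame.index)
    for position, symbol, value in conditions:
        # Each comparison yields a boolean Series; AND them together.
        mask &= OPS[symbol](frame.iloc[:, position], value)
    return frame[mask]


# In-memory stand-ins for two CSV files with the same layout.
first = pd.read_csv(io.StringIO("id,score,label\n1,10,x\n2,10,y\n3,7,x\n"))
second = pd.read_csv(io.StringIO("id,score,label\n4,10,x\n5,3,x\n"))

# "column 1 == 10 and column 2 != y", applied to any number of frames.
conditions = [(1, "==", 10), (2, "!=", "y")]
matches = pd.concat(
    [select_rows(frame, conditions) for frame in (first, second)],
    ignore_index=True,
)
print(matches)
```

Because the conditions are plain data, changing a filter means editing the list rather than the code. Ordering comparisons such as >= between a text column and a number will raise a TypeError, so the values need to match the column types.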

You can write a generator that takes a bunch of filenames and yields their lines one by one, then feed that into csv.reader. The tricky part is the filter. If you let the filter be a single Python expression, you can use eval for that part. As an example:

import csv

#filenames = input('csv files: ').split(',')
#filters = input('Enter the filters: ').split(',')

# todo: for debug
# in this implementation, filters is a single python expression that can
# reference the 'col' variable which is a list of the current columns
filenames = 'a.csv,b.csv,c.csv'
filters = '"a" in col[0] and "2" in col[2]'

# todo: debug generate test files
for name in 'abc':
    with open('{}.csv'.format(name), 'w') as fp:
        fp.write('the header row\n')
        for row in range(3):
            fp.write(','.join('{}{}{}'.format(name, row, col) for col in range(3)) + '\n')

def header_squash(filenames):
    """Iterate multiple files line by line after squashing header line
    and any empty lines.
    """
    for filename in filenames:
        with open(filename) as fp:
            next(fp)
            for line in fp:
                if line.strip():
                    yield line

for col in csv.reader(header_squash(filenames.split(','))):
    # note: passing a namespace to eval is not a sandbox (builtins are still
    # reachable), so only use this with filter expressions you trust
    if eval(filters, { 'col':col }):
        # passed the filter, do the work
        print(col)
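If the filters come from someone you don't fully trust, an alternative to eval is to parse the "column, operator, value" form from the question directly. This is a rough sketch rather than a complete solution: the names `parse_filters`, `coerce`, and `row_passes` are made up for this example, column numbers are assumed to be zero-based, and values containing commas are not supported because the filter string is split on commas.

```python
import csv
import io
import operator
import re

# Comparison symbols from the question, mapped to functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

# A column number, one comparison symbol, then the value to compare against.
FILTER_RE = re.compile(r"^\s*(\d+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$")


def coerce(text):
    """Return text as a float when possible, otherwise as a plain string."""
    text = text.strip().strip('"')
    try:
        return float(text)
    except ValueError:
        return text


def parse_filters(spec):
    """Turn '4==10,5>=4' into a list of (column, function, value) tuples."""
    filters = []
    for part in spec.split(","):
        match = FILTER_RE.match(part)
        if match is None:
            raise ValueError("cannot parse filter: {!r}".format(part))
        column, symbol, raw = match.groups()
        filters.append((int(column), OPS[symbol], coerce(raw)))
    return filters


def row_passes(row, filters):
    """True when the row satisfies every filter."""
    try:
        return all(compare(coerce(row[column]), value)
                   for column, compare, value in filters)
    except (IndexError, TypeError):
        # Row too short, or an ordering comparison between text and a number.
        return False


# In-memory stand-in for one CSV file with a header row.
sample = io.StringIO(
    "id,score,when\n"
    "1,10,6/1/2014 0:00:00\n"
    "2,4.5,6/2/2014 0:00:00\n"
    "3,10,6/2/2014 0:00:00\n"
)
reader = csv.reader(sample)
next(reader, None)  # skip the header
filters = parse_filters('1==10, 2!="6/2/2014 0:00:00"')
kept = [row for row in reader if row_passes(row, filters)]
print(kept)
```

The same `row_passes` check can replace the eval call in the loop above, since each `col` produced by csv.reader is just a list of strings.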


 