How to skip reading empty files with pandas in Python

I read all the files in a folder one by one into a DataFrame and then check them for some conditions. There are a few thousand files, and I would love to make pandas raise an exception when a file is empty, so that my reader function skips that file.

I have something like:

class StructureReader(FileList):
    def __init__(self, dirname, filename):
        self.dirname=dirname
        self.filename=str(self.dirname+"/"+filename)
    def read(self):
        self.data = pd.read_csv(self.filename, header=None, sep = ",")
        if len(self.data)==0:
           raise ValueError
class Run(object):
    def __init__(self, dirname):
        self.dirname=dirname
        self.file__list=FileList(dirname)
        self.result=Result()
    def run(self):
        for k in self.file__list.file_list[:]:
            self.b=StructureReader(self.dirname, k)
            try:
                self.b.read()
                self.b.find_interesting_bonds(self.result)
                self.b.find_same_direction_chain(self.result)
            except ValueError:
                pass

A regular file that I'm checking for these conditions looks like:

"A/C/24","A/G/14","WW_cis",,
"B/C/24","A/G/15","WW_cis",,
"C/C/24","A/F/11","WW_cis",,
"d/C/24","A/G/12","WW_cis",,

But somehow the ValueError never gets raised, and my functions end up searching empty files, which gives me a lot of "Empty DataFrame ..." lines in my results file. How can I make the program skip empty files?

I'd first check whether the file is empty, and only if it isn't would I pass it to pandas. The answer at https://stackoverflow.com/a/15924160/5088142 shows a nice way to check whether a file is empty:

import os
def is_non_zero_file(fpath):  
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0
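
As a rough sketch of how this check might slot into a directory loop like the one in the question: the helper name `non_empty_files` is hypothetical and not part of the original answer, and the code assumes every regular file in the folder is a candidate for reading.

```python
import os

def is_non_zero_file(fpath):
    """Return True if fpath is an existing regular file with at least one byte."""
    return os.path.isfile(fpath) and os.path.getsize(fpath) > 0

def non_empty_files(dirname):
    """Yield full paths of the non-empty regular files in dirname.

    Hypothetical helper for illustration; subdirectories and
    zero-byte files are skipped.
    """
    for name in sorted(os.listdir(dirname)):
        fpath = os.path.join(dirname, name)
        if is_non_zero_file(fpath):
            yield fpath
```

The reading loop could then iterate over `non_empty_files(dirname)` instead of the raw file list. Note that this only tests the size on disk, so a file that contains nothing but blank lines still passes the check.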

You don't need pandas for this; use the Python standard library directly. The answer is in the question "python how to check file empty or not".

You can get this done with the following code: set the path variable to the folder that holds your CSVs and run it. For each non-empty file you get an object raw_data, which is a pandas DataFrame.

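import glob
import os

import pandas as pd

path = "/home/username/data_folder"
files_list = glob.glob(os.path.join(path, "*.csv"))

for csv_file in files_list:
    try:
        raw_data = pd.read_csv(csv_file)
    except pd.errors.EmptyDataError:
        # In current pandas releases EmptyDataError lives in pandas.errors;
        # it is raised when the file has nothing to parse.
        print(csv_file, "is empty and has been skipped.")
        continue
    # Work with raw_data here; it is overwritten on the next iteration.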
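
The try/except pattern above can also be wrapped in a function that keeps every DataFrame it manages to read. This is only a sketch under some assumptions: `read_non_empty` is a hypothetical name, the pandas version is assumed to expose `pandas.errors.EmptyDataError`, and the extra `df.empty` test is an addition meant to catch files that parse without error but contain no rows, which would otherwise show up as "Empty DataFrame" in the results.

```python
import pandas as pd

def read_non_empty(paths, **read_csv_kwargs):
    """Return {path: DataFrame} for every file that yields at least one row.

    Hypothetical helper for illustration; extra keyword arguments are
    passed straight through to pd.read_csv.
    """
    frames = {}
    for path in paths:
        try:
            df = pd.read_csv(path, **read_csv_kwargs)
        except pd.errors.EmptyDataError:
            # Raised when the file has no columns to parse (e.g. zero bytes).
            continue
        if df.empty:
            # Parsed without error but holds no rows (e.g. only a header line).
            continue
        frames[path] = df
    return frames
```

For files shaped like the sample in the question, a call might look like `read_non_empty(files_list, header=None)`.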

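How about this:

import glob, os
import pandas as pd

files = glob.glob('*.csv')
files = list(filter(lambda file: os.stat(file).st_size > 0, files))  # keep non-empty files only
data = [pd.read_csv(file) for file in files]  # read_csv takes one path at a time, not a list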
