
Python: Parsing multiple CSV files and skipping files that contain a keyword

I am trying to read some .csv field data in Python for post-processing. I typically just use something like:

import pandas as pd
from glob import glob

for flist in glob('*.csv'):
    df = pd.read_csv(flist, delimiter=',')  # ',' is already the default delimiter

However, I need to filter out the bad files, which contain "Run_Terminated" somewhere in the file, and skip those files entirely. I'm still new to Python, so I'm not familiar with all of its functionality. Any input would be appreciated. Thank you.

What you could do is first read the whole file into memory (using an io.StringIO file-like object) and look for the Run_Terminated string anywhere in the file (dirty, but it should be OK).
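A small sketch of that check, using made-up sample text as an assumption: `getvalue()` returns the entire buffer no matter where the current position is, so the substring test works right after `write()`. The "dirty" part is that it is a plain substring match, so it also fires on longer tokens that merely contain the marker.

```python
import io

MARKER = "Run_Terminated"

buf = io.StringIO()
buf.write("time,value\n0,1.5\n1,2.5\nRun_Terminated\n")  # hypothetical sample data

# write() left the position at the end, so read() now returns nothing...
remaining = buf.read()

# ...but getvalue() always returns the whole buffer, independent of position.
is_bad = MARKER in buf.getvalue()

# Plain substring matching also fires on longer tokens containing the marker.
also_matches = MARKER in "status\nRun_Terminated_Early\n"
```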

Then pass the buffer to read_csv (it accepts either a file-like object or a filename), so you don't have to read the file from disk again.
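To illustrate that `read_csv` takes a file-like object, here is a minimal sketch on hypothetical sample text. It also shows why the rewind matters: after `write()` the position sits at the end, whereas a `StringIO` built from an initial string already starts at position 0.

```python
import io

import pandas as pd

buf = io.StringIO()
buf.write("time,value\n0,1.5\n1,2.5\n")  # hypothetical sample data

# write() leaves the position at the end; rewind so read_csv sees the data.
buf.seek(0)

# read_csv accepts any object with a read() method, not only a path.
df = pd.read_csv(buf)

# Passing the text to the constructor starts at position 0, so no seek() is needed.
df2 = pd.read_csv(io.StringIO("time,value\n0,1.5\n1,2.5\n"))
```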

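import glob
import io

import pandas as pd

for flist in glob.glob('*.csv'):  # glob is a module; the function is glob.glob
    # Read the whole file into an in-memory text buffer
    with open(flist) as f:
        data = io.StringIO()
        data.write(f.read())
    # Skip any file that mentions the marker anywhere in its text
    if "Run_Terminated" not in data.getvalue():
        data.seek(0)  # rewind or it won't read anything
        df = pd.read_csv(data, delimiter=',')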
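The loop above overwrites `df` on each iteration. If you need to keep every good file, one option is to wrap the same idea in a function that collects the results. This is a sketch under assumptions: the helper name `load_good_csvs`, its `folder` and `marker` parameters, the UTF-8 encoding, and the sample files are all hypothetical.

```python
import glob
import io
import os
import tempfile

import pandas as pd


def load_good_csvs(folder, marker="Run_Terminated"):
    """Hypothetical helper: read every *.csv in `folder`, skipping files whose
    text contains `marker`. Returns (frames, skipped): a dict of file name to
    DataFrame, and a list of skipped file names in sorted order."""
    frames, skipped = {}, []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        name = os.path.basename(path)
        with open(path, encoding="utf-8") as f:  # assumes UTF-8 files
            text = f.read()
        if marker in text:
            skipped.append(name)
            continue
        # Parse the in-memory copy so each file is read from disk only once
        frames[name] = pd.read_csv(io.StringIO(text))
    return frames, skipped


# Demo with hypothetical files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "run1.csv": "time,value\n0,1.0\n1,2.0\n",
        "run2.csv": "time,value\n0,3.0\nRun_Terminated\n",
        "run3.csv": "time,value\n0,5.0\n",
    }
    for name, content in files.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(content)
    frames, skipped = load_good_csvs(tmp)
```

Keeping the list of skipped names makes it easy to report which runs were dropped rather than silently ignoring them.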
