How can I read multiple CSV files from different subdirectories and find the one that contains a given value?

Let's say I have a root directory (folder) z with three subdirectories (folders): a, b, and c.

Each of a, b, and c contains one CSV file. The files hold similar data and have similar names (a_data, b_data, and c_data).

Of the three CSV files, only one contains the integer value 100 in its data frame.

How can I write a loop that scans all the CSV files in the three subfolders and tells me which one contains the value 100?

Thanks a lot!

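import glob
import pandas as pd

val = 100
folder_path = 'z'  # root folder from the question; adjust to your own path
# '**' with recursive=True matches CSV files at any depth below folder_path
subdir_files = glob.glob(folder_path + '/**/*.csv', recursive=True)
for file in subdir_files:
    df = pd.read_csv(file)
    # 'column_name' is a placeholder for the column to search
    if val in df['column_name'].values:
        print(file)
        break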
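The question does not say which column holds the value, so here is a minimal sketch that checks every column instead of one named column. The function name find_csvs_with_value and its parameters are made up for illustration; it assumes every CSV under the root can be read with the default pandas.read_csv settings.

```python
from pathlib import Path

import pandas as pd


def find_csvs_with_value(root, value):
    """Return the paths of all CSV files under root that contain value in any cell."""
    matches = []
    # rglob walks root and all of its subdirectories
    for path in sorted(Path(root).rglob("*.csv")):
        df = pd.read_csv(path)
        # eq() compares every cell with value; the first any() reduces each
        # column to one boolean, the second reduces those to a single boolean
        if df.eq(value).any().any():
            matches.append(path)
    return matches
```

Calling find_csvs_with_value("z", 100) would return a list of matching paths, which is empty when no file contains the value. Unlike the loop above, it does not stop at the first match.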

I can't profile my idea at the moment, but I assume it is going to be faster to open each file with Pandas than to search through the text of the CSV before opening it in Pandas. It will probably also read better.

So, under the assumption that it's faster to open everything with Pandas than to use something like the csv library, let's do:

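import pandas as pd

# "column" is a placeholder for the column to search.
# isin([100]).any() is True if at least one row in that column equals 100.
df = pd.read_csv("~/z/a/a_data.csv")

if not df["column"].isin([100]).any():
  df = pd.read_csv("~/z/b/b_data.csv")

  if not df["column"].isin([100]).any():
    df = pd.read_csv("~/z/c/c_data.csv")

    if not df["column"].isin([100]).any():
      print("No value")

# Unless "No value" was printed, df now holds the data frame that contains 100.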

Ultimately, nested ifs aren't pretty, but it's hard to know what fits best without seeing your code. If you can post your code, that would help. Otherwise, I hope the above helps you get started.
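For comparison, the csv-library alternative mentioned above could look roughly like the sketch below. The helper name csv_contains is made up for illustration, and I have not measured whether it is faster or slower than Pandas. It compares cells as text, so it would miss a cell written as "100.0" or " 100", and it would also match a header cell that is literally "100".

```python
import csv


def csv_contains(path, value):
    """Return True as soon as any cell in the file equals str(value)."""
    target = str(value)
    # newline="" is what the csv module documentation recommends for open()
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            # each row is a list of strings, so this is an exact cell match
            if target in row:
                return True
    return False
```

Note that open() does not expand "~" the way pandas.read_csv does, so a path like "~/z/a/a_data.csv" would need os.path.expanduser first.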

You can loop over your csv_files list like this, reading each file with pandas.read_csv and stopping at the first one that contains the desired value. The else clause of the for loop runs only if the loop ends normally (i.e. not via break), which here means that none of the files contains the desired value.

import pandas as pd
csv_files = ["a/a.csv", "b/b.csv", "c/c.csv"]
found_df = None
for csv_file in csv_files:
    df = pd.read_csv(csv_file)
    if 100 in df["column"].values:
        found_df = df
        break
else:
    print("No value found")


 