
Iterate through folders and find a file to put into a dataframe

I have a directory ../customer_data/* with 15 folders. Each folder is a unique customer.

Example: ../customer_data/customer_1

Within each customer folder there is a CSV file called surveys.csv.

GOAL: I want to iterate through all the folders in ../customer_data/*, find the surveys.csv for each customer, and concatenate them into a single dataframe. I also want to add a column to the dataframe holding the customer id, which is the name of the folder.

import glob
import os
rootdir = '../customer_data/*'
dataframes = []
for subdir, dirs, files in os.walk(rootdir):
    
    for file in files:
        csvfiles = glob.glob(os.path.join(rootdir, 'surveys.csv'))
        
        # loop through the files and read them in with pandas
         # a list to hold all the individual pandas DataFrames
      
        df = pd.read_csv(csvfiles)
        df['customer_id'] = os.path.dirname
        dataframes.append(df)
            
# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)
result.head()

This code is not giving me all 15 files. Please help.

You can use the pathlib module for this.

from pathlib import Path
import pandas as pd

dfs = []
for filepath in Path("../customer_data").glob("customer_*/surveys.csv"):
    this_df = pd.read_csv(filepath)
    # Set the customer ID as the name of the parent directory.
    this_df["customer_id"] = filepath.parent.name
    dfs.append(this_df)

# ignore_index=True gives the combined frame a fresh 0..n-1 index.
df = pd.concat(dfs, ignore_index=True)
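
Since the question mentions that not all 15 files are picked up, it may help to first check which customer folders actually contain a surveys.csv before reading anything. The following is only a minimal sketch, assuming the folder layout described in the question; find_surveys is a hypothetical helper name, not part of pathlib or pandas.

```python
from pathlib import Path


def find_surveys(root, filename="surveys.csv"):
    """Inspect the immediate subfolders of root (hypothetical helper).

    Returns (found, missing):
      found   maps folder name (the customer id) -> path to the CSV
      missing lists folder names that do not contain the file
    """
    found, missing = {}, []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # skip stray files sitting next to the customer folders
        candidate = folder / filename
        if candidate.is_file():
            found[folder.name] = candidate
        else:
            missing.append(folder.name)
    return found, missing
```

Calling find_surveys("../customer_data") and printing the missing list should show whether a folder lacks the file or names it differently (for example survey.csv instead of surveys.csv).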

Let's try pathlib with rglob, which recursively searches your directory structure for all files that match a glob pattern, in this instance surveys.csv.

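import pandas as pd
from pathlib import Path

root_dir = Path('/top_level_dir/')

# Map each customer folder name to the surveys.csv found inside it.
# rglob must be called on a Path instance, not on the Path class.
files = {file.parent.name: file for file in root_dir.rglob('surveys.csv')}

df = pd.concat([pd.read_csv(file).assign(customer=name) for name, file in files.items()])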

Note you'll need Python 3.4+ for pathlib.
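
For comparison, the loop from the question can also stay on os.walk. The sketch below assumes the root is passed without the trailing /* (os.walk takes a directory path and does not expand wildcards) and reads each file on its own, because pd.read_csv expects a single path rather than a list. The function name load_surveys is made up for illustration.

```python
import os

import pandas as pd


def load_surveys(rootdir, filename="surveys.csv"):
    """Concatenate every <rootdir>/<customer>/<filename> into one DataFrame."""
    dataframes = []
    for subdir, dirs, files in os.walk(rootdir):
        if filename not in files:
            continue  # this folder has no survey file
        df = pd.read_csv(os.path.join(subdir, filename))
        # The folder that holds the file is used as the customer id.
        df["customer_id"] = os.path.basename(subdir)
        dataframes.append(df)
    if not dataframes:
        # pd.concat raises ValueError on an empty list, so fail with a clearer message.
        raise FileNotFoundError(f"no {filename} found under {rootdir}")
    return pd.concat(dataframes, ignore_index=True)
```

With the layout from the question this would be called as load_surveys("../customer_data"); counting the unique values in the customer_id column of the result is a quick way to confirm that all 15 customers were read.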
