I have a problem with a loop in Python. My folder looks like this:
|folder_initial
|--data_loop
|--example1
|--example2
|--example3
|--python_jupyter_notebook
I would like to loop through all the files in data_loop, open each one, run a simple operation, save it under another name, and then do the same with the next file. I have written the following code:
import pandas as pd
import numpy as np
import os

def scan_folder(parent):
    # iterate over all the files in directory 'parent'
    for file_name in os.listdir(parent):
        if file_name.endswith(".csv"):
            print(file_name)
            df = pd.read_csv("RMB_IT.csv", low_memory=False, header=None, names=['column1','column2','column3','column4'])
            df = df[['column2','column4']]
            # Substitute ND with missing data
            df = df.replace('ND,1',np.nan)
            df = df.replace('ND,2',np.nan)
            df = df.replace('ND,3',np.nan)
            df = df.replace('ND,4',np.nan)
            df = df.replace('ND,5',np.nan)
            df = df.replace('ND,6',np.nan)
        else:
            current_path = "".join((parent, "/", file_name))
            if os.path.isdir(current_path):
                # if we're checking a sub-directory, recall this method
                scan_folder(current_path)

scan_folder("./data_loop") # Insert parent directory's path
I get the error:
FileNotFoundError
FileNotFoundError: File b'example2.csv' does not exist
Moreover, I would like to run the code without having to keep the Jupyter notebook in folder_initial. I would like a layout like this instead:
|scripts
|--Jupyter Notebook
|data
|---csv files
|--example1.csv
|--example2.csv
Any idea?
-- Edit: following a user's suggestion, I wrote something like this:
import os
import glob
import numpy as np
import pandas as pd

os.chdir('C:/Users/bedinan/Documents/python_scripts_v02/data_loop')

for file in glob.glob('*.csv'):
    # column names as in the first snippet
    df = pd.read_csv(file, low_memory=False, header=None, names=['column1','column2','column3','column4'])
    df = df[['column2','column4']]
    # Substitute ND with missing data
    df = df.replace('ND,1',np.nan)
    df = df.replace('ND,2',np.nan)
    df = df.replace('ND,3',np.nan)
    df = df.replace('ND,4',np.nan)
    df = df.replace('ND,5',np.nan)
    df = df.replace('ND,6',np.nan)
    df.to_pickle(file+"_v02"+".pkl")

f = pd.read_pickle('folder\\data_loop\\RMB_PT.csv_v02.pkl')
But the resulting file name is not composed properly, because it still contains the .csv extension.
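The stray .csv appears because file + "_v02" + ".pkl" appends to the full file name, extension included. A minimal sketch of one way around it, assuming pathlib is acceptable here; pickle_name is a hypothetical helper, not something from the original code:

```python
from pathlib import Path

def pickle_name(csv_file, suffix="_v02"):
    """Build the output name for csv_file without the '.csv' part (hypothetical helper)."""
    p = Path(csv_file)
    # p.stem is the file name without its final extension: 'RMB_PT.csv' -> 'RMB_PT'
    return p.with_name(p.stem + suffix + ".pkl")

print(pickle_name("RMB_PT.csv"))  # RMB_PT_v02.pkl
```

Inside the loop this would be used as df.to_pickle(pickle_name(file)).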
You can use os.walk to iterate over all subfolders:
import os
import shutil
import pathlib
import pandas as pd

def scan_folder(root):
    for path, subdirs, files in os.walk(root):
        for name in files:
            if name.endswith('.csv'):
                src = pathlib.PurePath(path, name)
                dst = pathlib.PurePath(path, 'new_' + name)
                # work on a copy so the original file is left untouched
                shutil.copyfile(src, dst)
                df = pd.read_csv(dst)
                # do something with DF
                # write the result back to the copy
                df.to_csv(dst, index=False)

scan_folder(r'C:\User\Desktop\so\55648849')
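The FileNotFoundError in the question most likely arises because os.listdir returns bare file names such as example2.csv, which Python then looks up relative to the current working directory rather than inside data_loop. The snippet above avoids this by combining path and name. A minimal sketch of the same idea for the flat os.listdir case; list_csv_paths is a hypothetical helper and the temporary directory exists only for the demonstration:

```python
import os
import tempfile

def list_csv_paths(parent):
    """Return the full paths of the .csv files directly inside parent (hypothetical helper)."""
    paths = []
    for file_name in os.listdir(parent):
        if file_name.endswith(".csv"):
            # os.listdir yields only 'example2.csv'; joining it with parent
            # gives a path that can be opened from any working directory
            paths.append(os.path.join(parent, file_name))
    return sorted(paths)

# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example2.csv"), "w").close()
    for csv_path in list_csv_paths(tmp):
        print(csv_path, os.path.exists(csv_path))
```

Passing each returned path to pd.read_csv instead of the bare file name should make the lookup independent of where the notebook is started.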
Here's a solution which only uses pathlib, which I'm quite a big fan of. I pulled your DataFrame operations out into their own function, which you can rename and rewrite to do what you actually want.
import pandas as pd
import numpy as np
from pathlib import Path

# rename the function to something more relevant
def df_operation(csv_path):
    df = pd.read_csv(
        csv_path.absolute(),
        low_memory=False,
        header=None,
        names=['column1','column2','column3','column4']
    )
    # do some stuff with the dataframe
    return df

def scan_folder(parent):
    p = Path(parent)
    # Probably want a check here to make sure the provided
    # parent is a directory, not a file
    assert p.is_dir()
    # rglob also searches sub-directories
    for f in p.rglob('*.csv'):
        df_operation(f)

scan_folder("./example/dir")
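Because scan_folder takes the directory as an argument, the notebook does not have to sit next to the CSV files. A minimal sketch of the scripts/data layout from the question, assuming the notebook is started from the scripts folder and the data folder is its sibling; find_csv_files and DATA_DIR are hypothetical names:

```python
from pathlib import Path

def find_csv_files(data_dir):
    """Return all .csv files under data_dir, sorted (hypothetical helper)."""
    root = Path(data_dir).expanduser().resolve()
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # rglob also descends into sub-directories such as 'csv files'
    return sorted(root.rglob("*.csv"))

# Assumed layout: the notebook runs from 'scripts', the CSV files live in a sibling 'data' folder
DATA_DIR = Path.cwd().parent / "data"

if DATA_DIR.is_dir():
    for csv_path in find_csv_files(DATA_DIR):
        print(csv_path)
```

An absolute path works just as well for DATA_DIR; the only requirement is that the path handed to pd.read_csv is complete, not just a file name.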