
How to loop through a directory, edit every dataframe, and save each to a new file

I have a directory with >100 files and I want to edit all of them using Python. I know how to do it for a single file, but I can't figure out how to loop over the whole directory. All the files are tab-separated. After editing, I would like to save each output to a new file, just as I do for one file.

For one file:

df1 = pd.read_csv('file1.txt',sep='\t')
df1['aminoacidscut'] = df1['aminoacids'].str[1:-1]
df1['aminoacidscut'] = df1["aminoacidscut"].str.replace("_","", regex=False)
df1['aminoacidscut'] = df1["aminoacidscut"].str.replace("*","", regex=False)

df2 = df1["aminoacidscut"]
df2.to_csv("file1-cut.txt", header=None, index=None, sep=' ', mode='a')
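The cutting logic above can be sanity-checked on a tiny in-memory DataFrame before running it over real files. The sample strings below are hypothetical, invented just to exercise each step:

```python
import pandas as pd

# Hypothetical sample data to check the cutting logic
df1 = pd.DataFrame({"aminoacids": ["MAGT_K*L", "QW*ERTY_"]})

# Drop the first and last character, then strip "_" and "*"
df1["aminoacidscut"] = df1["aminoacids"].str[1:-1]
df1["aminoacidscut"] = df1["aminoacidscut"].str.replace("_", "", regex=False)
df1["aminoacidscut"] = df1["aminoacidscut"].str.replace("*", "", regex=False)

print(df1["aminoacidscut"].tolist())  # → ['AGTK', 'WERTY']
```

Note that `regex=False` matters for the `"*"` replacement: as a regular expression, `*` would be a syntax error.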

My attempt at a loop, which does not work:

import os, re

directory = os.listdir('/output')
os.chdir('/output')

for file in directory:
    open_file = open(file,'r')
    read_file = open_file.read() 
    file['aminoacidscut'] = file['aminoacids'].str[1:-1]
    file['aminoacidscut'] = file["aminoacidscut"].str.replace("_","", regex=False)
    file['aminoacidscut'] = file["aminoacidscut"].str.replace("*","", regex=False)
    write_file = open(file,'w')
    write_file.write(read_file)
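The loop above fails because `file` is just a filename string, so indexing it like `file['aminoacids']` raises a `TypeError`; the file has to be read into a DataFrame first. A minimal corrected sketch, assuming tab-separated `.txt` files in `/output` and the `-cut` output naming from the single-file example (the function name `process_directory` is invented for illustration):

```python
import os
import pandas as pd

def process_directory(directory):
    """Apply the amino-acid cut to every tab-separated .txt file in
    `directory`, writing each result to "<name>-cut.txt"."""
    for name in os.listdir(directory):
        # Only process .txt files, and skip output files we created
        if not name.endswith('.txt') or name.endswith('-cut.txt'):
            continue
        df = pd.read_csv(os.path.join(directory, name), sep='\t')

        # Same per-file edits as the single-file version
        df['aminoacidscut'] = df['aminoacids'].str[1:-1]
        df['aminoacidscut'] = df['aminoacidscut'].str.replace('_', '', regex=False)
        df['aminoacidscut'] = df['aminoacidscut'].str.replace('*', '', regex=False)

        # Write "<name>-cut.txt" next to the original, one value per line
        out_path = os.path.join(directory, name[:-4] + '-cut.txt')
        df['aminoacidscut'].to_csv(out_path, header=None, index=None, sep=' ')
```

Calling `process_directory('/output')` would then produce `file1-cut.txt`, `file2-cut.txt`, and so on, alongside the originals.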

This should work:

import os
import glob
import pandas as pd

fmask = '/path/to/excel_files_dir/*.txt'
target_dir = '/path/to/'
target_fname = '/path/to/result.txt'

dfs = []
for f in glob.glob(fmask):
    df = pd.read_csv(f, sep='\t')

    # apply the same edits as in the single-file version
    df['aminoacidscut'] = df['aminoacids'].str[1:-1]
    df['aminoacidscut'] = df['aminoacidscut'].str.replace('_', '', regex=False)
    df['aminoacidscut'] = df['aminoacidscut'].str.replace('*', '', regex=False)

    # save the edited frame under the same name in target_dir
    df.to_csv(os.path.join(target_dir, os.path.basename(f)), index=False)
    dfs.append(df)

# save all edited frames concatenated into one file
pd.concat(dfs, ignore_index=True).to_csv(target_fname, index=False)
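The final `pd.concat` call stacks the per-file frames vertically. A tiny demonstration of what `ignore_index=True` does, using made-up two-frame input:

```python
import pandas as pd

a = pd.DataFrame({'aminoacids': ['MAB_C*D']})
b = pd.DataFrame({'aminoacids': ['XQR*S_T']})

# ignore_index=True renumbers the result 0..n-1 instead of repeating
# each frame's own index (which would give 0, 0 here)
combined = pd.concat([a, b], ignore_index=True)
print(combined.index.tolist())  # → [0, 1]
```

Without `ignore_index=True`, every per-file frame starts at index 0, so the concatenated result would contain duplicate index labels.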
