Running python code on multiple files in folder and writing them to separate files

I am working on code to run a script on multiple files in a folder. I am able to run the code on each file; however, it only saves to one output file and then overwrites that file on every iteration. How can I get this code to save the output to separate files, preferably with names similar to each original file? This is what I have so far:

import os, re
import pandas as pd
directory = os.listdir('C:/Users/user/Desktop/NOV')
os.chdir('C:/Users/user/Desktop/NOV')

for file in directory:
    df = pd.read_csv(file, index_col="DateTime", parse_dates=True)
    df = df.resample('1min').mean()
    df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
    df.to_csv("newfile.csv", na_rep='NaN')

Just change the file name in the last line on each iteration of the loop. Something like for i, file in enumerate(directory): and then df.to_csv("new_" + file, na_rep='NaN') will do. Note that file already ends in .csv, so appending another ".csv" (as in "new_" + file + ".csv") would produce names like new_oldfile.csv.csv; the index i from enumerate is only needed if you want numbered output names.
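A minimal sketch of both naming options using only the standard library. The helper names prefixed_name and numbered_name are hypothetical, and the short list stands in for the result of os.listdir in the question:

```python
def prefixed_name(filename, prefix="new_"):
    # 'filename' already ends in '.csv', so only the prefix is added
    return prefix + filename


def numbered_name(index, stem="output"):
    # Index-based alternative: output_0.csv, output_1.csv, ...
    return f"{stem}_{index}.csv"


# Stand-in for os.listdir('C:/Users/user/Desktop/NOV')
directory = ["jan.csv", "feb.csv"]

for i, file in enumerate(directory):
    # Each iteration gets its own target name, so nothing is overwritten;
    # in the real loop this name would be passed to df.to_csv(..., na_rep='NaN')
    print(prefixed_name(file), numbered_name(i))
```

This prints "new_jan.csv output_0.csv" and then "new_feb.csv output_1.csv". The prefixed form keeps the link to the original file; the numbered form is only useful when the original names don't matter.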

It will always write to the same file because you always pass the same file name to to_csv. Build a new file name from the old one instead: os.path.basename strips the directory part, and os.path.splitext removes the extension:

stem = os.path.splitext(os.path.basename(file))[0]
df.to_csv(stem + "-processed.csv", na_rep='NaN')
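A small sketch of why basename alone is not enough, assuming forward-slash paths like the one in the question. processed_name is a hypothetical helper, and the suffix is just a parameter default:

```python
import os


def processed_name(path, suffix="-processed"):
    """Build an output name such as 'jan-processed.csv' from an input path."""
    # basename drops the directory: 'C:/Users/user/Desktop/NOV/jan.csv' -> 'jan.csv'
    # splitext then splits off the extension: 'jan.csv' -> ('jan', '.csv')
    stem, ext = os.path.splitext(os.path.basename(path))
    return stem + suffix + ext


# basename alone keeps the extension, so the suffix lands after '.csv'
with_basename_only = os.path.basename("C:/Users/user/Desktop/NOV/jan.csv") + "-processed.csv"
print(with_basename_only)                                   # jan.csv-processed.csv
print(processed_name("C:/Users/user/Desktop/NOV/jan.csv"))  # jan-processed.csv
```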

My approach:

  • use glob.glob instead of os.listdir to filter out files that aren't CSV files
  • don't perform an os.chdir; it is bad practice because other modules may not be aware that you changed the current directory, and calling os.chdir twice with a relative path will fail. Passing a full path pattern to glob.glob avoids the need for it.
  • create a file with the same name but with a "new_" prefix in the same directory (running the script twice will create "new_new_" files, though, because the outputs of the first run also match *.csv)
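One way to deal with the "new_new_" caveat in the last bullet, sketched with the standard library only: skip files that already carry the prefix. The helper names are hypothetical, and this assumes no genuine input file name starts with "new_":

```python
import glob
import os


def is_input_file(path, prefix="new_"):
    # Outputs from an earlier run start with the prefix; skipping them
    # prevents 'new_new_...' files on the next run
    return not os.path.basename(path).startswith(prefix)


def input_csvs(input_dir, prefix="new_"):
    # glob returns full paths, so no os.chdir is needed
    pattern = os.path.join(input_dir, "*.csv")
    return sorted(p for p in glob.glob(pattern) if is_input_file(p, prefix))
```

In the loop, for file in input_csvs(input_dir): would then replace the direct glob.glob call.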

Code:

import os, glob
import pandas as pd

input_dir = 'C:/Users/user/Desktop/NOV'

# glob returns full paths of the .csv files only, so no os.chdir is needed
for file in glob.glob(os.path.join(input_dir, "*.csv")):
    df = pd.read_csv(file, index_col="DateTime", parse_dates=True)
    # Average into one-minute bins and index every minute from first to last
    df = df.resample('1min').mean()
    df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq="1min"))
    # Write next to the input, e.g. jan.csv -> new_jan.csv
    new_filename = os.path.join(input_dir, "new_" + os.path.basename(file))
    df.to_csv(new_filename, na_rep='NaN')
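Another way to avoid the "new_new_" files is to write the results into a separate folder, so a rerun never picks up earlier outputs. This is a hedged standard-library sketch; output_path, process_folder and the process callback are hypothetical names, and the pandas steps from the loop above would go inside process:

```python
import glob
import os


def output_path(input_path, output_dir):
    # Same file name, different folder: .../NOV/jan.csv -> <output_dir>/jan.csv
    return os.path.join(output_dir, os.path.basename(input_path))


def process_folder(input_dir, output_dir, process):
    """Call process(in_path, out_path) for every CSV directly in input_dir.

    `process` is a hypothetical callable; in the pandas loop it would
    read_csv, resample, and df.to_csv(out_path, na_rep='NaN').
    """
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for in_path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        out_path = output_path(in_path, output_dir)
        process(in_path, out_path)
        written.append(out_path)
    return written
```

For example, process_folder('C:/Users/user/Desktop/NOV', 'C:/Users/user/Desktop/NOV/processed', my_resample), where my_resample is your own (hypothetical) function wrapping the read_csv/resample/to_csv steps. Since the output folder is a directory, the *.csv pattern never matches it.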

The file variable in your for loop is just a string holding the name of one file in the directory:

for file in directory:
    print(file)
    # e.g. oldfile.csv
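Because those strings are bare names, they only resolve to the right file while the current directory is the folder (which is why the question calls os.chdir). A minimal sketch of joining them with the folder instead; csv_paths is a hypothetical helper:

```python
import os


def csv_paths(folder):
    """Return the full paths of the .csv files directly inside folder."""
    # os.listdir yields bare names such as 'oldfile.csv', not full paths,
    # so each one is joined with the folder; that way os.chdir is not needed
    return [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(".csv")
    ]
```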

You can use it to build a new file name that refers to the original. Something like this:

for file in directory:
    # ... read and resample as before, then make this the last line of the loop:
    df.to_csv("Output - " + file, na_rep='NaN')
    # the file will be called 'Output - oldfile.csv'
