Load all csv/txt files from one directory and merge them via Python

I have a folder which contains hundreds (possibly over 1,000) of CSV data files holding chronological data. Ideally this data would be in one CSV, so that I can analyse it all in one go. What I would like to know is whether there is a way to append all the files to one another using Python.

My files exist in folder locations like so:

C:\Users\folder\Database Files\1st September
C:\Users\folder\Database Files\1st October
C:\Users\folder\Database Files\1st November
C:\Users\folder\Database Files\1st December
etc

Inside each of the folders there are 3 CSVs (I am using the term CSV loosely, since these files are actually saved as .txt files containing values separated by pipes, |).

Let's say these files are called:

MonthNamOne.txt
MonthNamTwo.txt
MonthNameOneTwoMurged.txt

How would I code something to go through all of these folders in this directory and then merge together all the OneTwoMurged.txt files, or is that even possible?

For all files in a folder with a .csv suffix:

import glob
import os

filelist = []

# Work relative to the folder that holds the CSV files.
os.chdir("folderwithcsvs/")
for counter, filename in enumerate(glob.glob("*.csv")):
    filelist.append(filename)
    print("do stuff with file:", filename, counter)

print(filelist)

for fileitem in filelist:
    print(fileitem)

Obviously the "do stuff" part depends on what you want done with the files; this only gets you the list of files.
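As one possible "do stuff" step for the folder layout in the question, the sketch below appends the merged file from each month folder into a single output file. The function name `merge_files` and its parameters are made up for illustration, and it assumes the files sit exactly one folder level below the root. Files are appended in sorted path order, which is alphabetical rather than chronological.

```python
import glob
import os


def merge_files(root, pattern, output_path):
    """Append every file matching root/*/pattern to output_path; return the merged paths."""
    # One level of subfolders, e.g. "Database Files/1st September/<file>".
    matches = sorted(glob.glob(os.path.join(root, "*", pattern)))
    with open(output_path, "w") as fout:
        for path in matches:
            with open(path) as fin:
                for line in fin:
                    # Make sure each line ends with a newline so files do not run together.
                    fout.write(line if line.endswith("\n") else line + "\n")
    return matches


# Hypothetical usage, with the paths from the question:
# merge_files(r"C:\Users\folder\Database Files", "*OneTwoMurged.txt", "all_months.txt")
```

Writing the output outside the month folders keeps it from being picked up as an input on a second run.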

If you want to do something with the files on a monthly basis, you could use datetime to generate the possible months; the same approach works for daily or yearly data.

For example, for monthly files named Month Year.csv, the following looks for each file in turn:

import datetime
import os

start_year, start_month = "2001", "January"

# The first day of the current month is the upper bound of the search.
current_month = datetime.date.today().replace(day=1)
possible_month = datetime.datetime.strptime('%s %s' % (start_month, start_year), '%B %Y').date()
while possible_month <= current_month:
    csv_filename = possible_month.strftime('%B %Y') + '.csv'
    # Month name and year as separate strings, available for further processing.
    month, year = possible_month.strftime('%B %Y').split(" ")
    if os.path.exists(os.path.join("folder", csv_filename)):
        print(csv_filename)
    # Adding 31 days to the 1st always lands in the next month; snap back to its 1st.
    possible_month = (possible_month + datetime.timedelta(days=31)).replace(day=1)

You can change that however you see fit. Let me know if you need more or if this suffices.

This will recursively process a directory, match a specific file pattern, and append the contents of the matching files to one output file. It parses the CSVs as well, so you could also do per-line analysis and processing. Modify as needed :)

#!python3
import os
import fnmatch
import csv
from datetime import datetime as dt

# Open result file (newline='' lets the csv module control line endings)
with open('output.txt', 'w', newline='') as fout:
    wout = csv.writer(fout, delimiter='|')

    # Recursively process a directory
    for path, dirs, files in os.walk('files'):

        # Sort directories in place so os.walk visits them in this order.
        # In this case, sorting directories named "Month Year" chronologically.
        # strptime raises ValueError for names that do not match the format.
        dirs.sort(key=lambda d: dt.strptime(d, '%B %Y'))
        interesting_files = fnmatch.filter(files, '*.txt')

        # Example for sorting filenames with a custom chronological sort "Month Year.txt"
        for filename in sorted(interesting_files, key=lambda f: dt.strptime(f, '%B %Y.txt')):

            # Generate the full path to the file.
            fullname = os.path.join(path, filename)
            print('Processing', fullname)

            # Open and process file
            with open(fullname, newline='') as fin:
                for line in csv.reader(fin, delimiter='|'):
                    wout.writerow(line)
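
If each input file starts with its own header row, appending the files verbatim repeats that header throughout the output. Below is a minimal Python 3 sketch of one way to keep only the first header. It assumes every file has exactly one header row and the same columns; the helper name `merge_with_single_header` is hypothetical.

```python
import csv


def merge_with_single_header(paths, output_path, delimiter="|"):
    """Concatenate delimited files, writing the header row only once."""
    header_written = False
    with open(output_path, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter=delimiter)
        for path in paths:
            with open(path, newline="") as fin:
                reader = csv.reader(fin, delimiter=delimiter)
                header = next(reader, None)  # None if the file is empty
                if header is None:
                    continue
                if not header_written:
                    # Keep the header from the first non-empty file only.
                    writer.writerow(header)
                    header_written = True
                # The reader is now positioned after the header row.
                writer.writerows(reader)
```

The list of paths can come from any of the listing approaches above, already sorted into the order in which the rows should appear.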

Reading into a pandas DataFrame (the choice of axis depends on your application). My example adds columns of the same length:

import glob
import pandas as pd

df = pd.DataFrame()
for filename in glob.glob("*.csv"):
    print(filename)
    # Drop the first column of each file, then append the rest as new columns.
    df = pd.concat([df, pd.read_csv(filename).iloc[:, 1:]], axis=1)

axis=0 would add row-wise instead.
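
For the pipe-separated .txt files in the question, a row-wise version might look like the sketch below. It assumes every file has the same header row, and the function name `concat_rows` is made up for illustration. `sep="|"` tells `read_csv` to split on pipes, and `ignore_index=True` renumbers the rows of the combined frame.

```python
import pandas as pd


def concat_rows(paths, sep="|"):
    """Read each delimited file and stack the rows into one DataFrame."""
    frames = [pd.read_csv(path, sep=sep) for path in paths]
    if not frames:
        return pd.DataFrame()  # pd.concat raises ValueError on an empty list
    return pd.concat(frames, axis=0, ignore_index=True)
```

Collecting the frames in a list and concatenating once avoids copying the growing DataFrame on every iteration.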
