
Python - Concatenate CSV files in a specific directory

I am trying to concatenate CSV files from a folder on my desktop:

C:\\Users\\Vincentc\\Desktop\\W1 

and output the final CSV to:

C:\\Users\\Vincentc\\Desktop\\W2\\conca.csv

The CSV files don't have headers. However, nothing is output when I run my script, and there is no error message. I'm a beginner; could someone take a look at my code below? Thanks a lot!

import os
import glob
import pandas

def concatenate(indir="C:\\Users\\Vincentc\\Desktop\\W1",outfile="C:\\Users\\Vincentc\\Desktop\\W2\\conca.csv"):
    os.chdir(indir)
    fileList=glob.glob("indir")
    dfList=[]
    for filename in fileList:
        print(filename)
        df=pandas.read_csv(filename,header=None)
        dfList.append(df)
    concaDf=pandas.concat(dfList,axis=0)
    concaDf.to_csv(outfile,index=None)

Loading CSV files into pandas only to concatenate them is inefficient. See this answer for a more direct alternative.
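The linked answer is not reproduced here, so the following is only one possible direct approach, not necessarily the one it describes. It is a minimal sketch that copies the bytes of each file into the output without parsing any rows. The function name concatenate_raw and its parameters are made up for illustration, and it assumes the files have no header row, as stated in the question.

```python
import glob
import os
import shutil


def concatenate_raw(indir, outfile):
    """Append every .csv file in indir to outfile without parsing the rows."""
    # sorted() gives a stable, repeatable file order
    paths = sorted(glob.glob(os.path.join(indir, "*.csv")))
    out_abs = os.path.abspath(outfile)
    copied = []
    with open(outfile, "wb") as out:
        for path in paths:
            if os.path.abspath(path) == out_abs:
                continue  # never copy the output file into itself
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # If the file does not end with a newline, add one so its
                # last row is not glued to the first row of the next file.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
            copied.append(path)
    return copied
```

With the paths from the question, a call might look like concatenate_raw(r"C:\Users\Vincentc\Desktop\W1", r"C:\Users\Vincentc\Desktop\W2\conca.csv"). Because nothing is parsed, this only makes sense when all files share the same column layout.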

If you insist on using pandas, the third-party library dask provides an intuitive interface:

import dask.dataframe as dd

df = dd.read_csv('*.csv')  # read all csv files in directory lazily
df.compute().to_csv('out.csv', index=False)  # convert to pandas and save as csv

glob.glob() needs a wildcard to match all the files in the folder you have given. Without it, you might just get the folder name returned, and none of the files inside it. Try the following:

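import glob
import pandas

def concatenate(indir=r"C:\Users\Vincentc\Desktop\W1\*", outfile=r"C:\Users\Vincentc\Desktop\W2\conca.csv"):
    # indir is a glob pattern: the trailing \* matches every file in W1.
    # There is no os.chdir() here, because changing into a path that ends
    # in * would fail; the absolute pattern makes it unnecessary anyway.
    fileList = glob.glob(indir)
    dfList = []

    for filename in fileList:
        print(filename)
        # header=None: the input files have no header row
        df = pandas.read_csv(filename, header=None)
        dfList.append(df)

    concaDf = pandas.concat(dfList, axis=0)
    # header=False stops pandas from writing the integer column labels
    # (0, 1, 2, ...) as a header row; index=False omits the row index
    concaDf.to_csv(outfile, index=False, header=False)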
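To see what the wildcard changes, the patterns can be compared on a throwaway directory. This is a small sketch only: the temporary folder, the file names a.csv and b.csv, and the helper compare_patterns are invented for the demonstration.

```python
import glob
import os
import tempfile


def compare_patterns():
    """Return what three glob patterns match in a folder holding two CSV files."""
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.csv", "b.csv"):
            with open(os.path.join(folder, name), "w") as f:
                f.write("1,2\n")

        # Equivalent to os.chdir(folder) followed by glob.glob("indir"):
        # the quoted text is a literal name, not the variable indir.
        literal = glob.glob(os.path.join(folder, "indir"))

        # A folder path without a wildcard matches only the folder itself.
        bare_is_folder_only = glob.glob(folder) == [folder]

        # With a wildcard, the files inside the folder are matched.
        wild = sorted(
            os.path.basename(p)
            for p in glob.glob(os.path.join(folder, "*.csv"))
        )
    return literal, bare_is_folder_only, wild
```

Two further observations about the question's script, offered as likely explanations rather than a definite diagnosis. With an empty file list, pandas.concat raises "ValueError: No objects to concatenate", so a run that produces no error at all suggests the function is never executed. The posted code only defines concatenate, so the script also needs a call such as concatenate() at the end.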

You can also avoid having to write \\ by either using / or prefixing the strings with r. The r prefix disables backslash escaping in the string.
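The three spellings can be compared directly. A short sketch using the folder path from the question; ntpath is the standard-library module that implements Windows path rules, used here so the comparison behaves the same on any operating system.

```python
import ntpath

escaped = "C:\\Users\\Vincentc\\Desktop\\W1"   # each backslash doubled
raw = r"C:\Users\Vincentc\Desktop\W1"          # raw string, no doubling
forward = "C:/Users/Vincentc/Desktop/W1"       # forward slashes

# The escaped and raw literals produce exactly the same string.
same_string = escaped == raw

# Normalizing with Windows rules turns the forward slashes into backslashes.
same_after_normalizing = ntpath.normpath(forward) == raw
```

One caveat: a raw string literal cannot end in a single backslash, so a folder path with a trailing separator still needs "\\" or "/" at the end.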
