Appending CSV files into one using pandas in Python

Question

I have n files in a directory that I need to combine into one. They all have the same number of columns. For example, the contents of test1.csv are:

test1,test1,test1  
test1,test1,test1  
test1,test1,test1

Similarly, the contents of test2.csv are:

test2,test2,test2  
test2,test2,test2  
test2,test2,test2

I want final.csv to look like this:

test1,test1,test1  
test1,test1,test1  
test1,test1,test1  
test2,test2,test2  
test2,test2,test2  
test2,test2,test2

But instead it comes out like this:

test file 1,test file 1.1,test file 1.2,test file 2,test file 2.1,test file 2.2  
,,,test file 2,test file 2,test file 2  
,,,test file 2,test file 2,test file 2  
test file 1,test file 1,test file 1,,,  
test file 1,test file 1,test file 1,,,

Can someone help me figure out what is going on here? I have pasted my code below:

import csv
import glob
import pandas as pd
import numpy as np 

all_data = pd.DataFrame() #initializes DF which will hold aggregated csv files

for f in glob.glob("*.csv"): #for all csv files in pwd
    df = pd.read_csv(f) #create dataframe for reading current csv
    all_data = all_data.append(df) #appends current csv to final DF

all_data.to_csv("final.csv", index=None)

Answer 1

I think there are several problems:

I removed import csv and import numpy as np, because they are not used in this demo (though the lines of your code that you did not post may need them).
I created a list of all dataframes, dfs, and appended each dataframe to it with dfs.append(df). Then I used the function concat to join this list into the final dataframe.
In read_csv I added the parameter header=None, because the main problem was that read_csv reads the first row as the header.
In to_csv I added the parameter header=None to omit the header.
I write the final file into the folder test (which must already exist), because glob.glob("*.csv") would otherwise also match the output file and read it back as an input file on the next run.
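To see why the header is the main problem, the sketch below reads the question's two example files from in-memory strings (io.StringIO stands in for the real files, an assumption made so it runs anywhere). Without header=None, each file loses its first row to the header, pandas renames the duplicate names to test1, test1.1, test1.2, and because the two frames share no column names, concat lines them up side by side with empty fields, which is exactly the shape of the broken output in the question.

```python
from io import StringIO

import pandas as pd

# The question's example file contents, kept in memory for the demo.
test1 = "test1,test1,test1\ntest1,test1,test1\ntest1,test1,test1\n"
test2 = "test2,test2,test2\ntest2,test2,test2\ntest2,test2,test2\n"

# Default read_csv: the first row becomes the header and duplicate
# names are renamed test1, test1.1, test1.2.
df1 = pd.read_csv(StringIO(test1))
df2 = pd.read_csv(StringIO(test2))
print(list(df1.columns))  # ['test1', 'test1.1', 'test1.2']
print(len(df1))           # 2 data rows, not 3

# No shared column names, so concat takes the union of the columns
# and fills the gaps with NaN (empty fields once written to CSV).
wrong = pd.concat([df1, df2])
print(wrong.shape)        # (4, 6)

# With header=None every row is data and the columns are just 0, 1, 2.
ok1 = pd.read_csv(StringIO(test1), header=None)
ok2 = pd.read_csv(StringIO(test2), header=None)
right = pd.concat([ok1, ok2], ignore_index=True)
print(right.shape)        # (6, 3)
```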

Solution:

import glob
import pandas as pd

# list of all dataframes, one per input file
dfs = []
for f in sorted(glob.glob("*.csv")): # all csv files in the current directory, in a stable order
    # header=None: treat the first row as data, not as column names
    df = pd.read_csv(f, header=None)
    dfs.append(df) # collect the current dataframe
all_data = pd.concat(dfs, ignore_index=True) # join them all, renumbering rows 0..n-1
print(all_data)
#       0      1      2
#0  test1  test1  test1
#1  test1  test1  test1
#2  test1  test1  test1
#3  test2  test2  test2
#4  test2  test2  test2
#5  test2  test2  test2
all_data.to_csv("test/final.csv", index=False, header=False) # no index column, no header row
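The same concat approach can be wrapped in a function that skips the output file explicitly instead of relying on a separate folder. This is a minimal sketch: combine_csvs is a hypothetical helper name, and the usage part recreates the question's two files in a temporary directory so it runs without any existing data.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csvs(pattern, out_path):
    """Stack every CSV matching `pattern` (except out_path) into out_path."""
    out_abs = os.path.abspath(out_path)
    dfs = [
        pd.read_csv(f, header=None)          # first row is data, not a header
        for f in sorted(glob.glob(pattern))  # stable, predictable file order
        if os.path.abspath(f) != out_abs     # never read the output back in
    ]
    combined = pd.concat(dfs, ignore_index=True)
    combined.to_csv(out_path, index=False, header=False)
    return combined


# Usage: recreate test1.csv and test2.csv in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    for name in ("test1", "test2"):
        with open(os.path.join(d, name + ".csv"), "w") as fh:
            fh.write((name + "," + name + "," + name + "\n") * 3)
    out = os.path.join(d, "final.csv")
    result = combine_csvs(os.path.join(d, "*.csv"), out)
    with open(out) as fh:
        final_text = fh.read()

print(final_text)
```

Because the output path is filtered out by comparison, final.csv can live in the same directory as the inputs and a second run will not duplicate its rows.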

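The next solution is similar.
I add the parameter header=None to read_csv and to_csv, and the parameter ignore_index=True to append. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this variant only runs on older pandas versions; on current versions use the concat solution above.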

import glob
import pandas as pd

all_data = pd.DataFrame() # initializes DF which will hold aggregated csv files

for f in sorted(glob.glob("*.csv")): # all csv files in the current directory, in a stable order
    df = pd.read_csv(f, header=None) # read the current csv; the first row is data
    all_data = all_data.append(df, ignore_index=True) # append current csv to final DF (pandas < 2.0 only)
print(all_data)
#       0      1      2
#0  test1  test1  test1
#1  test1  test1  test1
#2  test1  test1  test1
#3  test2  test2  test2
#4  test2  test2  test2
#5  test2  test2  test2

all_data.to_csv("test/final.csv", index=False, header=False)

Answer 2

You can use concat. If df1 is your first dataframe and df2 the second, you can do:

df = pd.concat([df1,df2],ignore_index=True)

The ignore_index parameter is optional; set it to True if you don't need to keep the original indexes of the individual dataframes.
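A small sketch of what ignore_index changes. Here df1 and df2 are built in memory from the question's example rows, standing in for frames read with read_csv(..., header=None).

```python
import pandas as pd

df1 = pd.DataFrame([["test1"] * 3] * 3)
df2 = pd.DataFrame([["test2"] * 3] * 3)

# Without ignore_index the original row labels are kept, so they repeat.
kept = pd.concat([df1, df2])
print(list(kept.index))   # [0, 1, 2, 0, 1, 2]

# With ignore_index=True the result gets a fresh 0..n-1 index.
fresh = pd.concat([df1, df2], ignore_index=True)
print(list(fresh.index))  # [0, 1, 2, 3, 4, 5]
```

When the result is written with index=False the repeated labels never reach the file, so ignore_index mainly matters if you keep working with the combined dataframe (for example, selecting rows with .loc).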

Answer 3

pandas is not the right tool when all you want is to create a single csv file; you can simply write each csv to the new file as you go:

import glob

with open("out.csv", "w") as out:
    for fle in sorted(glob.glob("*.csv")):
        if fle == "out.csv":  # glob also matches the output file; skip it
            continue
        with open(fle) as f:
            out.writelines(f)
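One caveat with copying lines verbatim: if a file does not end with a newline, its last row is glued onto the first row of the next file. The sketch below is an assumption-labeled variant (concat_text_files is a hypothetical name) that still streams line by line but adds a newline where one is missing; the usage recreates the question's files, with test1.csv deliberately lacking a final newline.

```python
import glob
import os
import tempfile


def concat_text_files(paths, out_path):
    """Copy each file line by line, adding a newline if a file lacks one."""
    with open(out_path, "w") as out:
        for path in paths:
            last = ""
            with open(path) as f:
                for line in f:
                    out.write(line)
                    last = line
            # Without this, the next file's first row would be appended
            # to this file's last row.
            if last and not last.endswith("\n"):
                out.write("\n")


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "test1.csv"), "w") as fh:
        fh.write("test1,test1,test1\ntest1,test1,test1\ntest1,test1,test1")
    with open(os.path.join(d, "test2.csv"), "w") as fh:
        fh.write("test2,test2,test2\ntest2,test2,test2\ntest2,test2,test2\n")
    out_path = os.path.join(d, "out.csv")
    # Only test*.csv matches, so out.csv is never read back in.
    concat_text_files(sorted(glob.glob(os.path.join(d, "test*.csv"))), out_path)
    with open(out_path) as fh:
        merged = fh.read()

print(merged)
```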

Or with the csv lib if you prefer: 或者如果您愿意，可以使用csv lib：

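import csv
import glob

# newline="" is what the csv module expects for files it reads and writes
with open("out.csv", "w", newline="") as out:
    wr = csv.writer(out)
    for fle in sorted(glob.glob("*.csv")):
        if fle == "out.csv":  # skip the output file itself
            continue
        with open(fle, newline="") as f:
            wr.writerows(csv.reader(f))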

Creating a large dataframe just to write it to disk makes little sense; furthermore, if you had a lot of large files, the combined dataframe might not even fit in memory.
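If you do want pandas to parse the rows (for example, to clean them before writing), the memory concern can be reduced by streaming instead of building one big dataframe. This is a swapped-in technique, not part of the answer: a minimal sketch using read_csv's chunksize parameter, which yields dataframes of at most chunksize rows, each appended to one open output handle. stream_csvs is a hypothetical helper, the with-block around the reader assumes pandas 1.2 or later, and values pass through pandas' type inference, so details such as leading zeros in numeric-looking fields can change, which the plain copies above avoid.

```python
import os
import tempfile

import pandas as pd


def stream_csvs(paths, out_path, chunksize=100_000):
    """Append every input CSV to out_path, one chunk of rows at a time."""
    # newline="" is recommended when passing an open file handle to to_csv.
    with open(out_path, "w", newline="") as out:
        for path in paths:
            # With chunksize, read_csv returns an iterator of DataFrames
            # instead of loading the whole file at once.
            with pd.read_csv(path, header=None, chunksize=chunksize) as reader:
                for chunk in reader:
                    chunk.to_csv(out, index=False, header=False)


with tempfile.TemporaryDirectory() as d:
    paths = []
    for name in ("test1", "test2"):
        path = os.path.join(d, name + ".csv")
        with open(path, "w") as fh:
            fh.write((name + "," + name + "," + name + "\n") * 3)
        paths.append(path)
    out_path = os.path.join(d, "final.csv")
    # A tiny chunksize forces several chunks per file in this demo.
    stream_csvs(paths, out_path, chunksize=2)
    with open(out_path) as fh:
        final_text = fh.read()

print(final_text)
```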

3 answers
Answer 1: score 5, accepted, 2015-12-12 21:33:23
Answer 2: score 2, 2015-12-12 18:15:10
Answer 3: score 1, 2015-12-12 18:36:55
