Using pandas in Python to append CSV files into one
I have n files in a directory that I need to combine into one. They all have the same number of columns. For example, the contents of test1.csv are:
test1,test1,test1
test1,test1,test1
test1,test1,test1
Similarly, the contents of test2.csv are:
test2,test2,test2
test2,test2,test2
test2,test2,test2
I want final.csv to look like this:
test1,test1,test1
test1,test1,test1
test1,test1,test1
test2,test2,test2
test2,test2,test2
test2,test2,test2
But instead it comes out like this:
test file 1,test file 1.1,test file 1.2,test file 2,test file 2.1,test file 2.2
,,,test file 2,test file 2,test file 2
,,,test file 2,test file 2,test file 2
test file 1,test file 1,test file 1,,,
test file 1,test file 1,test file 1,,,
Can someone help me figure out what is going on here? I have pasted my code below:
import csv
import glob
import pandas as pd
import numpy as np

all_data = pd.DataFrame()  # initializes DF which will hold aggregated csv files
for f in glob.glob("*.csv"):  # for all csv files in pwd
    df = pd.read_csv(f)  # create dataframe for reading current csv
    all_data = all_data.append(df)  # appends current csv to final DF
all_data.to_csv("final.csv", index=None)
I think there are a few more problems:

I removed import csv and import numpy as np, because in this demo they are not used.

I created a list dfs of all the dataframes, where each dataframe is appended by dfs.append(df). Then I used the function concat for joining this list into the final dataframe.

In the function read_csv I added the parameter header=None, because the main problem was that read_csv reads the first row as a header.

In the function to_csv I added the parameter header=None for omitting the header.

I added the folder test to the final destination file, because if you use glob.glob("*.csv") the output file would otherwise be read back in as an input file.

Solution:
import glob
import pandas as pd

# list of all df
dfs = []
for f in glob.glob("*.csv"):  # for all csv files in pwd
    # add parameters to read_csv
    df = pd.read_csv(f, header=None)  # create dataframe for reading current csv
    dfs.append(df)  # appends current df to the list
all_data = pd.concat(dfs, ignore_index=True)  # joins the list into the final DF
print(all_data)
#        0      1      2
# 0  test1  test1  test1
# 1  test1  test1  test1
# 2  test1  test1  test1
# 3  test2  test2  test2
# 4  test2  test2  test2
# 5  test2  test2  test2
all_data.to_csv("test/final.csv", index=None, header=None)
The next solution is similar: I added the parameter header=None to read_csv and to_csv, and the parameter ignore_index=True to append.
import glob
import pandas as pd

all_data = pd.DataFrame()  # initializes DF which will hold aggregated csv files
for f in glob.glob("*.csv"):  # for all csv files in pwd
    df = pd.read_csv(f, header=None)  # create dataframe for reading current csv
    all_data = all_data.append(df, ignore_index=True)  # appends current csv to final DF
print(all_data)
#        0      1      2
# 0  test1  test1  test1
# 1  test1  test1  test1
# 2  test1  test1  test1
# 3  test2  test2  test2
# 4  test2  test2  test2
# 5  test2  test2  test2
all_data.to_csv("test/final.csv", index=None, header=None)
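A note on the append-based loop above: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions it raises an AttributeError. A minimal sketch of the same merge with pd.concat (it recreates the two sample files from the question for demonstration; the file names are illustrative only):

```python
import glob
import pandas as pd

# Recreate the two header-less sample files from the question.
with open("test1.csv", "w") as fh:
    fh.write("test1,test1,test1\n" * 3)
with open("test2.csv", "w") as fh:
    fh.write("test2,test2,test2\n" * 3)

# Collect one dataframe per file, then concatenate once at the end;
# concatenating a list is also faster than growing a dataframe in a loop.
dfs = [pd.read_csv(f, header=None) for f in sorted(glob.glob("test[12].csv"))]
all_data = pd.concat(dfs, ignore_index=True)
all_data.to_csv("final.csv", index=None, header=None)
```

Sorting the glob result makes the row order deterministic, since glob returns files in arbitrary order.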
You can use concat. Let df1 be your first dataframe and df2 the second; then:
df = pd.concat([df1,df2],ignore_index=True)
The ignore_index parameter is optional; set it to True if you don't mind discarding the original indexes of the individual dataframes.
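To see what ignore_index actually changes, compare the resulting indexes on two small made-up dataframes:

```python
import pandas as pd

df1 = pd.DataFrame([["a", "a"], ["a", "a"]])
df2 = pd.DataFrame([["b", "b"], ["b", "b"]])

# Without ignore_index, each dataframe keeps its own index: 0, 1, 0, 1.
kept = pd.concat([df1, df2])
# With ignore_index=True, a fresh 0..n-1 index is built: 0, 1, 2, 3.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

The duplicated labels in the first case can later cause surprises with .loc, which is why a fresh index is usually what you want when stacking files.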
pandas is not the tool to reach for when all you want is to produce a single csv file; you can simply write each csv to a new file as you go:
import glob

with open("out.csv", "w") as out:
    for fle in glob.glob("*.csv"):
        with open(fle) as f:
            out.writelines(f)
Or with the csv lib if you prefer:
import glob
import csv

# newline="" is the documented way to open files for the csv module;
# without it you can get extra blank lines on Windows.
with open("out.csv", "w", newline="") as out:
    wr = csv.writer(out)
    for fle in glob.glob("*.csv"):
        with open(fle, newline="") as f:
            wr.writerows(csv.reader(f))
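The files in this question have no header row. If yours do, a small variation of the same approach keeps the header from the first file only (a sketch; the part*.csv names and columns are made up for the demonstration, and all files are assumed to share the same header):

```python
import glob
import csv

# Two small sample files with an identical header row, created for demonstration.
with open("part1.csv", "w") as fh:
    fh.write("col1,col2\na,b\n")
with open("part2.csv", "w") as fh:
    fh.write("col1,col2\nc,d\n")

with open("merged.csv", "w", newline="") as out:
    wr = csv.writer(out)
    header_written = False
    for fle in sorted(glob.glob("part*.csv")):
        with open(fle, newline="") as f:
            rows = csv.reader(f)
            header = next(rows, None)  # first row of each file is its header
            if header is not None and not header_written:
                wr.writerow(header)  # keep the header from the first file only
                header_written = True
            wr.writerows(rows)  # copy the remaining data rows
```

next(rows, None) consumes the header row of every file, so subsequent files contribute only their data rows.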
Creating a large dataframe just to eventually write it back to disk makes no real sense; furthermore, if you have a lot of large files it may not even be possible.