Importing multiple CSV files into pandas and merging them into one DataFrame
I have multiple CSV files; each file contains N rows (e.g., 1000 rows) and 43 columns.
I would like to read several CSV files from a folder into pandas and merge them into one DataFrame.
I have not been able to figure it out, though.
The problem is that the final DataFrame (i.e., frame = pd.concat(li, axis=0, ignore_index=True)) merges all 43 columns into one column (see the attached screenshot of the code).
An example of selected rows and columns (file one):
Client_ID Client_Name Pointer_of_Bins Date Weight
C0000001 POLYGONE TI006093 12/03/2019 0.5
C0000001 POLYGONE TI006093 12/03/2019 0.6
C0000001 POLYGONE TI006093 12/03/2019 1.4
C0000001 POLYGONE TI006897 14/03/2019 2.9
An example of selected rows and columns (file two):
Client_ID Client_Name Pointer_of_Bins Date Weight
C0000001 POLYGONE TI006093 22/04/2019 1.5
C0000001 ALDI TI006098 22/04/2019 0.7
C0000001 ALDI TI006098 22/04/2019 2.4
C0000001 ALDI TI006898 24/04/2019 1.9
The expected output would look like this. The attached data is just an example; the actual CSV files might contain thousands of rows and more than 45 columns each:
Client_ID Client_Name Pointer_of_Bins Date Weight
C0000001 POLYGONE TI006093 12/03/2019 0.5
C0000001 POLYGONE TI006093 12/03/2019 0.6
C0000001 POLYGONE TI006093 12/03/2019 1.4
C0000001 POLYGONE TI006897 14/03/2019 2.9
C0000001 POLYGONE TI006093 22/04/2019 1.5
C0000001 ALDI TI006098 22/04/2019 0.7
C0000001 ALDI TI006098 22/04/2019 2.4
C0000001 ALDI TI006898 24/04/2019 1.9
To download the two CSV files, click here (dummy data).
Here is what I have done so far:
import pandas as pd
import glob
path = r'C:\Users\alnaffakh\Desktop\doc\Data\data2\Test'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, sep='delimiter', index_col=None, header=0)
    # df = pd.read_csv(filename, sep='\t', index_col=None, header=0)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)
You could use pandas.concat to concatenate the contents of the .csv files.
In fact, I see that you already used it, and your application of concat seems fine to me. Try inspecting the individual DataFrames that you read. The only way your columns could merge into a single column is if you did not specify the correct delimiter. In your code, sep='delimiter' tells pandas to split on the literal text "delimiter", which never occurs in the files, so every line ends up in one column.
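import glob
import pandas as pd

# Placeholder pattern; point it at the folder that holds your CSV files.
filenames = glob.glob("path/to/folder/*.csv")

dfs = []
for filename in filenames:
    # Default sep=','; pass sep='\t' or sep=';' if your files use another delimiter.
    df = pd.read_csv(filename)
    dfs.append(df)
frame = pd.concat(dfs, axis=0, ignore_index=True)
frame.head()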
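To illustrate the delimiter point, here is a minimal sketch that shows how a separator that never matches produces a single column, and one way to let pandas guess the delimiter instead. The sample text and its semicolon delimiter are assumptions made for illustration, since the question does not show which delimiter the real files use.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: a semicolon-delimited file (the real delimiter is unknown).
sample = (
    "Client_ID;Client_Name;Weight\n"
    "C0000001;POLYGONE;0.5\n"
    "C0000001;ALDI;0.7\n"
)

# A separator that never occurs in the text leaves each line as one column.
wrong = pd.read_csv(StringIO(sample), sep="delimiter", engine="python")
print(wrong.shape)  # (2, 1)

# sep=None with the python engine asks pandas to sniff the delimiter.
guessed = pd.read_csv(StringIO(sample), sep=None, engine="python")
print(guessed.shape)  # (2, 3)
```

Checking the shape of each individual DataFrame like this, before calling concat, is a quick way to confirm that the files were parsed with the right delimiter.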
Since the dummy data is not available in text format yet, I am using some dummy data that I made.
import pandas as pd
from io import StringIO # needed for string to dataframe conversion
file1 = """
Col1 Col2 Col3 Col4 Col5
1 ABCDE AE10 CD11 BC101F
2 GHJKL GL20 JK22 HJ202M
3 MNPKU MU30 PK33 NP303V
4 OPGHD OD40 GH44 PG404E
5 BHZKL BL50 ZK55 HZ505M
"""
file2 = """
Col1 Col2 Col3 Col4 Col5
1 AZYDE AE10 CD11 BC100F
2 GUFKL GL24 JK22 HJ207M
3 MHPRU MU77 PK39 NP309V
4 OPGBB OE90 GH41 PG405N
5 BHTGK BL70 ZK53 HZ508Z
"""
Load the data as individual DataFrames and then concatenate them.
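# sep=r'\s+' splits on any run of whitespace (tabs or spaces); use sep='\t' for strictly tab-separated files.
df1 = pd.read_csv(StringIO(file1), sep=r'\s+')
df2 = pd.read_csv(StringIO(file2), sep=r'\s+')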
print(pd.concat([df1, df2], ignore_index=True))
Output:
Col1 Col2 Col3 Col4 Col5
0 1 ABCDE AE10 CD11 BC101F
1 2 GHJKL GL20 JK22 HJ202M
2 3 MNPKU MU30 PK33 NP303V
3 4 OPGHD OD40 GH44 PG404E
4 5 BHZKL BL50 ZK55 HZ505M
5 1 AZYDE AE10 CD11 BC100F
6 2 GUFKL GL24 JK22 HJ207M
7 3 MHPRU MU77 PK39 NP309V
8 4 OPGBB OE90 GH41 PG405N
9 5 BHTGK BL70 ZK53 HZ508Z
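Putting the pieces together for files on disk, the following is a rough sketch of a small helper that reads every CSV in a folder and stacks the rows. The function name read_csv_folder, the temporary folder, and the tab-separated demo files are all hypothetical and only serve the illustration; any keyword arguments are passed straight through to pd.read_csv, so the delimiter can be set per call.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_folder(folder, pattern="*.csv", **read_csv_kwargs):
    """Read every file matching `pattern` in `folder` and stack the rows."""
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    frames = [pd.read_csv(p, **read_csv_kwargs) for p in paths]
    return pd.concat(frames, axis=0, ignore_index=True)


# Demo with two small tab-separated files written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "file1.csv").write_text("Col1\tCol2\n1\tABCDE\n2\tGHJKL\n")
    Path(tmp, "file2.csv").write_text("Col1\tCol2\n1\tAZYDE\n2\tGUFKL\n")
    combined = read_csv_folder(tmp, sep="\t")

print(combined.shape)  # (4, 2)
```

Sorting the paths keeps the row order reproducible across runs, and raising an error on an empty match avoids the less obvious failure that pd.concat gives for an empty list.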