Import multiple csv files into pandas and merge them into one DataFrame

Question

I have multiple csv files. Each file contains N rows (e.g., 1000 rows) and 43 columns.

I would like to read several csv files from a folder into pandas and merge them into one DataFrame.

I have not been able to figure it out, though.

The problem is that the final DataFrame (i.e., frame = pd.concat(li, axis=0, ignore_index=True)) merges all 43 columns into a single column (see the attached screenshot of the code).

An example of selected rows and columns (file one):

               Client_ID    Client_Name  Pointer_of_Bins   Date        Weight
                C0000001       POLYGONE      TI006093     12/03/2019   0.5
                C0000001       POLYGONE      TI006093     12/03/2019   0.6
                C0000001       POLYGONE      TI006093     12/03/2019   1.4
                C0000001       POLYGONE      TI006897     14/03/2019   2.9

An example of selected rows and columns (file two):

               Client_ID    Client_Name  Pointer_of_Bins   Date        Weight
                C0000001       POLYGONE      TI006093     22/04/2019   1.5
                C0000001       ALDI          TI006098     22/04/2019   0.7
                C0000001       ALDI          TI006098     22/04/2019   2.4
                C0000001       ALDI          TI006898     24/04/2019   1.9

The expected output would look like this. The attached data is just an example; the actual csv files might each contain thousands of rows and more than 45 columns.

               Client_ID    Client_Name  Pointer_of_Bins   Date        Weight
                C0000001       POLYGONE      TI006093     12/03/2019   0.5
                C0000001       POLYGONE      TI006093     12/03/2019   0.6
                C0000001       POLYGONE      TI006093     12/03/2019   1.4
                C0000001       POLYGONE      TI006897     14/03/2019   2.9   
                C0000001       POLYGONE      TI006093     22/04/2019   1.5
                C0000001       ALDI          TI006098     22/04/2019   0.7
                C0000001       ALDI          TI006098     22/04/2019   2.4
                C0000001       ALDI          TI006898     24/04/2019   1.9

To download the two CSV files, click here (dummy data).

Here is what I have done so far:

import pandas as pd
import glob
path = r'C:\Users\alnaffakh\Desktop\doc\Data\data2\Test'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, sep='delimiter', index_col=None, header=0)
  # df = pd.read_csv(filename, sep='\t', index_col=None, header=0)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)

Answer 1

Solution

You can use pandas.concat to concatenate the contents of the .csv files once each one has been read into its own DataFrame.
In fact, I see that you already do this, and your use of concat looks fine to me. Try inspecting the individual DataFrames that you read. The only way your columns could end up merged into a single column is if read_csv was not given the correct delimiter. In your code, sep='delimiter' is taken as literal separator text that never occurs in the files, so every row is read as a single field.

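import glob
import pandas as pd

# Collect the paths of the files to read (placeholder pattern; adjust to your folder)
filenames = glob.glob("*.csv")

dfs = list()
for filename in filenames:
    # read_csv splits on commas by default; pass sep=... for any other delimiter
    df = pd.read_csv(filename)
    dfs.append(df)
frame = pd.concat(dfs, axis=0, ignore_index=True)
frame.head()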
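
To see why the delimiter matters, here is a minimal sketch that reproduces the single-column symptom. The tab-separated sample text is hypothetical and only stands in for one of the real files; which separator your files actually use is something you would need to check by opening one of them.

```python
import pandas as pd
from io import StringIO

# Hypothetical tab-separated sample standing in for one of the real files
raw = (
    "Client_ID\tClient_Name\tWeight\n"
    "C0000001\tPOLYGONE\t0.5\n"
    "C0000001\tALDI\t0.7\n"
)

# A multi-character sep is treated as a regular expression. The text
# "delimiter" never occurs in the data, so no line is ever split.
wrong = pd.read_csv(StringIO(raw), sep="delimiter", engine="python")

# A separator that matches the file splits the columns as expected.
right = pd.read_csv(StringIO(raw), sep="\t")

print(wrong.shape)  # a single column holding each whole line
print(right.shape)  # one column per field
```

Checking df.shape or df.columns on each DataFrame right after read_csv is a quick way to catch this before concatenating.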

Example with Dummy Data

Since the linked dummy data is not available in text form, I am using some dummy data that I made up.

import pandas as pd
from io import StringIO # needed for string to dataframe conversion

file1 = """
Col1    Col2    Col3    Col4    Col5
1   ABCDE   AE10    CD11    BC101F
2   GHJKL   GL20    JK22    HJ202M
3   MNPKU   MU30    PK33    NP303V
4   OPGHD   OD40    GH44    PG404E
5   BHZKL   BL50    ZK55    HZ505M
"""

file2 = """
Col1    Col2    Col3    Col4    Col5
1   AZYDE   AE10    CD11    BC100F
2   GUFKL   GL24    JK22    HJ207M
3   MHPRU   MU77    PK39    NP309V
4   OPGBB   OE90    GH41    PG405N
5   BHTGK   BL70    ZK53    HZ508Z
"""

Load the data as individual DataFrames and then concatenate them.

# The sample strings above are separated by runs of whitespace, so split on
# r'\s+' (this pattern also matches tab characters)
df1 = pd.read_csv(StringIO(file1), sep=r'\s+')
df2 = pd.read_csv(StringIO(file2), sep=r'\s+')
print(pd.concat([df1, df2], ignore_index=True))

Output:

   Col1   Col2  Col3  Col4    Col5
0     1  ABCDE  AE10  CD11  BC101F
1     2  GHJKL  GL20  JK22  HJ202M
2     3  MNPKU  MU30  PK33  NP303V
3     4  OPGHD  OD40  GH44  PG404E
4     5  BHZKL  BL50  ZK55  HZ505M
5     1  AZYDE  AE10  CD11  BC100F
6     2  GUFKL  GL24  JK22  HJ207M
7     3  MHPRU  MU77  PK39  NP309V
8     4  OPGBB  OE90  GH41  PG405N
9     5  BHTGK  BL70  ZK53  HZ508Z
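
Putting the pieces together for files on disk, the following is a rough sketch rather than a definitive implementation. The helper name read_folder, the file names, and the tab separator are all assumptions made for the demo; pass whatever read_csv options match your own files.

```python
import glob
import os
import tempfile

import pandas as pd


def read_folder(folder, pattern="*.csv", **read_csv_kwargs):
    """Read every file in `folder` matching `pattern` and stack the rows.

    Hypothetical helper: extra keyword arguments go straight to pd.read_csv.
    """
    # Sort so the row order of the result does not depend on the OS
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    frames = [pd.read_csv(p, **read_csv_kwargs) for p in paths]
    return pd.concat(frames, axis=0, ignore_index=True)


# Demo: write two small tab-separated files to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    samples = {"one.csv": "a\tb\n1\t2\n", "two.csv": "a\tb\n3\t4\n"}
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    combined = read_folder(tmp, sep="\t")

print(combined)
```

Using os.path.join avoids mixing a raw Windows path with a forward slash, and raising an error on an empty file list gives a clearer message than the one pd.concat produces for an empty list.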

Answer 1: 2 votes, accepted, 2019-10-07 17:29:27
