
Importing multiple CSV files into pandas and merging them into one DataFrame

I have multiple CSV files. Each file contains N rows (e.g., 1000 rows) and 43 columns.

I would like to read several CSV files from a folder into pandas and merge them into one DataFrame.

I have not been able to figure it out, though.

The problem is that the final DataFrame (i.e., frame = pd.concat(li, axis=0, ignore_index=True)) merges all columns (i.e., 43 columns) into one column (see the attached screenshot of the code).

An example of selected rows and columns (file one):

               Client_ID    Client_Name  Pointer_of_Bins   Date        Weight
                C0000001       POLYGONE      TI006093     12/03/2019   0.5
                C0000001       POLYGONE      TI006093     12/03/2019   0.6
                C0000001       POLYGONE      TI006093     12/03/2019   1.4
                C0000001       POLYGONE      TI006897     14/03/2019   2.9

An example of selected rows and columns (file two):

               Client_ID    Client_Name  Pointer_of_Bins   Date        Weight
                C0000001       POLYGONE      TI006093     22/04/2019   1.5
                C0000001       ALDI          TI006098     22/04/2019   0.7
                C0000001       ALDI          TI006098     22/04/2019   2.4
                C0000001       ALDI          TI006898     24/04/2019   1.9

The expected output would look like this. The attached data is just an example; the actual CSV files might each contain thousands of rows and more than 45 columns.

               Client_ID    Client_Name  Pointer_of_Bins   Date        Weight
                C0000001       POLYGONE      TI006093     12/03/2019   0.5
                C0000001       POLYGONE      TI006093     12/03/2019   0.6
                C0000001       POLYGONE      TI006093     12/03/2019   1.4
                C0000001       POLYGONE      TI006897     14/03/2019   2.9   
                C0000001       POLYGONE      TI006093     22/04/2019   1.5
                C0000001       ALDI          TI006098     22/04/2019   0.7
                C0000001       ALDI          TI006098     22/04/2019   2.4
                C0000001       ALDI          TI006898     24/04/2019   1.9                                                             

To download the two CSV files, click here (dummy data).

Here is what I have done so far:

import pandas as pd
import glob
path = r'C:\Users\alnaffakh\Desktop\doc\Data\data2\Test'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, sep='delimiter', index_col=None, header=0)
  # df = pd.read_csv(filename, sep='\t', index_col=None, header=0)
    li.append(df)
frame = pd.concat(li, axis=0, ignore_index=True)

Solution

You could use pandas.concat to concatenate the contents of the .csv files.
In fact, I see that you already use it, and your application of concat seems fine to me. Try investigating the individual dataframes that you read. The only way your columns could collapse into a single column is if the correct delimiter was not specified. In your code, sep='delimiter' passes the literal string 'delimiter' as the separator; since that text presumably never occurs in the data, each line is read as a single field.

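import glob
import pandas as pd

# Folder from the question; adjust to your own location
path = r'C:\Users\alnaffakh\Desktop\doc\Data\data2\Test'
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    # read_csv splits on commas by default; pass sep=... for another delimiter
    df = pd.read_csv(filename)
    dfs.append(df)
# Stack the frames row-wise and renumber the index
frame = pd.concat(dfs, axis=0, ignore_index=True)
frame.head()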
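
One way to investigate an individual dataframe, as suggested above, is to look at its shape right after reading it. The following is a minimal sketch, not the asker's actual data: the semicolon-separated sample string `raw` is a hypothetical stand-in for one file, since the real delimiter of the files is not known. It reproduces the single-column symptom with sep='delimiter' and then lets pandas guess the separator via sep=None with the Python engine.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample standing in for one file (the real delimiter is unknown)
raw = (
    "Client_ID;Client_Name;Weight\n"
    "C0000001;POLYGONE;0.5\n"
    "C0000001;ALDI;0.7\n"
)

# The word "delimiter" never appears in the text, so no line is split
collapsed = pd.read_csv(StringIO(raw), sep="delimiter", engine="python")
print(collapsed.shape)  # (2, 1): everything landed in one column

# sep=None with engine="python" asks pandas to sniff the separator
detected = pd.read_csv(StringIO(raw), sep=None, engine="python")
print(detected.shape)  # (2, 3)
print(list(detected.columns))
```

If the sniffed result has the expected number of columns, the detected separator can then be passed explicitly as sep=... when reading all files.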

Example with Dummy Data

Since the dummy data provided is not available in text format yet, I am using some dummy data that I made.

import pandas as pd
from io import StringIO # needed for string to dataframe conversion

file1 = """
Col1    Col2    Col3    Col4    Col5
1   ABCDE   AE10    CD11    BC101F
2   GHJKL   GL20    JK22    HJ202M
3   MNPKU   MU30    PK33    NP303V
4   OPGHD   OD40    GH44    PG404E
5   BHZKL   BL50    ZK55    HZ505M
"""

file2 = """
Col1    Col2    Col3    Col4    Col5
1   AZYDE   AE10    CD11    BC100F
2   GUFKL   GL24    JK22    HJ207M
3   MHPRU   MU77    PK39    NP309V
4   OPGBB   OE90    GH41    PG405N
5   BHTGK   BL70    ZK53    HZ508Z
"""

Load the data as individual dataframes and then concatenate them.

# Split on any run of whitespace (tabs or spaces), which matches the sample text above
df1 = pd.read_csv(StringIO(file1), sep=r'\s+')
df2 = pd.read_csv(StringIO(file2), sep=r'\s+')
print(pd.concat([df1, df2], ignore_index=True))

Output:

   Col1   Col2  Col3  Col4    Col5
0     1  ABCDE  AE10  CD11  BC101F
1     2  GHJKL  GL20  JK22  HJ202M
2     3  MNPKU  MU30  PK33  NP303V
3     4  OPGHD  OD40  GH44  PG404E
4     5  BHZKL  BL50  ZK55  HZ505M
5     1  AZYDE  AE10  CD11  BC100F
6     2  GUFKL  GL24  JK22  HJ207M
7     3  MHPRU  MU77  PK39  NP309V
8     4  OPGBB  OE90  GH41  PG405N
9     5  BHTGK  BL70  ZK53  HZ508Z
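
The same idea can be applied to files on disk. The sketch below is only an illustration under stated assumptions: the helper name `concat_csv_folder` is hypothetical, the two temporary files reuse rows from the question's sample, and a comma separator is assumed because the real files' delimiter is unknown. The helper fails early when a file parses into a single column, which is the symptom described in the question.

```python
import tempfile
from pathlib import Path

import pandas as pd


def concat_csv_folder(folder, pattern="*.csv", **read_kwargs):
    """Read every matching file in `folder` and stack the rows into one DataFrame."""
    files = sorted(Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    frames = []
    for path in files:
        df = pd.read_csv(path, **read_kwargs)
        # A single-column frame usually signals a wrong separator
        if df.shape[1] == 1:
            raise ValueError(f"{path.name} parsed as one column; check sep")
        frames.append(df)
    return pd.concat(frames, axis=0, ignore_index=True)


# Demo with two small temporary files (comma separator is an assumption)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "file1.csv").write_text(
        "Client_ID,Client_Name,Pointer_of_Bins,Date,Weight\n"
        "C0000001,POLYGONE,TI006093,12/03/2019,0.5\n"
    )
    Path(tmp, "file2.csv").write_text(
        "Client_ID,Client_Name,Pointer_of_Bins,Date,Weight\n"
        "C0000001,ALDI,TI006098,22/04/2019,0.7\n"
    )
    frame = concat_csv_folder(tmp)

print(frame.shape)  # (2, 5)
```

Extra keyword arguments are forwarded to pd.read_csv, so a call such as concat_csv_folder(path, sep=';') would cover files that use a different delimiter.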
