
Merge multiple CSV files into one new CSV file using Jupyter Notebook

I'm having trouble merging CSV files with Python in a Jupyter notebook. I wrote the code below, but the columns do not line up: the second column starts where the first column ends, and so on. The columns in the different CSV files are: timestamp, load energy data, lighting data, and operative data. Any help would be appreciated.

import glob
import pandas as pd

path = "C:/Users"

file_list = glob.glob(path + "/*.csv")
print('File names:', file_list)
 
csv_list = []

for file in file_list:
    csv_list.append(pd.read_csv(file))
csv_merged = pd.DataFrame()
 
for csv_file in csv_list:
    csv_merged = csv_merged.append(csv_file, ignore_index=True)
    
csv_merged.to_csv('C:/Users.csv',index=False)

Can I also add more details to this code, such as column names, and exclude some columns? If that is possible, please let me know how.

Try the pandas merge function instead of collecting the frames in a list. For example:

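import glob

import pandas as pd

path = "C:/Users"
file_list = glob.glob(path + "/*.csv")
print('File names:', file_list)

# merge data
# glob already returns full paths, so `path` must not be prepended again
data_frame = pd.read_csv(file_list[0])

for file in file_list[1:]:
    df_to_merge = pd.read_csv(file)
    # merge returns a new DataFrame, so the result has to be assigned back;
    # with no arguments it is an inner join on the columns both frames share
    data_frame = data_frame.merge(df_to_merge)

data_frame.to_csv('C:/merge.csv')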
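The question says each file carries a timestamp plus a different measurement, so joining on that shared key may be closer to the desired result than stacking rows. Below is a minimal sketch, assuming every file has a column literally named `timestamp`; the in-memory sample data and the column names `load`, `lighting`, and `operative` are made up for illustration.

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical sample data standing in for the real CSV files: each "file"
# has the shared timestamp column plus one measurement column.
load_csv = io.StringIO("timestamp,load\n2021-01-01 00:00,1.5\n2021-01-01 01:00,1.7\n")
light_csv = io.StringIO("timestamp,lighting\n2021-01-01 00:00,0.2\n2021-01-01 01:00,0.3\n")
oper_csv = io.StringIO("timestamp,operative\n2021-01-01 00:00,21.0\n2021-01-01 02:00,21.4\n")

frames = [pd.read_csv(f) for f in (load_csv, light_csv, oper_csv)]

# Join the frames pairwise on the shared key. how="outer" keeps timestamps
# that are missing from some files and fills the gaps with NaN.
merged = reduce(
    lambda left, right: left.merge(right, on="timestamp", how="outer"),
    frames,
)
merged = merged.sort_values("timestamp").reset_index(drop=True)
print(merged)
```

With real files, the `io.StringIO` objects would be replaced by the paths returned by `glob.glob`. An outer join is used here so that no timestamp is silently dropped; `how="inner"` would keep only timestamps present in every file.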

As Krishna mentions, it's not clear what's wrong with your code. Example files would have helped to better understand the issue.

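However, calling append on a DataFrame inside a for loop is inefficient, because every call builds a new DataFrame containing all the rows accumulated so far. DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0. It's better to use pd.concat, as follows.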

Code

import glob
import pandas as pd

path = "C:/Users"

file_list = glob.glob(path + "/*.csv")
print('File names:', file_list)

pd.concat(map(pd.read_csv, file_list), 
          ignore_index=True).to_csv('C:/Users.csv',index=False)

Explanation:

We create the merged DataFrame with:

pd.concat(map(pd.read_csv, file_list), 
              ignore_index=True)

We then write it to the output CSV file with:

to_csv('C:/Users.csv',index=False)
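The question also asks about naming columns and excluding some of them. One way this could be done is at read time, with the `usecols` parameter of `pd.read_csv` and a `rename` afterwards. This is only a sketch: the sample data, the `notes` column, and the new name `load_energy_kwh` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical file contents; the real column names may differ.
raw_files = [
    io.StringIO("timestamp,load,notes\n2021-01-01 00:00,1.5,a\n"),
    io.StringIO("timestamp,load,notes\n2021-01-01 01:00,1.7,b\n"),
]

keep = ["timestamp", "load"]             # columns to read; "notes" is excluded
new_names = {"load": "load_energy_kwh"}  # hypothetical replacement column name

# Read only the wanted columns from each file, rename them, then stack the rows.
frames = [pd.read_csv(f, usecols=keep).rename(columns=new_names) for f in raw_files]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

With files on disk, `raw_files` would be the list returned by `glob.glob`. Columns can also be dropped after reading with `DataFrame.drop(columns=[...])` if it is easier to name what to exclude than what to keep.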


 