
Combine multiple CSV files using Python and Pandas

I have the following code:

import glob
import pandas as pd
# Backslashes are escaped so that "\f" is not read as a form-feed character.
allFiles = glob.glob("C:\\*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    print(file_)
    df = pd.read_csv(file_, index_col=None, header=0)
    list_.append(df)
    frame = pd.concat(list_, sort=False)
print(list_)
frame.to_csv("C:\\f.csv")

This combines multiple CSV files into a single CSV file.

However, it also adds a row number column.

Input:

a.csv

a   b   c   d
1   2   3   4

b.csv

a   b   c   d
551 55  55  55
551 55  55  55

Result: f.csv

    a   b   c   d
0   1   2   3   4
0   551 55  55  55
1   551 55  55  55

How can I modify the code so that the row numbers are not written to the output file?

Change frame.to_csv("C:\\f.csv") to frame.to_csv("C:\\f.csv", index=False).

See: pandas.DataFrame.to_csv
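
For reference, here is a minimal Python 3 sketch of the whole script with that change applied. The function name `combine_csvs` and its parameters are illustrative and not part of the original code. Beyond `index=False`, it makes two optional changes: it concatenates once after all files are read, and it skips the output file if it already matches the pattern from an earlier run.

```python
import glob
import os

import pandas as pd


def combine_csvs(pattern, out_path):
    """Concatenate every CSV matching `pattern` into `out_path`, without the index column.

    Hypothetical helper for illustration; the names are not from the original post.
    """
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it matches the pattern from an earlier run.
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    frames = [pd.read_csv(p, index_col=None, header=0) for p in paths]
    # Concatenate once after reading everything, rather than on every iteration.
    frame = pd.concat(frames, ignore_index=True, sort=False)
    # index=False keeps the row labels out of the output file.
    frame.to_csv(out_path, index=False)
    return frame
```

With the paths from the question, it would be called as combine_csvs("C:\\*.csv", "C:\\f.csv").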

You don't have to use pandas for this simple task. pandas parses each file and converts the data to NumPy structures, which you don't need here. In fact, you can do it with plain text-file manipulation:

import glob
# Backslashes are escaped so that "\f" is not read as a form-feed character.
allFiles = glob.glob('C:\\*.csv')
first = True
with open('C:\\f.csv', 'w') as fw:
    for filename in allFiles:
        print(filename)
        with open(filename, 'r') as f:
            if not first:
                f.readline()  # skip the header of every file after the first
            first = False
            fw.writelines(f)
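
The same idea can be wrapped in a function so it can be tried on any folder. The name `concat_csv_text` and its arguments are made up for this sketch. It assumes every file starts with the same single header line. As an extra precaution not in the code above, it adds a newline when a file does not end with one, so rows from consecutive files do not run together, and it never reads the output file as an input.

```python
import glob
import os


def concat_csv_text(pattern, out_path):
    """Append all files matching `pattern` into `out_path`, keeping only the first header.

    Hypothetical helper for illustration; assumes identical one-line headers.
    """
    out_abs = os.path.abspath(out_path)
    # Resolve the inputs before creating the output, and never read the output itself.
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    # newline="" passes line endings through unchanged on every platform.
    with open(out_path, "w", newline="") as fw:
        for i, path in enumerate(paths):
            with open(path, "r", newline="") as f:
                if i > 0:
                    f.readline()  # skip the header of every file after the first
                last = ""
                for line in f:
                    fw.write(line)
                    last = line
                # Guard against a file whose last row has no trailing newline.
                if last and not last.endswith(("\n", "\r")):
                    fw.write("\n")
    return paths
```

Because the data is never parsed, this approach only works when all files share the same column order; pandas remains the safer choice when the columns differ between files.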


 