
Combine multiple CSV files from a starting row

I want to know whether I can combine multiple CSV files, starting from a given row in each file, and add the name of the source file as the first column. Currently, I am using the following code:

import glob
import os

import pandas as pd

os.chdir(Path)  # Path is a placeholder for the directory that contains the CSV files

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
# export to csv
combined_csv.to_csv("combined.csv", index=False, encoding='utf-8-sig')

Now I am dealing with some files that are not well formatted. I need to start concatenating from row 3 of each file and add the file name as the first column, but I am not sure how to do that.

(First of all, you don't need the list comprehension around glob.glob: it already returns a list.)
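As a small sketch of that point, `glob.glob` can be used directly, without a comprehension. The helper name `list_csv_files` and its `exclude` parameter are illustrative assumptions, not part of the original code. Skipping the output file is an extra precaution: `combined.csv` is written into the same directory, so a second run would otherwise read it back in.

```python
import glob
import os


def list_csv_files(directory, exclude=("combined.csv",)):
    """Return the sorted CSV paths in `directory`, skipping names in `exclude`.

    Hypothetical helper for illustration only.
    """
    # glob.escape protects directory names that contain characters like [ or *.
    pattern = os.path.join(glob.escape(directory), "*.csv")
    # glob.glob already returns a list; sorted() only makes the order deterministic.
    return sorted(
        path for path in glob.glob(pattern)
        if os.path.basename(path) not in exclude
    )
```

Calling `list_csv_files(".")` after `os.chdir(...)` should give the same files as the original pattern, minus the output file.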

Concatenating every file from row 3 onward is fairly simple. Just add .loc[3:] to the result of your read_csv call. Note that .loc selects by index label, so with the default index this keeps the rows labeled 3 and up, i.e. it drops the first three data rows after the header:

combined_csv = pd.concat([pd.read_csv(f).loc[3:] for f in all_filenames])
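If "row 3" instead means the third physical line of the file, for example because two junk lines sit above the real header, the `skiprows` argument of `read_csv` may be a closer fit. This is only a sketch under that assumption about the file layout; the function name `read_from_line` and its `first_line` parameter are hypothetical.

```python
import pandas as pd


def read_from_line(path, first_line=3):
    """Read a CSV whose header is on physical line `first_line` (1-based).

    Assumes every line before `first_line` is junk that can be discarded.
    """
    if first_line < 1:
        raise ValueError("first_line must be >= 1")
    # An integer skiprows skips that many lines at the start of the file,
    # so the next line is parsed as the header row.
    return pd.read_csv(path, skiprows=first_line - 1)
```

Unlike `.loc[3:]`, which filters rows after parsing, this changes which line is treated as the header, so the column names come from the file's third line.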

If you want to make further modifications, such as adding the file name as a column, use a regular for loop so that each step stays readable, and concatenate everything afterwards:

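mydfs = []
for f in all_filenames:
    # keep rows labeled 3 and up; .copy() avoids modifying a slice of the parsed frame
    df = pd.read_csv(f).loc[3:].copy()
    # insert the file name as the first column (position 0), as the question asks
    df.insert(0, 'filename', f)
    mydfs.append(df)

combined_csv = pd.concat(mydfs)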
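The loop can also be wrapped in a reusable function. The following is a minimal sketch, not the answer's exact code: the name `combine_csvs`, the `start` parameter, the use of `os.path.basename`, and `ignore_index=True` are all choices made here for illustration.

```python
import os

import pandas as pd


def combine_csvs(filenames, start=3):
    """Concatenate CSV files, keeping rows labeled `start` and up from each.

    The source file name is inserted as the first column. Hypothetical helper.
    """
    frames = []
    for f in filenames:
        # .loc slices by index label; with the default index, label == row position.
        df = pd.read_csv(f).loc[start:].copy()
        # Position 0 makes 'filename' the first column.
        df.insert(0, "filename", os.path.basename(f))
        frames.append(df)
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating 3, 4, ...
    # Note: pd.concat raises ValueError if `filenames` is empty.
    return pd.concat(frames, ignore_index=True)
```

The result can then be written out as in the question, e.g. `combine_csvs(all_filenames).to_csv("combined.csv", index=False, encoding="utf-8-sig")`.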
