Merge multiple CSVs starting from a given row

Question

I want to know whether I can combine multiple CSV files, starting from a given row in each file, and add the name of the source file as the first column. Currently, I am using the following code:

import os
import glob
import pandas as pd

os.chdir(Path)
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
# export to csv
combined_csv.to_csv("combined.csv", index=False, encoding='utf-8-sig')

Now I am dealing with some files that are not well formatted. I need to start concatenating from row 3 of each file and add the file name as the first column, but I am not sure how to do that.

Answer 1

(First of all, you don't need the list comprehension around glob.glob: it returns a list anyway.)
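As a small illustration of that point, the file list can be taken straight from glob.glob. This is only a sketch: the helper name `list_csv_files` and the `folder` parameter are made up for the example, and the sorting is an extra step (glob does not promise any particular order).

```python
import glob
import os


def list_csv_files(folder):
    """Return the paths of the CSV files in `folder` (hypothetical helper)."""
    # glob.glob already returns a list, so no list comprehension is needed.
    # sorted() only makes the order predictable from run to run.
    return sorted(glob.glob(os.path.join(folder, "*.csv")))
```

Passing the folder explicitly also avoids the os.chdir call, so the working directory of the script is left alone.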

Concatenating each file only from row 3 onwards should be fairly simple. Just add .loc[3:] to your read_csv call. Note that .loc slices by index label, so with the default index this keeps the rows labelled 3 and above, i.e. it drops the first three data rows after the header:

combined_csv = pd.concat([pd.read_csv(f).loc[3:] for f in all_filenames ])
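To see exactly which rows survive, here is a tiny check on made-up in-memory data (the sample text `raw` is an assumption for illustration, not one of the asker's files). It also shows a possible alternative, the `skiprows` parameter of read_csv, which drops the lines while parsing instead of afterwards.

```python
import io

import pandas as pd

# Made-up sample: one header line followed by five data rows.
raw = "a,b\n0,x\n1,y\n2,z\n3,w\n4,v\n"

# .loc[3:] selects by index label; with the default index this
# drops the data rows labelled 0, 1 and 2 and keeps labels 3 and 4.
tail = pd.read_csv(io.StringIO(raw)).loc[3:]

# skiprows=[1, 2, 3] skips file lines 1-3 (line 0 is the header),
# so the same data rows remain but the index restarts at 0.
skipped = pd.read_csv(io.StringIO(raw), skiprows=[1, 2, 3])

print(tail)
print(skipped)
```

If the badly formatted files carry junk lines before the real header, passing an integer such as `skiprows=2` may be the better fit, because then the header itself is read from the first line that is not skipped. Whether that applies depends on what the files actually look like.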

But if you want to make more modifications, use a regular for loop so that each step stays readable, and concatenate everything afterwards, like:

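mydfs = []
for f in all_filenames:
    # keep the rows with index label 3 and above
    df = pd.read_csv(f).loc[3:]
    # put the file name in the first column, as the question asks
    df.insert(0, 'filename', f)
    mydfs.append(df)

combined_csv = pd.concat(mydfs)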
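Putting the pieces together, one way to package the loop is sketched below. The function name `combine_csvs` and its parameters `folder`, `first_row` and `exclude` are all hypothetical. It uses position-based .iloc instead of .loc (equivalent here because each freshly read file has the default index), and it skips a previous combined.csv so that re-running in the same folder does not merge the old output back in.

```python
import glob
import os

import pandas as pd


def combine_csvs(folder, first_row=3, exclude=("combined.csv",)):
    """Concatenate the CSVs in `folder`, dropping the first `first_row`
    data rows of each file and adding the file name as the first column.
    Hypothetical helper, not part of pandas."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        name = os.path.basename(path)
        if name in exclude:
            # Skip an earlier output file living in the same folder.
            continue
        # .iloc slices by position; .copy() gives an independent frame.
        df = pd.read_csv(path).iloc[first_row:].copy()
        # Insert the source file name at column position 0.
        df.insert(0, "filename", name)
        frames.append(df)
    # Note: pd.concat raises ValueError if no CSV files were found.
    return pd.concat(frames, ignore_index=True)
```

A call could then look like `combine_csvs("data").to_csv("combined.csv", index=False, encoding="utf-8-sig")`, where "data" stands in for the real folder.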

Answer 1 (accepted, 2019-09-11 20:20:34)
