Combine multiple CSV files from a starting row
I would like to know whether I can combine multiple CSV files, starting from a given row in each file, and add the name of the file in the first column. Currently, I am using the following code:
import os
import glob
import pandas as pd
os.chdir(Path)
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
#combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
#export to csv
combined_csv.to_csv( "combined.csv", index=False, encoding='utf-8-sig')
Now I am dealing with some files that are not well formatted. I need to start concatenating from row 3 and add the name of the file in the first column, but I am not sure how to make that happen.
(First of all, you don't need the list comprehension with glob.glob; it returns a list anyway.)
As for concatenating all files only from row 3 on, this should be fairly simple. Just add .loc[3:] to your read_csv call. Note that .loc[3:] selects by index label, so with the default index it drops the data rows labelled 0, 1 and 2; the header line is not counted:
combined_csv = pd.concat([pd.read_csv(f).loc[3:] for f in all_filenames ])
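To make it easier to check which rows survive, here is a small sketch using an in-memory CSV. The sample data is made up for illustration. It also shows read_csv's skiprows parameter, which may be a better fit if the unwanted lines sit above the real header, though whether that applies to your files is an assumption.

```python
import io
import pandas as pd

# Hypothetical sample: one header line followed by five data rows.
raw = "a,b\n0,10\n1,11\n2,12\n3,13\n4,14\n"

# .loc[3:] slices by index label; with the default RangeIndex this
# drops the data rows labelled 0, 1 and 2 and keeps the original header.
tail = pd.read_csv(io.StringIO(raw)).loc[3:]

# skiprows=2 instead discards the first two physical lines of the file,
# so the third line ("1,11") is then parsed as the header.
skipped = pd.read_csv(io.StringIO(raw), skiprows=2)

print(tail)
print(skipped)
```

Which of the two you want depends on whether "row 3" counts lines of the file or rows of data.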
But if you want to make more modifications, you should use a normal for loop so that everything is done in a readable way, and concatenate afterwards, like:
mydfs = []
for f in all_filenames:
    df = pd.read_csv(f).loc[3:]
    # insert at position 0 so the file name becomes the first column
    df.insert(0, 'filename', f)
    mydfs.append(df)
combined_csv = pd.concat(mydfs)
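The loop can also be wrapped in a reusable function. The following is only a sketch: the name combine_csvs and its parameters folder, start and pattern are hypothetical, and it assumes every file has a header line and the default integer index.

```python
import glob
import os
import pandas as pd

def combine_csvs(folder, start=3, pattern="*.csv"):
    """Concatenate the CSVs in `folder`, keeping rows from index label `start` on."""
    frames = []
    # sorted() gives a deterministic file order; glob itself guarantees none.
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        # .copy() makes the slice an independent frame before it is modified.
        df = pd.read_csv(path).loc[start:].copy()
        # insert() at position 0 makes the file name the first column.
        df.insert(0, "filename", os.path.basename(path))
        frames.append(df)
    # ignore_index=True renumbers the rows so index labels are unique.
    # pd.concat raises ValueError if no file matched the pattern.
    return pd.concat(frames, ignore_index=True)
```

Calling combine_csvs(Path).to_csv("combined.csv", index=False, encoding="utf-8-sig") would then mirror the export step from the question. Writing the output outside the input folder avoids picking up combined.csv on a later run.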