
Python combine CSVs, remove header and remove blanks

I'm extremely new to Python and trying to figure out the following:

I have multiple CSV files (monthly files) that I'm trying to combine into a yearly file. The monthly files all have headers, so I'm trying to keep the first header and remove the rest. I used the script below, which accomplished this, but there are 10 blank rows between each month.

Does anyone know what I can add to this to remove the blank rows?

import shutil
import glob


#import csv files from folder
path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()  # glob lacks reliable ordering, so impose your own if output order matters
with open('someoutputfile.csv', 'wb') as outfile:
    for i, fname in enumerate(allFiles):
        with open(fname, 'rb') as infile:
            if i != 0:
                infile.readline()  # Throw away header on all but first file
            # Block copy rest of file from input to output without parsing
            shutil.copyfileobj(infile, outfile)
            print(fname + " has been imported.")     

Thank you in advance!

Assuming the dataset isn't bigger than your memory, I suggest reading each file with pandas, concatenating the DataFrames, and filtering from there. Blank rows will probably show up as NaN.

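import glob

import pandas as pd

path = r'data/US/market/merged_data'
allFiles = glob.glob(path + "/*.csv")
allFiles.sort()
# read each monthly file; read_csv treats the first line of each file as its header
frames = [pd.read_csv(fname) for fname in allFiles]
# stack the monthly frames into one DataFrame and renumber the rows
df = pd.concat(frames, ignore_index=True)
# hopefully, this will drop blank rows (rows in which every column is NaN)
df = df.dropna(how='all')
# write to file, leaving out the DataFrame index column
df.to_csv('someoutputfile.csv', index=False)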
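
If you would rather not load everything into memory, the same idea can be sketched with only the standard library's csv module, streaming row by row as the original script did. This is a minimal sketch, not a tested drop-in: the function name merge_csvs is made up for illustration, and it assumes a "blank" row is either an empty line or a line whose fields are all empty (for example, only commas).

```python
import csv


def merge_csvs(input_paths, output_path):
    """Merge CSV files into one, keeping a single header and skipping blank rows.

    Returns the number of data rows written (the header is not counted).
    """
    rows_written = 0
    header_written = False
    with open(output_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for fname in input_paths:
            with open(fname, newline="") as infile:
                reader = csv.reader(infile)
                header = next(reader, None)  # None if the file is empty
                if header is not None and not header_written:
                    # Keep the header from the first non-empty file only
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    # csv.reader yields [] for an empty line and
                    # ['', '', ...] for a line containing only commas
                    if not any(field.strip() for field in row):
                        continue
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

With the names from the question, this would be called as merge_csvs(allFiles, 'someoutputfile.csv') after sorting allFiles. Unlike the shutil.copyfileobj version, it parses every row, so it is slower but can decide row by row what to keep.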
