
How to remove the repeated header (multiple rows) while concatenating a group of CSV files using Python pandas

I'm concatenating a group of 6 to 10 .csv files into a single .csv file using the Python pandas library. Each file starts with a 7-line header, and I want to remove that header from every file except the first. How do I do this?

import glob
import pandas as pd

# filenames = glob.glob(path + "/*.csv")
filenames = glob.glob("*.csv")
print(filenames)
count_files = 0
dfs = []
for filename in filenames:
    if count_files == 0:
        dfs.append(pd.read_csv(filename))
        full_df = pd.concat(dfs)
        count_files += 1
    else:
        dfs.append(pd.read_csv(filename, sep=";", skiprows=[0]))  # dfs.append(pd.read_csv(filename))
    full_df = pd.concat(dfs)
    count_files += 1
full_df.to_csv("combined_csv.csv", header=None, index=False, encoding='utf-8-sig')

# Creating dummy CSVs for your requirement.
## Appending multiple CSVs into one single CSV
import pandas as pd

df=pd.DataFrame({'A':[1,1,1],
                 'B':[1,2,3],
                'C':[3,9,3],
                'D':[1,8,9]})

df1=pd.DataFrame({'A':[4,5,5],
                 'B':[1,1,2],
                'C':[2,2,8],
                'D':[6,4,3]})

df2=pd.DataFrame({'A':[9,1,1],
                 'B':[9,2,3],
                'C':[3,9,13],
                'D':[9,8,9]})

df3=pd.DataFrame({'A':[14,15,5],
                 'B':[1,11,2],
                'C':[12,12,8],
                'D':[6,4,3]})

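# index=False keeps the row index out of the files, so each one holds exactly columns A-D
df.to_csv("one.csv", index=False)
df1.to_csv("two.csv", index=False)
df2.to_csv("three.csv", index=False)
df3.to_csv("four.csv", index=False)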

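import os

# Collect the path of every .csv file under the current working directory
csv_list = []
for root, dirs, files in os.walk(os.getcwd(), topdown=True):
    for name in files:
        if name.endswith(".csv"):
            csv_list.append(os.path.join(root, name))

print(csv_list)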

['/home/vikas.rana/stck_flw/two.csv',
 '/home/vikas.rana/stck_flw/one.csv',
 '/home/vikas.rana/stck_flw/four.csv',
 '/home/vikas.rana/stck_flw/three.csv']

names = ['A', 'B', 'C', 'D']
# skiprows=[0] drops each file's own header row; `names` supplies the column labels instead
combined_csv = pd.concat(
    [pd.read_csv(f, header=None, skiprows=[0], names=names) for f in csv_list],
    ignore_index=True,
)



print(combined_csv)
# output
        A   B   C   D
    0   4   1   2   6
    1   5   1   2   4
    2   5   2   8   3
    3   1   1   3   1
    4   1   2   9   8
    5   1   3   3   9
    6   14  1   12  6
    7   15  11  12  4
    8   5   2   8   3
    9   9   9   3   9
    10  1   2   9   8
    11  1   3   13  9
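The same idea can be stretched to the multi-line header in the question. What follows is only a minimal sketch, not the answerer's code: the helper name `concat_skipping_header` and its `header_rows` parameter are made up for illustration, and it assumes every file starts with the same number of header lines and that you supply the column names yourself. It relies on `skiprows` accepting an integer, which skips that many lines from the top of the file.

```python
import pandas as pd


def concat_skipping_header(paths, header_rows, names):
    """Read each CSV without its first `header_rows` lines and stack the rows.

    `header_rows` and `names` are illustrative parameters: the number of
    header lines to drop from every file, and the column labels to use.
    """
    frames = [
        # An integer skiprows drops that many lines from the top of the file;
        # header=None stops pandas from treating the next line as a header.
        pd.read_csv(path, header=None, skiprows=header_rows, names=names)
        for path in paths
    ]
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```

For the question's files this might be called as `concat_skipping_header(sorted(filenames), 7, names)`. Because the header is dropped from every file, including the first, the column labels come only from `names`.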

As everyone says, providing some code would help to clarify your intention.

However, this could solve your problem. It consists of creating an auxiliary CSV file from the others and then importing it as a pandas DataFrame (in case you need one).

Suppose FileName1.csv has the following content:

ColumnName_1,ColumnName_2,ColumnName_3
data11,data12,data13
data21,data22,data33

And FileName2.csv has the following content:

ColumnName_1,ColumnName_2,ColumnName_3
Row to be deleted
Row to be deleted
Row to be deleted
data2_11,data2_12,data2_13
data2_21,data2_22,data2_33

And suppose that you would like to retain the header in file 1 and skip the first 4 rows of file 2.

import pandas as pd

# Define a function that returns the file's lines, skipping the first n rows
def get_content(file_path, ignored_rows):
    # The with block closes the file once its lines have been read
    with open(file_path, 'r') as f:
        file_data = f.readlines()
    return [line.rstrip('\n') for line in file_data[ignored_rows:]]

# Create an empty list to hold the rows of all files
files_content = []

# Read the first file, keeping its header
files_content += get_content('Files/FileName1.csv', 0)

# Read the second file, skipping its header and the 3 rows after it
files_content += get_content('Files/FileName2.csv', 4)

# Generate Complete CSV File
with open('Files/FullData.csv','w') as f:
    for line in files_content:
        f.write(line+'\n')

df = pd.read_csv('Files/FullData.csv')

This works as written for a small number of files. If you need to read many files, wrap the same code in a loop.
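
One possible shape for that loop is sketched below. It is an illustration under stated assumptions rather than part of the original answer: the function name `combine_csv_files` and its parameters are hypothetical, and it assumes every file after the first has the same number of leading rows to drop.

```python
def combine_csv_files(file_paths, output_path, header_rows):
    """Write the first file whole, then each later file minus its first
    `header_rows` lines (all names here are illustrative)."""
    with open(output_path, 'w') as out:
        for position, file_path in enumerate(file_paths):
            with open(file_path, 'r') as f:
                lines = f.readlines()
            # Keep the header only for the first file in the list
            start = 0 if position == 0 else header_rows
            for line in lines[start:]:
                # Normalise line endings so a file without a trailing
                # newline does not run into the next file's first row
                out.write(line.rstrip('\n') + '\n')
```

With the two example files above, `combine_csv_files(['Files/FileName1.csv', 'Files/FileName2.csv'], 'Files/FullData.csv', 4)` should produce the same FullData.csv as the step-by-step version. If the list of paths comes from a glob pattern, make sure the output file is not among the inputs.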
