How to remove the repeated header (multiple rows) while concatenating a group of csv files using python pandas
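I am using the Python pandas library to concatenate a group (6-10 files) of .csv files into a single .csv file. I want to remove the header, which spans 7 rows, from every csv file except the first one. How can I do this?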
import glob
import pandas as pd

#filenames = glob.glob(path + "/*.csv")
filenames = glob.glob("*.csv")
print(filenames)

count_files = 0
dfs = []
for filename in filenames:
    if count_files == 0:
        dfs.append(pd.read_csv(filename))
        full_df = pd.concat(dfs)
        count_files += 1
    else:
        dfs.append(pd.read_csv(filename, sep=";", skiprows=[0]))  #dfs.append(pd.read_csv(filename))
        full_df = pd.concat(dfs)
        count_files += 1
full_df.to_csv("combined_csv.csv", header=None, index=False, encoding='utf-8-sig')
# Creating dummy csv files for your requirement,
# then appending multiple csv files into one single csv
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1],
                   'B': [1, 2, 3],
                   'C': [3, 9, 3],
                   'D': [1, 8, 9]})
df1 = pd.DataFrame({'A': [4, 5, 5],
                    'B': [1, 1, 2],
                    'C': [2, 2, 8],
                    'D': [6, 4, 3]})
df2 = pd.DataFrame({'A': [9, 1, 1],
                    'B': [9, 2, 3],
                    'C': [3, 9, 13],
                    'D': [9, 8, 9]})
df3 = pd.DataFrame({'A': [14, 15, 5],
                    'B': [1, 11, 2],
                    'C': [12, 12, 8],
                    'D': [6, 4, 3]})

# index=False keeps the row index out of the files, so each csv has exactly the columns A-D
df.to_csv("one.csv", index=False)
df1.to_csv("two.csv", index=False)
df2.to_csv("three.csv", index=False)
df3.to_csv("four.csv", index=False)
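import os

# Collect every .csv file under the current working directory
csv_list = []
for root, dirs, files in os.walk(os.getcwd(), topdown=True):
    for name in files:
        if name.endswith(".csv"):  # skip scripts and other non-csv files
            csv_list.append(os.path.join(root, name))
print(csv_list)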
['/home/vikas.rana/stck_flw/two.csv',
'/home/vikas.rana/stck_flw/one.csv',
'/home/vikas.rana/stck_flw/four.csv',
'/home/vikas.rana/stck_flw/three.csv']
names = ['A', 'B', 'C', 'D']
# skiprows=[0] drops the header row of each file; names supplies the column labels once
combined_csv = pd.concat(
    [pd.read_csv(f, header=None, skiprows=[0], names=names) for f in csv_list],
    ignore_index=True)
print(combined_csv)
# output
     A   B   C  D
0    4   1   2  6
1    5   1   2  4
2    5   2   8  3
3    1   1   3  1
4    1   2   9  8
5    1   3   3  9
6   14   1  12  6
7   15  11  12  4
8    5   2   8  3
9    9   9   3  9
10   1   2   9  8
11   1   3  13  9
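When every file repeats the same header, a shorter variant is to let pandas parse the header of each file and write it out only once. The following is a minimal sketch, not a definitive implementation: `combine_csvs` and `preamble_rows` are hypothetical names, and it assumes that the last header line of each file holds the column names while any lines before it are metadata that can be dropped.

```python
import glob

import pandas as pd


def combine_csvs(pattern, out_path, preamble_rows=0):
    """Concatenate all CSV files matching `pattern` into `out_path`.

    Assumption: in every file, the first `preamble_rows` lines are metadata
    to discard and the next line holds the column names.
    """
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")

    # read_csv consumes each file's header line as column names,
    # so no header row ends up among the data rows.
    frames = [pd.read_csv(f, skiprows=preamble_rows) for f in files]
    combined = pd.concat(frames, ignore_index=True)

    # to_csv writes the column names a single time at the top.
    combined.to_csv(out_path, index=False)
    return combined
```

A call such as `combine_csvs("data/*.csv", "combined_csv.csv", preamble_rows=6)` would treat the seventh line of each file as the column names (the `data/` folder is only an example). Writing the result outside the folder being globbed avoids picking up the combined file on a later run.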
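As everyone has said, posting some code would help clarify your intent.
Still, this may solve your problem. It builds an auxiliary CSV file from the rows you want to keep and then imports it so it can be stored as a pandas DataFrame (if you need one).
Let's assume FileName1.csv has the following content: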
ColumnName_1,ColumnName_2,ColumnName_3
data11,data12,data13
data21,data22,data33
and FileName2.csv has the following content:
ColumnName_1,ColumnName_2,ColumnName_3
Row to be deleted
Row to be deleted
Row to be deleted
data2_11,data2_12,data2_13
data2_21,data2_22,data2_33
Let's assume you want to keep the header from file 1 and skip the first 4 rows of file 2.
import pandas as pd

# Read a file and collect its lines, ignoring the first `ignored_rows` rows
def get_content(file_path, ignored_rows):
    with open(file_path, 'r') as f:
        file_data = f.readlines()
    for line in file_data[ignored_rows:]:
        files_content.append(line.rstrip('\n'))

# Empty list that accumulates the rows of all files
files_content = []

# Read the first file (keep its header)
get_content('Files/FileName1.csv', 0)
# Read the second file (skip its header and the 3 unwanted rows)
get_content('Files/FileName2.csv', 4)

# Write the complete CSV file
with open('Files/FullData.csv', 'w') as f:
    for line in files_content:
        f.write(line + '\n')

df = pd.read_csv('Files/FullData.csv')
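This works as written for a small number of files. If you need to read many files, add another loop that applies the same code to each of them.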
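For many files, that extra loop might look like the sketch below. It uses only the standard library and streams lines instead of holding them in a list. `merge_csv_files` and `header_rows` are hypothetical names, and the sketch assumes that every file after the first starts with the same number of rows to skip.

```python
from itertools import islice


def merge_csv_files(paths, out_path, header_rows=1):
    """Copy the first file whole, then append the remaining files
    without their first `header_rows` lines (assumed equal for all of them)."""
    with open(out_path, "w", newline="") as out:
        for i, path in enumerate(paths):
            # Only the first file keeps its leading rows.
            skip = 0 if i == 0 else header_rows
            with open(path, "r", newline="") as src:
                for line in islice(src, skip, None):
                    # Make sure a file without a trailing newline
                    # does not glue its last row to the next file.
                    out.write(line if line.endswith("\n") else line + "\n")
```

With the two example files above, `merge_csv_files(['Files/FileName1.csv', 'Files/FileName2.csv'], 'Files/FullData.csv', header_rows=4)` should produce the same FullData.csv as the code in this answer. For the question's files, `header_rows=7` would be the value to try.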