Consolidating column data from a number of CSV files into a single CSV file
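I'm new to Python, and to data processing in particular. Here is what I'm trying to achieve:
I run CIS tests on multiple servers and generate one CSV file per server (the file name is the same as the server name). After copying every server's output file to a central server, the files look like this (output truncated):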
File1: dc1pp1v01.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Not Found,PASSED
File2: dc1pp2v01.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Not Found,PASSED
File..n: dc1pp1v02.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Found,FAILED
What I want is output that looks like this:
Description dc1pp1v01 dc1pp2v01 dc1pp1v02
0 1.1 Database Placement PASSED PASSED PASSED
1 1.2 Use dedicated least privilaged account PASSED PASSED PASSED
2 1.3 Diable MySQL history PASSED PASSED FAILED
To merge these files, I created another file that contains only the Description values and two column headers, as shown below:
file: cis_report.csv
Description,Result
1.1 Database Placement,
1.2 Use dedicated least privilaged account,
1.3 Diable MySQL history,
I wrote the following code to do the column-based merge:
import glob
import os
import pandas as pd
col_list = ["Description","Result"]
path = "/Users/Python/Data"
all_files = glob.glob(os.path.join(path, "dc*.csv"))
cis_df = pd.read_csv("/Users/Python/Data/cis_report.csv")
for fl in all_files:
    d = pd.read_csv(fl, usecols=col_list)
    f = cis_df.merge(d, on='Description')
    cis_df = f.copy()
print(cis_df.head())
The output I get is:
Description Result_x Result_y Result_x Result_y
0 1.1 Database Placement NaN PASSED PASSED PASSED
1 1.2 Use dedicated least privilaged account NaN PASSED PASSED PASSED
2 1.3 Diable MySQL history NaN PASSED PASSED FAILED
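In my code, I'm not sure how to use the file names as the headers of the Result columns, or how to get rid of the NaN column.
Also, is there a better way to get the output I'm looking for without using the dummy file (cis_report.csv)? Any help is much appreciated.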
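For readers wondering where Result_x, Result_y and the NaN column come from, here is a minimal sketch under stated assumptions: the two small DataFrames are hypothetical stand-ins for cis_report.csv and one server report, not the real files. It suggests that the suffixes appear because both sides of the merge carry a column named Result, and that the NaN values are simply the empty Result column of the dummy file.

```python
import pandas as pd

# Stand-in for cis_report.csv: the Result column exists but is empty
template = pd.DataFrame({
    "Description": ["1.1 Database Placement", "1.3 Diable MySQL history"],
    "Result": [float("nan"), float("nan")],
})
# Stand-in for one server report (dc1pp1v01), already limited to two columns
report = pd.DataFrame({
    "Description": ["1.1 Database Placement", "1.3 Diable MySQL history"],
    "Result": ["PASSED", "PASSED"],
})

# Both frames have a "Result" column, so merge disambiguates them with suffixes
clashing = template.merge(report, on="Description")
print(list(clashing.columns))

# Dropping the empty column and renaming the report's column avoids both problems
named = template[["Description"]].merge(
    report.rename(columns={"Result": "dc1pp1v01"}), on="Description"
)
print(list(named.columns))
```

Renaming each report's Result column before merging is one way to get the server name into the header; the answers below take other routes.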
You need the DataFrame.pivot() function. The code below is well commented and is a complete working example. Adapt it as needed.
import os
import pandas as pd
# Get all file names in a directory
# Use . to use current working directory or replace it with
# e.g. r'C:\Users\Dames\Desktop\csv_files'
file_names = os.listdir('.')
# Filter out all non .csv files
# You can skip this if you know that only .csv files will be in that folder
csv_file_names = [fn for fn in file_names if fn[-4:] == '.csv']
# This loads a csv file into a dataframe and sets the Server column
def load_csv(file_name):
    df = pd.read_csv(file_name)
    df['Server'] = file_name.split('.')[0]
    return df
# Concatenate all the csv files after being processed by load_csv
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
df = pd.concat([load_csv(fn) for fn in csv_file_names])
# Turn DataFrame into Pivot Table
# (keyword arguments work across pandas versions; positional ones fail in 2.0+)
df = df.pivot(index='Description', columns='Server', values='Result')
# Save DataFrame into CSV File
# If this script runs multiple times make sure that the final.csv is saved elsewhere!
# Or it will be read by the code above as an input file
df.to_csv('final.csv')
The final DataFrame looks like this
Server dc1pp1v01 dc1pp1v02 dc1pp2v01
Description
1.1 Database Placement PASSED PASSED PASSED
1.2 Use dedicated least privilaged account PASSED PASSED PASSED
1.3 Diable MySQL history PASSED FAILED PASSED
and the CSV file like this
Description,dc1pp1v01,dc1pp1v02,dc1pp2v01
1.1 Database Placement,PASSED,PASSED,PASSED
1.2 Use dedicated least privilaged account,PASSED,PASSED,PASSED
1.3 Diable MySQL history,PASSED,FAILED,PASSED
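To try the pivot idea without touching the file system, here is a small self-contained sketch. It assumes hypothetical in-memory copies of two of the reports (the `reports` dictionary is made up for illustration) and uses pd.concat plus keyword arguments to pivot, which should run on current pandas releases.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for two of the per-server CSV files
reports = {
    "dc1pp1v01.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp1v02.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Found,FAILED\n"
    ),
}

frames = []
for file_name, text in reports.items():
    frame = pd.read_csv(io.StringIO(text))
    # Tag every row with the server it came from
    frame["Server"] = file_name.split(".")[0]
    frames.append(frame)

# Stack the reports into one long table, then spread servers into columns
long_df = pd.concat(frames, ignore_index=True)
wide_df = long_df.pivot(index="Description", columns="Server", values="Result")
print(wide_df)
```

Note that pivot raises an error if the same Description appears twice for one server, so this sketch assumes each check is listed once per report.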
Use:
import glob
import os
import pandas as pd
col_list = ["Description","Result"]
path = "/Users/Python/Data"
all_files = glob.glob(os.path.join(path, "dc*.csv"))
# Read only the Description column of the template so it adds no empty Result column
cis_df = pd.read_csv("/Users/Python/Data/cis_report.csv", usecols=["Description"])
from functools import reduce
# Name each file's Result column after its server before merging,
# which avoids clashing Result_x / Result_y suffixes
frames = [pd.read_csv(i, usecols=col_list).rename(columns={"Result": os.path.basename(i).replace(".co.uk.csv", "")}) for i in all_files]
df_final = reduce(lambda left, right: pd.merge(left, right, on="Description"), [cis_df] + frames)
print(df_final)
Output:
Description dc1pp1v01 dc1pp2v01 dc1pp1v02
0 1.1 Database Placement PASSED PASSED PASSED
1 1.2 Use dedicated least privilaged account PASSED PASSED PASSED
2 1.3 Diable MySQL history PASSED PASSED FAILED
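One detail worth checking in code like this: glob.glob returns full paths, so a column name derived from a path needs the directory stripped before the suffix is removed. A minimal sketch, where `server_name` is a hypothetical helper and the paths are examples rather than real files:

```python
import os

def server_name(csv_path):
    """Return the part of the file name before the first dot."""
    return os.path.basename(csv_path).split(".", 1)[0]

# Example paths in the shape glob.glob would return them
paths = [
    "/Users/Python/Data/dc1pp1v01.co.uk.csv",
    "/Users/Python/Data/dc1pp2v01.co.uk.csv",
]

# Column headers for the merged table: Description first, then one per server
final_cols = ["Description"] + [server_name(p) for p in paths]
print(final_cols)
```

Any renaming that pairs these names with merged columns by position also relies on the files being merged in the same order as `paths`.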
In the end I managed to do it myself. The solution below works for me, but I'm sure there are neater ways:
import glob
import os
import pandas as pd
from functools import reduce
col_list = ["Description","Result"]
path = "/Users/Python/Data"
all_files = glob.glob(os.path.join(path, "dc*.csv"))
final_cols = ['Description']
for j in all_files:
    final_cols.append(os.path.basename(j).split('.',1)[0])
cis_df = pd.read_csv("/Users/Python/Data/cis_report.csv")
df_final = reduce(lambda left,right: pd.merge(left,right,on='Description'), [cis_df]+[pd.read_csv(i, usecols=col_list) for i in all_files])
df_final.rename(columns=dict(zip(df_final.columns,final_cols)),inplace=True)
print(df_final.head())
I made a small change to the file that holds the descriptions: I removed the Result field and the trailing "," from each line.
File: cis_report.csv
Description
1.1 Database Placement
1.2 Use dedicated least privilaged account
1.3 Diable MySQL history
The output I get is:
Description dc1pp1v01 dc1pp2v01 dc2pp1v01
0 1.1 Database Placement PASSED PASSED PASSED
1 1.2 Use dedicated least privilaged account PASSED PASSED PASSED
2 1.3 Diable MySQL history PASSED PASSED FAILED
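As one possibly neater variant that needs no cis_report.csv at all, the reports can be aligned on Description with pd.concat along the columns axis. This is a different technique from the reduce/merge chain above, shown only as a sketch; the `reports` dictionary is a hypothetical in-memory stand-in for the real files.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for two of the per-server files
reports = {
    "dc1pp1v01.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp1v02.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Found,FAILED\n"
    ),
}

# One Series of results per server, indexed by Description
per_server = {
    name.split(".", 1)[0]: pd.read_csv(
        io.StringIO(text), usecols=["Description", "Result"]
    ).set_index("Description")["Result"]
    for name, text in reports.items()
}

# Align the Series side by side; the dict keys become the column names
combined = pd.concat(per_server, axis=1).rename_axis("Description").reset_index()
print(combined)
```

With real files, the dictionary would be built from the glob results instead, and a check missing from one server would show up as NaN in that server's column rather than dropping the row.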
You already have a winner, but:
import csv
from pathlib import Path
path = Path('/Users/Python/Data')
# Read the reports and store the results in a 2-dim list
results = []
for file in path.glob('dc*.co.uk.csv'):
    with open(file, 'r') as fin:
        results += [[file.name.split('.')[0]]
                    + [row[2] for row in csv.reader(fin)][1:]]
# Read the row labels
with open(path / 'cis_report.csv', 'r') as fin:
    labels = [row[0] for row in csv.reader(fin)]
# Prepare the output
output = [[label] + [result[i] for result in results]
          for i, label in enumerate(labels)]
# Write the output (newline='' keeps the csv module from adding blank lines on Windows)
with open(path / 'cis_reports_merged.csv', 'w', newline='') as fout:
    csv.writer(fout, delimiter='\t').writerows(output)
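The same csv-module idea can be tried without any files on disk. The sketch below makes two assumptions that differ from the code above: the reports are hypothetical in-memory strings, and the row labels are taken from the first report instead of cis_report.csv, which only works if every report lists the same checks in the same order.

```python
import csv
import io

# Hypothetical in-memory stand-ins for two of the per-server files
reports = {
    "dc1pp1v01.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp1v02.co.uk.csv": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Found,FAILED\n"
    ),
}

labels = None
columns = []
for name, text in reports.items():
    rows = list(csv.reader(io.StringIO(text)))
    if labels is None:
        # First column of the first report, header included
        labels = [row[0] for row in rows]
    # Server name as the header, followed by that server's results
    columns.append([name.split(".")[0]] + [row[2] for row in rows[1:]])

# Turn the per-server columns into output rows
merged = [[label] + [col[i] for col in columns] for i, label in enumerate(labels)]

out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(merged)
print(out.getvalue())
```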