
Consolidating column data from a number of CSV files into a single CSV file

I am new to Python, especially to data handling. This is what I am trying to achieve:

I run CIS tests on several servers and produce a CSV file for each server (the file name is the same as the server name). The output files from all servers are copied to a central server. The output looks like this (truncated):

File1: dc1pp1v01.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Not Found,PASSED

File2: dc1pp2v01.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Not Found,PASSED

File..n: dc1pp1v02.co.uk.csv
Description,Outcome,Result
1.1 Database Placement,/var/lib/mysql,PASSED
1.2 Use dedicated least privilaged account,mysql,PASSED
1.3 Diable MySQL history,Found,FAILED

What I want is output that looks like this:

Description  dc1pp1v01 dc1pp2v01 dc1pp1v02 
0  1.1 Database Placement PASSED   PASSED   PASSED
1  1.2 Use dedicated least privilaged account PASSED   PASSED   PASSED
2  1.3 Diable MySQL history PASSED   PASSED   FAILED

To merge these files, I have created another file that contains only the Description values, under a two-column heading, as below:

file: cis_report.csv
Description,Result
1.1 Database Placement,
1.2 Use dedicated least privilaged account,
1.3 Diable MySQL history,

I have written the code below to do a column-based merge:

import glob
import os
import pandas as pd 

col_list = ["Description","Result"]
path = "/Users/Python/Data"
all_files = glob.glob(os.path.join(path, "dc*.csv"))

cis_df = pd.read_csv("/Users/Python/Data/cis_report.csv")

for fl in all_files:
   d = pd.read_csv(fl, usecols=col_list)
   f = cis_df.merge(d, on='Description')
   cis_df = f.copy()
   
print(cis_df.head())

The output I am getting is:

Description Result_x Result_y Result_x Result_y
0                      1.1 Database Placement      NaN   PASSED   PASSED   PASSED
1  1.2 Use dedicated least privilaged account      NaN   PASSED   PASSED   PASSED
2                    1.3 Diable MySQL history      NaN   PASSED   PASSED   FAILED

In my code, I am not sure how to get the file name as the header for each result column and how to get rid of the NaN values.

Also, is there a better way of achieving the output I am looking for without using the dummy file (cis_report.csv)? Your help is much appreciated.
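For readers wondering where the Result_x / Result_y headers come from: they are the default suffixes that pandas' merge() adds when both frames have a non-key column with the same name. The NaN column is the empty Result column read from cis_report.csv. The snippet below is only a minimal sketch with hypothetical in-memory frames standing in for two of the per-server files; it shows the clash and one way to avoid it by renaming the column to the server name before merging.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for two of the per-server reports
a = pd.DataFrame({"Description": ["1.1 Database Placement"], "Result": ["PASSED"]})
b = pd.DataFrame({"Description": ["1.1 Database Placement"], "Result": ["FAILED"]})

# Both frames have a non-key column called "Result", so merge()
# disambiguates them with its default suffixes "_x" and "_y"
clash = a.merge(b, on="Description")
print(list(clash.columns))

# Renaming "Result" to the server name before merging avoids the clash
named = a.rename(columns={"Result": "dc1pp1v01"}).merge(
    b.rename(columns={"Result": "dc1pp1v02"}), on="Description"
)
print(list(named.columns))
```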

You need the DataFrame.pivot() function. The code below is well commented and is a fully working example. Make changes as you need.

import os
import pandas as pd

# Get all file names in a directory
# Use . to use current working directory or replace it with
# e.g. r'C:\Users\Dames\Desktop\csv_files'
file_names = os.listdir('.')

# Filter out all non .csv files
# You can skip this if you know that only .csv files will be in that folder
csv_file_names = [fn for fn in file_names if fn[-4:] == '.csv']

# This Loads a csv file into a dataframe and sets the Server column
def load_csv(file_name):
    df = pd.read_csv(file_name)
    df['Server'] = file_name.split('.')[0]
    return df

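# Concatenate all the csv files after they have been processed by load_csv
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
df = pd.concat([load_csv(fn) for fn in csv_file_names], ignore_index=True)

# Turn DataFrame into Pivot Table
# (pandas 2.0 and later require keyword arguments here)
df = df.pivot(index='Description', columns='Server', values='Result')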

# Save DataFrame into CSV File
# If this script runs multiple times make sure that the final.csv is saved elsewhere!
# Or it will be read by the code above as an input file
df.to_csv('final.csv')

The final DataFrame looks like this:

Server                                     dc1pp1v01 dc1pp1v02 dc1pp2v01
Description
1.1 Database Placement                        PASSED    PASSED    PASSED
1.2 Use dedicated least privilaged account    PASSED    PASSED    PASSED
1.3 Diable MySQL history                      PASSED    FAILED    PASSED

And the CSV file looks like this:

Description,dc1pp1v01,dc1pp1v02,dc1pp2v01
1.1 Database Placement,PASSED,PASSED,PASSED
1.2 Use dedicated least privilaged account,PASSED,PASSED,PASSED
1.3 Diable MySQL history,PASSED,FAILED,PASSED
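To try the pivot idea without touching the file system, here is a small sketch that uses in-memory strings in place of the CSV files. The `reports` dictionary and its contents are hypothetical, modelled on the truncated samples in the question. It also turns Description back into an ordinary column, which is closer to the layout the question asks for.

```python
import io
import pandas as pd

# Hypothetical in-memory copies of two reports, keyed by server name
reports = {
    "dc1pp1v01": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp1v02": (
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Found,FAILED\n"
    ),
}

# Long format: one row per (server, check), with the server in its own column
frames = [
    pd.read_csv(io.StringIO(text)).assign(Server=server)
    for server, text in reports.items()
]
long_df = pd.concat(frames, ignore_index=True)

# Wide format: one row per check, one column per server
wide = long_df.pivot(index="Description", columns="Server", values="Result")

# Make Description a regular column and drop the "Server" label on the columns
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```

Note that pivot() raises an error if the same Description appears twice for one server, so this assumes each check is listed once per file.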

Use:

import glob
import os
import pandas as pd 

col_list = ["Description","Result"]
path = "/Users/Python/Data"
all_files = glob.glob(os.path.join(path, "dc*.csv"))

cis_df = pd.read_csv("/Users/Python/Data/cis_report.csv")
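from functools import reduce

# Name each file's Result column after its server (file name up to the first dot)
# before merging, so the merged columns never collide as Result_x / Result_y
frames = [pd.read_csv(f, usecols=col_list)
            .rename(columns={'Result': os.path.basename(f).split('.')[0]})
          for f in all_files]

# Start from the Description column only, so the empty Result column
# of cis_report.csv is not carried into the output as NaN
df_final = reduce(lambda left, right: pd.merge(left, right, on='Description'),
                  [cis_df[['Description']]] + frames)
print(df_final)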

Output:

    Description dc1pp1v01   dc1pp2v01   dc1pp1v02
0   1.1 Database Placement  PASSED  PASSED  PASSED
1   1.2 Use dedicated least privilaged account  PASSED  PASSED  PASSED
2   1.3 Diable MySQL history    PASSED  PASSED  FAILED

Finally, I managed to do it on my own. The solution below works for me, but I am sure there are more concise ways of doing it:

import glob
import os
import pandas as pd 
from functools import reduce

col_list = ["Description","Result"]
path = "/Users/Python/Data"
all_files = glob.glob(os.path.join(path, "dc*.csv"))

final_cols = ['Description']
for j in all_files:
    final_cols.append(os.path.basename(j).split('.',1)[0]) 

cis_df = pd.read_csv("/Users/Python/Data/cis_report.csv")

df_final = reduce(lambda left,right: pd.merge(left,right,on='Description'), [cis_df]+[pd.read_csv(i, usecols=col_list) for i in all_files])
df_final.rename(columns=dict(zip(df_final.columns,final_cols)),inplace=True)

print(df_final.head())

I made a small change to the file that holds the descriptions: I removed the Result field and the ',' at the end of each line.

file: cis_report.csv

Description
1.1 Database Placement
1.2 Use dedicated least privilaged account
1.3 Diable MySQL history

The output I get is:

Description dc1pp1v01 dc1pp2v01 dc2pp1v01
0                      1.1 Database Placement        PASSED        PASSED        PASSED
1  1.2 Use dedicated least privilaged account        PASSED        PASSED        PASSED
2                    1.3 Diable MySQL history        PASSED        PASSED        FAILED
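On the question of a more concise way that does not need cis_report.csv at all, one possible sketch is to turn each file into a Series indexed by Description and line the Series up side by side with pd.concat. The helper name `merge_reports` and the throwaway demo files are hypothetical, and the sketch assumes each Description appears only once per file.

```python
import os
import tempfile
import pandas as pd

def merge_reports(paths):
    """Return one frame with a Description column and one Result column per server."""
    columns = []
    for p in paths:
        # Server name = file name up to the first dot
        server = os.path.basename(p).split(".", 1)[0]
        results = pd.read_csv(p, usecols=["Description", "Result"])
        columns.append(results.set_index("Description")["Result"].rename(server))
    # axis=1 aligns the Series on their Description index
    merged = pd.concat(columns, axis=1)
    return merged.rename_axis("Description").reset_index()

# Demo with two throwaway files (hypothetical data modelled on the question)
with tempfile.TemporaryDirectory() as tmp:
    samples = {"dc1pp1v01.co.uk.csv": "PASSED", "dc1pp1v02.co.uk.csv": "FAILED"}
    paths = []
    for name, result in samples.items():
        p = os.path.join(tmp, name)
        with open(p, "w") as fh:
            fh.write("Description,Outcome,Result\n")
            fh.write(f"1.3 Diable MySQL history,Found,{result}\n")
        paths.append(p)
    merged = merge_reports(paths)

print(merged)
```

With real data, the call would look something like `merge_reports(glob.glob(os.path.join(path, "dc*.csv")))`, reusing the `path` variable from the code above.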

You already have a winner, nevertheless:

import csv
from pathlib import Path

path = Path('/Users/Python/Data')

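# Read the reports and store the results in a 2-dim list
# (newline='' is what the csv module recommends for files it reads or writes)
results = []
for file in path.glob('dc*.co.uk.csv'):
    with open(file, 'r', newline='') as fin:
        # Server name first, then the Result column without its header
        results += [[file.name.split('.')[0]]
                    + [row[2] for row in csv.reader(fin)][1:]]

# Read the row labels (the header "Description" becomes the first label)
with open(path / 'cis_report.csv', 'r', newline='') as fin:
    labels = [row[0] for row in csv.reader(fin)]

# Prepare the output; this relies on every report listing the checks
# in the same order as cis_report.csv
output = [[label] + [result[i] for result in results]
          for i, label in enumerate(labels)]

# Write the output (tab-separated)
with open(path / 'cis_reports_merged.csv', 'w', newline='') as fout:
    csv.writer(fout, delimiter='\t').writerows(output)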
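The version above matches rows by position, so it depends on all reports listing the checks in the same order. If that cannot be guaranteed, a variation that keys the results by Description might look like the sketch below. The function name `merge_results` and the in-memory sample data are hypothetical; with real files you would pass open file objects instead of StringIO.

```python
import csv
import io

def merge_results(reports):
    """reports maps a server name to an iterable of CSV lines (e.g. an open file)."""
    servers = list(reports)
    table = {}  # description -> {server: result}
    for server, lines in reports.items():
        for row in csv.DictReader(lines):
            table.setdefault(row["Description"], {})[server] = row["Result"]
    header = ["Description"] + servers
    # A missing result for a server is written as an empty cell
    body = [[desc] + [by_server.get(s, "") for s in servers]
            for desc, by_server in table.items()]
    return [header] + body

# Hypothetical in-memory reports; the second one lists the checks in a different order
reports = {
    "dc1pp1v01": io.StringIO(
        "Description,Outcome,Result\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
        "1.3 Diable MySQL history,Not Found,PASSED\n"
    ),
    "dc1pp1v02": io.StringIO(
        "Description,Outcome,Result\n"
        "1.3 Diable MySQL history,Found,FAILED\n"
        "1.1 Database Placement,/var/lib/mysql,PASSED\n"
    ),
}

rows = merge_results(reports)
for row in rows:
    print(row)
```

The returned list of rows can be handed to csv.writer(...).writerows() exactly as in the code above.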


