
Python script to read three CSV files and write into one CSV file

I am trying to read three CSV files and want to put the output in a single CSV file, using the first column as an ID so that it is not repeated, since it is common to all the input CSV files. I have written some code, but it gives errors. I am also not sure this is the best way to do the task.

Code:

#! /usr/bin/python
import csv
from collections import defaultdict

result = defaultdict(dict)
fieldnames = ("ID")

for csvfile in ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv"):
    with open(csvfile, 'rb') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id = row.pop("ID")
            for key in row:
                fieldnames.add(key) 
                result[id][key] = row[key]

with open("out.csv", "w") as outfile:
    writer = csv.DictWriter(outfile, sorted(fieldnames))
    writer.writeheader()
    for item in result:
        result[item]["ID"] = item
        writer.writerow(result[item])

The input CSV files look like this:

FR1.1.csv >

TEST_Id , RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR1.1 , COMPILE_FAILED , EXECUTION_FAILED

FR2.0.csv >

TEST_Id , RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR2.0 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR2.0 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR2.0 , COMPILE_FAILED , EXECUTION_FAILED

FR2.5.csv >

TEST_Id , RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR2.5 , COMPILE_FAILED , EXECUTION_FAILED

out.csv (required) ->

TEST_Id , RELEASE , COMPILE_STATUS , EXECUTION_STATUS , RELEASE , COMPILE_STATUS , EXECUTION_STATUS , RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR1.1 , COMPILE_FAILED , EXECUTION_FAILED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED

Thanks in advance for suggestions on the best way to produce the above output.

If you just want to concatenate each CSV row based on the ID, don't use DictReader. Dictionary keys must be unique, but you are producing rows with multiple RELEASE, COMPILE_STATUS and EXECUTION_STATUS columns.
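The duplicate-key pitfall can be seen in a quick demo (illustrative in-memory data, not your actual files): when two columns share a header name, DictReader silently keeps only the last one.

```python
import csv
import io

# Two merged files would both contribute a RELEASE column; DictReader
# maps duplicate header names onto a single dict key, so the last
# column's value silently wins and the earlier one is lost.
data = "TEST_Id,RELEASE,RELEASE\nFC/B_019.config,FR1.1,FR2.0\n"
row = next(csv.DictReader(io.StringIO(data)))
print(row)  # only one RELEASE key survives
```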

Moreover, how would you handle an ID that has no entry in one or two of the input CSV files?

Use a regular reader and store each row keyed by filename. Also make fieldnames a list:

import csv
from collections import defaultdict

result = defaultdict(dict)
filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
lengths = {}
fieldnames = ["TEST_ID"]

for csvfile in filenames:
    with open(csvfile, 'rb') as infile:
        reader = csv.reader(infile)
        headers = next(reader, [])  # read first line, headers
        fieldnames.extend(headers[1:])  # all but the first column name
        lengths[csvfile] = len(headers) - 1  # keep track of how many items to backfill
        for row in reader:
            result[row[0]][csvfile] = row[1:]  # all but the first column

with open("out.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    for id_ in sorted(result):
        row = [id_]
        data = result[id_]
        for filename in filenames:
            row.extend(data.get(filename) or [''] * lengths[filename])
        writer.writerow(row)

This code stores rows per filename so that a full row can be built from each file afterwards, while still filling in blanks if the row is missing from that file.

Another approach would be to append a number or the filename to each column name, making the column names unique. That way your DictReader approach could work too.

The above gives:

TEST_ID, RELEASE , COMPILE_STATUS , EXECUTION_STATUS, RELEASE , COMPILE_STATUS , EXECUTION_STATUS, RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR1.1 , COMPILE_FAILED , EXECUTION_FAILED, FR2.0 , COMPILE_FAILED , EXECUTION_FAILED, FR2.5 , COMPILE_FAILED , EXECUTION_FAILED
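The column-renaming alternative mentioned above can be sketched as follows. This is a minimal Python 3 illustration, assuming in-memory `StringIO` stand-ins for two of the files and hypothetical `_FR1.1` / `_FR2.0` suffixes; with unique names, `DictReader` and `DictWriter` work fine.

```python
import csv
import io

# Hypothetical in-memory stand-ins for two of the input files.
files = {
    "FR1.1": "TEST_Id,RELEASE,COMPILE_STATUS\nFC/B_019.config,FR1.1,COMPILE_PASSED\n",
    "FR2.0": "TEST_Id,RELEASE,COMPILE_STATUS\nFC/B_019.config,FR2.0,COMPILE_FAILED\n",
}

result = {}
fieldnames = ["TEST_Id"]

for tag, text in files.items():
    reader = csv.DictReader(io.StringIO(text))
    # Make the column names unique by suffixing the file tag.
    renamed = ["TEST_Id"] + [f"{h}_{tag}" for h in reader.fieldnames[1:]]
    fieldnames.extend(renamed[1:])
    for row in reader:
        merged = result.setdefault(row["TEST_Id"], {"TEST_Id": row["TEST_Id"]})
        for old, new in zip(reader.fieldnames[1:], renamed[1:]):
            merged[new] = row[old]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames)
writer.writeheader()
for row in result.values():
    writer.writerow(row)
```

With real files you would open each with `open(name, newline='')` instead of `io.StringIO`.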

If you need the output ordered according to one of the input files, omit that file from the first reading loop; instead, read it in the output-writing loop and use its first column to look up the other files' data:

import csv
from collections import defaultdict

result = defaultdict(dict)
filenames = ("FR2.0.csv", "FR2.5.csv")
lengths = {}
fieldnames = []

for csvfile in filenames:
    with open(csvfile, 'rb') as infile:
        reader = csv.reader(infile)
        headers = next(reader, [])  # read first line, headers
        fieldnames.extend(headers[1:])  # all but the first column name
        lengths[csvfile] = len(headers) - 1  # keep track of how many items to backfill
        for row in reader:
            result[row[0]][csvfile] = row[1:]  # all but the first column

with open("FR1.1.csv", "rb") as infile, open("out.csv", "wb") as outfile:
    reader = csv.reader(infile)
    headers = next(reader, [])  # read first line, headers

    writer = csv.writer(outfile)
    writer.writerow(headers + fieldnames)

    for row in sorted(reader):
        data = result[row[0]]
        for filename in filenames:
            row.extend(data.get(filename) or [''] * lengths[filename])
        writer.writerow(row)

This does mean that any extra TEST_Id values present only in the other two files are ignored.

If you want to preserve all TEST_Id values, I'd use collections.OrderedDict(); new TEST_Ids found in later files are appended at the end:

import csv
from collections import OrderedDict

result = OrderedDict()
filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
lengths = {}
fieldnames = ["TEST_ID"]

for csvfile in filenames:
    with open(csvfile, 'rb') as infile:
        reader = csv.reader(infile)
        headers = next(reader, [])  # read first line, headers
        fieldnames.extend(headers[1:])  # all but the first column name
        lengths[csvfile] = len(headers) - 1  # keep track of how many items to backfill
        for row in reader:
            if row[0] not in result:
                result[row[0]] = {}
            result[row[0]][csvfile] = row[1:]  # all but the first column

with open("out.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    for id_ in result:
        row = [id_]
        data = result[id_]
        for filename in filenames:
            row.extend(data.get(filename) or [''] * lengths[filename])
        writer.writerow(row)

OrderedDict maintains entries in insertion order; FR1.1.csv therefore sets the order of all the keys it contains, any FR2.0.csv IDs not found in the first file are appended to the end of the dictionary, and so on.
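That ordering behaviour can be seen with a toy example (illustrative IDs only): keys keep the position of their first insertion, and later files only append IDs that were not seen before.

```python
from collections import OrderedDict

result = OrderedDict()
for csvfile, ids in [("FR1.1.csv", ["B_019", "B_020"]),
                     ("FR2.0.csv", ["B_020", "B_007"])]:
    for id_ in ids:
        # setdefault keeps an existing key's position; only new
        # keys are appended at the end.
        result.setdefault(id_, {})[csvfile] = "row data"

print(list(result))  # B_019 and B_020 from the first file, then B_007
```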

For Python versions below 2.7, install the backport (see OrderedDict for older versions of python), or track the ID order manually with:

import csv
from collections import defaultdict

result = defaultdict(dict)
filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
lengths = {}
fieldnames = ["TEST_ID"]
ids, seen = [], set()

for csvfile in filenames:
    with open(csvfile, 'rb') as infile:
        reader = csv.reader(infile)
        headers = next(reader, [])  # read first line, headers
        fieldnames.extend(headers[1:])  # all but the first column name
        lengths[csvfile] = len(headers) - 1  # keep track of how many items to backfill
        for row in reader:
            id_ = row[0]
            # track ordering
            if id_ not in seen:
                seen.add(id_)
                ids.append(id_)
            result[id_][csvfile] = row[1:]  # all but the first column

with open("out.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    for id_ in ids:
        row = [id_]
        data = result[id_]
        for filename in filenames:
            row.extend(data.get(filename) or [''] * lengths[filename])
        writer.writerow(row)
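Note that the snippets above target Python 2 (binary 'rb'/'wb' file modes). A minimal Python 3 sketch of the first merge, using in-memory `StringIO` stand-ins instead of real files for brevity, looks like this; with real files you would use `open(name, newline='')` for both reading and writing.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for two of the input CSV files (illustrative data).
files = {
    "FR1.1.csv": "TEST_Id,RELEASE\nFC/B_019.config,FR1.1\n",
    "FR2.0.csv": "TEST_Id,RELEASE\nFC/B_019.config,FR2.0\nFC/B_021.config,FR2.0\n",
}

result = defaultdict(dict)
lengths = {}
fieldnames = ["TEST_ID"]

for name, text in files.items():
    reader = csv.reader(io.StringIO(text))  # real files: open(name, newline='')
    headers = next(reader, [])              # read first line, headers
    fieldnames.extend(headers[1:])          # all but the first column name
    lengths[name] = len(headers) - 1        # how many items to backfill
    for row in reader:
        result[row[0]][name] = row[1:]      # all but the first column

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(fieldnames)
for id_ in sorted(result):
    row = [id_]
    for name in files:
        # Backfill with blanks when a file has no row for this ID.
        row.extend(result[id_].get(name) or [""] * lengths[name])
    writer.writerow(row)

print(out.getvalue())
```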
