Python script to read three csv files and write into one csv file
I am trying to read three csv files and want to put the output in a single csv file, using the first column as the ID; it should not be repeated, since it is common to all the input csv files. I have written some code, but it gives an error. I am not sure this is the best way to do the task.
Code:
#! /usr/bin/python
import csv
from collections import defaultdict

result = defaultdict(dict)
fieldnames = ("ID")
for csvfile in ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv"):
    with open(csvfile, 'rb') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id = row.pop("ID")
            for key in row:
                fieldnames.add(key)
                result[id][key] = row[key]

with open("out.csv", "w") as outfile:
    writer = csv.DictWriter(outfile, sorted(fieldnames))
    writer.writeheader()
    for item in result:
        result[item]["ID"] = item
        writer.writerow(result[item])
The input csv files look like this:
FR1.1.csv:
TEST_Id , RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR1.1 , COMPILE_FAILED , EXECUTION_FAILED
FR2.0.csv:
TEST_Id , RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR2.0 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR2.0 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR2.0 , COMPILE_FAILED , EXECUTION_FAILED
FR2.5.csv:
TEST_Id , RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR2.5 , COMPILE_FAILED , EXECUTION_FAILED
out.csv (required output):
TEST_Id , RELEASE , COMPILE_STATUS , EXECUTION_STATUS , RELEASE , COMPILE_STATUS , EXECUTION_STATUS , RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR1.1 , COMPILE_FAILED , EXECUTION_FAILED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
Thanks in advance for the best way to produce the above result.
If you just want to concatenate each CSV row based on the ID, don't use DictReader. Dictionary keys must be unique, but you are producing rows with multiple RELEASE and EXECUTION_STATUS columns. Moreover, how would you handle IDs that are missing from one or two of the input CSV files?
Use a regular reader instead and store each row keyed by filename. Also make fieldnames a list:
import csv
from collections import defaultdict

result = defaultdict(dict)
filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
lengths = {}
fieldnames = ["TEST_ID"]

for csvfile in filenames:
    with open(csvfile, 'rb') as infile:
        reader = csv.reader(infile)
        headers = next(reader, [])            # read first line, headers
        fieldnames.extend(headers[1:])        # all but the first column name
        lengths[csvfile] = len(headers) - 1   # keep track of how many items to backfill
        for row in reader:
            result[row[0]][csvfile] = row[1:]  # all but the first column

with open("out.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    for id_ in sorted(result):
        row = [id_]
        data = result[id_]
        for filename in filenames:
            row.extend(data.get(filename) or [''] * lengths[filename])
        writer.writerow(row)
This code stores the rows per filename so that a complete row can be built from each file later, while still filling in blanks if the row is missing from that file.
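The backfill expression can be seen in isolation; a minimal sketch with made-up lengths and data:

```python
# The per-file backfill idiom used above: data.get(filename) yields the
# stored columns, or None when that file had no row for this ID; `or`
# then substitutes the right number of empty strings.
lengths = {"FR1.1.csv": 3}   # 3 data columns besides the ID column
data = {}                    # this ID was missing from FR1.1.csv
row = ["FC/B_022.config"]
row.extend(data.get("FR1.1.csv") or [''] * lengths["FR1.1.csv"])
print(row)  # ['FC/B_022.config', '', '', '']
```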
Another approach is to make the column names unique by appending a number or the filename to each one; that way your DictReader approach would work too.
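A minimal sketch of that renaming idea; the two tiny sample files and the `name (filename)` suffix scheme here are made up for illustration:

```python
import csv
from collections import defaultdict

# Two tiny sample inputs, stand-ins for FR1.1.csv / FR2.0.csv:
samples = {
    "FR1.1.csv": [["TEST_Id", "RELEASE", "COMPILE_STATUS"],
                  ["FC/B_019.config", "FR1.1", "COMPILE_PASSED"]],
    "FR2.0.csv": [["TEST_Id", "RELEASE", "COMPILE_STATUS"],
                  ["FC/B_019.config", "FR2.0", "COMPILE_FAILED"]],
}
for name, rows in samples.items():
    with open(name, "w", newline="") as f:
        csv.writer(f).writerows(rows)

result = defaultdict(dict)
fieldnames = ["TEST_Id"]
for csvfile in ("FR1.1.csv", "FR2.0.csv"):
    with open(csvfile, newline="") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id_ = row.pop("TEST_Id")
            for key, value in row.items():
                # suffix the filename to make each column name unique
                result[id_]["%s (%s)" % (key, csvfile)] = value
        fieldnames.extend("%s (%s)" % (f, csvfile)
                          for f in reader.fieldnames if f != "TEST_Id")

with open("out.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, fieldnames, restval="")
    writer.writeheader()
    for id_, data in sorted(result.items()):
        writer.writerow(dict(data, TEST_Id=id_))
```

Because every column name is now unique, a plain DictWriter can emit all of them in one header row.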
The above gives:
TEST_ID, RELEASE , COMPILE_STATUS , EXECUTION_STATUS, RELEASE , COMPILE_STATUS , EXECUTION_STATUS, RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR1.1 , COMPILE_FAILED , EXECUTION_FAILED, FR2.0 , COMPILE_FAILED , EXECUTION_FAILED, FR2.5 , COMPILE_FAILED , EXECUTION_FAILED
If you need the ordering to be based on one of the input files, omit that file from the first reading loop; instead, read that file during the output writing loop and use its first column to look up the other files' data:
import csv
from collections import defaultdict

result = defaultdict(dict)
filenames = ("FR2.0.csv", "FR2.5.csv")
lengths = {}
fieldnames = []

for csvfile in filenames:
    with open(csvfile, 'rb') as infile:
        reader = csv.reader(infile)
        headers = next(reader, [])            # read first line, headers
        fieldnames.extend(headers[1:])        # all but the first column name
        lengths[csvfile] = len(headers) - 1   # keep track of how many items to backfill
        for row in reader:
            result[row[0]][csvfile] = row[1:]  # all but the first column

with open("FR1.1.csv", 'rb') as infile, open("out.csv", "wb") as outfile:
    reader = csv.reader(infile)
    headers = next(reader, [])  # read first line, headers
    writer = csv.writer(outfile)
    writer.writerow(headers + fieldnames)
    for row in sorted(reader):
        data = result[row[0]]
        for filename in filenames:
            row.extend(data.get(filename) or [''] * lengths[filename])
        writer.writerow(row)
This does mean that any extra TEST_ID values present in the other two files are ignored.
If you want to preserve all TEST_IDs, I'd use collections.OrderedDict(); new TEST_IDs found in later files are appended at the end:
import csv
from collections import OrderedDict

result = OrderedDict()
filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
lengths = {}
fieldnames = ["TEST_ID"]

for csvfile in filenames:
    with open(csvfile, 'rb') as infile:
        reader = csv.reader(infile)
        headers = next(reader, [])            # read first line, headers
        fieldnames.extend(headers[1:])        # all but the first column name
        lengths[csvfile] = len(headers) - 1   # keep track of how many items to backfill
        for row in reader:
            if row[0] not in result:
                result[row[0]] = {}
            result[row[0]][csvfile] = row[1:]  # all but the first column

with open("out.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    for id_ in result:
        row = [id_]
        data = result[id_]
        for filename in filenames:
            row.extend(data.get(filename) or [''] * lengths[filename])
        writer.writerow(row)
OrderedDict maintains entries in insertion order: FR1.1.csv thus sets the order for all the keys, but any FR2.0.csv IDs not found in the first file are appended to the end of the dictionary, and so on.
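A quick standalone illustration of that insertion-order behaviour (the IDs and filenames here are illustrative):

```python
from collections import OrderedDict

result = OrderedDict()
for id_ in ("B_019", "B_020"):   # order set by the "first" file
    result.setdefault(id_, {})["FR1.1.csv"] = "..."
for id_ in ("B_020", "B_021"):   # a later file adds one new ID
    result.setdefault(id_, {})["FR2.0.csv"] = "..."

# B_021 was appended at the end; existing keys keep their position
print(list(result))  # ['B_019', 'B_020', 'B_021']
```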
For Python versions below 2.7, install the backport (see OrderedDict for older versions of python), or track the ID order manually:
import csv
from collections import defaultdict

result = defaultdict(dict)
filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
lengths = {}
fieldnames = ["TEST_ID"]
ids, seen = [], set()

for csvfile in filenames:
    with open(csvfile, 'rb') as infile:
        reader = csv.reader(infile)
        headers = next(reader, [])            # read first line, headers
        fieldnames.extend(headers[1:])        # all but the first column name
        lengths[csvfile] = len(headers) - 1   # keep track of how many items to backfill
        for row in reader:
            id_ = row[0]
            # track ordering
            if id_ not in seen:
                seen.add(id_)
                ids.append(id_)
            result[id_][csvfile] = row[1:]    # all but the first column

with open("out.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    for id_ in ids:
        row = [id_]
        data = result[id_]
        for filename in filenames:
            row.extend(data.get(filename) or [''] * lengths[filename])
        writer.writerow(row)
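A side note: the snippets above are Python 2 style (binary-mode file handles, `'rb'`/`'wb'`). On Python 3 the csv module expects text-mode files opened with `newline=''`; a minimal sketch of the adjusted open() calls, with made-up data:

```python
import csv

# Python 3: open csv files in text mode with newline='' so the csv
# module can handle line endings itself.
with open("out.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["TEST_ID", "RELEASE"])
    writer.writerow(["FC/B_019.config", "FR1.1"])

with open("out.csv", newline="") as infile:
    rows = list(csv.reader(infile))
print(rows)  # [['TEST_ID', 'RELEASE'], ['FC/B_019.config', 'FR1.1']]
```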