In Python, run each row in a csv through tests and output a new csv showing which test each row failed
In Python, I want to run a csv through test cases to check for data anomalies, while keeping track of every test each row fails.
This is my first big project in Python. I have some Python experience and can do basic one-liners using pandas, like df.drop_duplicates(subset=['UniqueID']), but I am not sure what the right direction would be.
MnLast | MnFirst | MnDead? | MnInactive? | SpLast | SpFirst | SPInactive? | SpDead? | Addee | Sal |
---|---|---|---|---|---|---|---|---|---|
Doe | John | No | No | Doe | Jane | No | No | Mr. John Doe | Mr. John |
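The Main (Mn) record is not blank, the Spouse (Sp) record is not blank, and neither record is marked as deceased, but the Addee or Sal does not contain an "&" or "and". This indicates that the addressee (Addee) or salutation (Sal) is incorrect, because the addressee or salutation should be a variation of "Mr. and Mrs. John Doe".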
Read csv
for each row in csv:
    # test case 1
    if (MnFirst AND MnLast are not BLANK) AND (SpFirst AND SpLast are not BLANK)
       AND (neither MnDead? nor SpDead? is Yes)
       AND (Addee OR Sal does not contain '&' or 'and'):
        output failing row to new csv, tracking which case it failed
    else:
        nothing
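The pseudocode for test case 1 could be expressed in pandas as a boolean mask over the whole table rather than a row-by-row loop. This is only a minimal sketch under several assumptions: the sample rows are invented, blanks are stored as empty strings, "Yes" is the value that marks a deceased record, and a row fails when either Addee or Sal lacks "&" or the word "and". The function name `fails_case_1` is hypothetical.

```python
import pandas as pd

# Invented sample rows using the column names from the question.
df = pd.DataFrame(
    {
        "MnLast": ["Doe", "Smith", "Doe"],
        "MnFirst": ["John", "Ann", "John"],
        "MnDead?": ["No", "No", "No"],
        "SpLast": ["Doe", "", "Doe"],
        "SpFirst": ["Jane", "", "Jane"],
        "SpDead?": ["No", "No", "No"],
        "Addee": ["Mr. John Doe", "Ms. Ann Smith", "Mr. and Mrs. John Doe"],
        "Sal": ["Mr. John", "Ann", "Mr. & Mrs. Doe"],
    }
)


def fails_case_1(frame: pd.DataFrame) -> pd.Series:
    """Return a boolean Series that is True where a row fails test case 1."""
    # Assumption: blank cells are empty strings, not NaN.
    main_present = (frame["MnFirst"] != "") & (frame["MnLast"] != "")
    spouse_present = (frame["SpFirst"] != "") & (frame["SpLast"] != "")
    # Assumption: "Yes" is the value that marks a record as deceased.
    both_alive = (frame["MnDead?"] != "Yes") & (frame["SpDead?"] != "Yes")
    # A joint form of address contains "&" or the whole word "and".
    joint = r"&|\band\b"
    addee_joint = frame["Addee"].str.contains(joint, case=False, regex=True)
    sal_joint = frame["Sal"].str.contains(joint, case=False, regex=True)
    # The row fails if either field is missing the joint marker.
    return main_present & spouse_present & both_alive & ~(addee_joint & sal_joint)


print(df[fails_case_1(df)])
```

Only the first sample row is selected: the second has no spouse record, and the third already uses a joint form in both fields.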
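Read the csv file and run it through several test cases (there are a few). Then output a new csv with a new column indicating each case that failed. So if my data example failed 3 different cases, the new column would show the numbers corresponding to the failed cases. The csv output would show the following: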
Case Failed | MnLast | MnFirst | MnDead? | MnInactive? | SpLast | SpFirst | SPInactive? | SpDead? | Addee | Sal |
---|---|---|---|---|---|---|---|---|---|---|
1, 5, 8 | Doe | John | No | No | Doe | Jane | No | No | Mr. John Doe | Mr. John |
Any help pointing me in the right direction would be greatly appreciated.
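One possible way to get the "Case Failed" column described above is to keep the checks in a dictionary keyed by case number and collect, for each row, the numbers of the checks it fails. The sketch below is an illustration under assumptions, not a definitive design: the inline CSV, the helper names (`has_spouse`, `addee_not_joint`, `sal_not_joint`, `tag_failures`) and the two numbered cases are all hypothetical, and the question's single test case is split into two checks purely to show a row failing more than one.

```python
import io

import numpy as np
import pandas as pd

# Invented input, inlined so the sketch is self-contained.
raw = io.StringIO(
    "MnLast,MnFirst,SpLast,SpFirst,Addee,Sal\n"
    "Doe,John,Doe,Jane,Mr. John Doe,Mr. John\n"
    "Roe,Mary,,,Ms. Mary Roe,Mary\n"
)
# Read everything as text and keep blank cells as empty strings.
df = pd.read_csv(raw, dtype=str, keep_default_na=False)

JOINT = r"&|\band\b"  # "&" or the whole word "and"


def has_spouse(frame):
    return (frame["SpFirst"] != "") & (frame["SpLast"] != "")


def addee_not_joint(frame):
    # Hypothetical case 1: spouse on record, but Addee is not a joint form.
    return has_spouse(frame) & ~frame["Addee"].str.contains(JOINT, case=False, regex=True)


def sal_not_joint(frame):
    # Hypothetical case 2: spouse on record, but Sal is not a joint form.
    return has_spouse(frame) & ~frame["Sal"].str.contains(JOINT, case=False, regex=True)


# Each check returns a boolean Series that is True where a row FAILS.
CASES = {1: addee_not_joint, 2: sal_not_joint}


def tag_failures(frame, cases):
    """Return a copy of frame with a leading 'Case Failed' column."""
    failed = [[] for _ in range(len(frame))]
    for number, check in cases.items():
        # Positions (not index labels) of the rows that fail this check.
        for pos in np.flatnonzero(check(frame).to_numpy()):
            failed[pos].append(str(number))
    out = frame.copy()
    out.insert(0, "Case Failed", [", ".join(nums) for nums in failed])
    return out


tagged = tag_failures(df, CASES)
# Write only the failing rows; a file path could replace the buffer.
buffer = io.StringIO()
tagged[tagged["Case Failed"] != ""].to_csv(buffer, index=False)
print(buffer.getvalue())
```

Adding another test then only means writing one more function and registering it under the next number in `CASES`.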
import pandas as pd
import numpy as np
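# csv_file is the path to the input CSV (not defined in this snippet).
data = pd.read_csv(csv_file, encoding='latin-1')
# Add a column that accumulates the numbers of the failed tests.
data['Failed Test'] = ''
# Treat missing values as empty strings so the comparisons below work.
data = data.replace(np.nan, '')
# Temporary row ID column; it is removed again before writing the output.
data.insert(0, 'ID', range(0, len(data)))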
# Test 1: The spouse shows a deceased date, but marital status is not marked as widowed.
testcase1 = (data['SRDeceasedDate'] != '') & (data['MrtlStat'] != 'Widowed')
# Append the test number to every row matched by the mask.
data.loc[testcase1, 'Failed Test'] += ', 1'
# Test 2: Spouse name information is filled in, but marital status shows single.
has_spouse_name = (data['SRLastName'] != '') | (data['SRFirstName'] != '')
testcase2 = has_spouse_name & (data['MrtlStat'] == 'single')
# Append the test number to every row matched by the mask.
data.loc[testcase2, 'Failed Test'] += ', 2'
# Separate the rows that failed at least one test from those that passed.
# .copy() avoids modifying a view of the original frame.
failed = data[data['Failed Test'] != ''].copy()
passed = data[data['Failed Test'] == ''].copy()
# Drop the leading ', ' separator so the column reads like '1, 2'.
failed['Failed Test'] = failed['Failed Test'].str[2:]
# Clean up
del failed["ID"]
del passed["ID"]
# Print results: how often each combination of failed tests occurs.
print(failed['Failed Test'].value_counts())
print(f"There was a total of {len(data)} rows: {len(data) - len(failed)} rows passed and {len(failed)} rows failed at least one test case.")
# output failed rows
failed.to_csv("C:/Users/Output/failed.csv", index=False,)
# output passed rows
passed.to_csv("C:/Users/Output/passed.csv", index=False,)
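The value_counts() call above counts each distinct combination of failed tests, so "1, 2" is tallied separately from "1". If a count per individual test is wanted instead, one option is to split the combined string and count the pieces. The sketch below assumes the 'Failed Test' column holds values separated by ", " and uses invented sample values.

```python
import pandas as pd

# Invented 'Failed Test' values in the combined form produced above.
failed = pd.DataFrame({"Failed Test": ["1", "1, 2", "2", "1"]})

per_test = (
    failed["Failed Test"]
    .str.split(",")   # "1, 2" -> ["1", " 2"]
    .explode()        # one row per failed test number
    .str.strip()      # remove the spaces left by the separator
    .value_counts()
    .sort_index()
)
print(per_test)
```

With the sample values, test 1 is counted three times and test 2 twice.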