In Python, run each row in a csv through tests and output a new csv showing which test each row failed
In Python, I would like to run a csv through test cases that check for data anomalies, while keeping track of each test a row fails.
This is my first big project in Python. I have some Python experience and can do basic one-liners using pandas like df.drop_duplicates(subset=['UniqueID']), but I am not sure what the right direction would be.
MnLast | MnFirst | MnDead? | MnInactive? | SpLast | SpFirst | SPInactive? | SpDead? | Addee | Sal |
---|---|---|---|---|---|---|---|---|---|
Doe | John | No | No | Doe | Jane | No | No | Mr. John Doe | Mr. John |
The Main (Mn) record isn't blank, the Spouse (Sp) record isn't blank, and neither record is marked deceased, but Addee or Sal doesn't contain '&' or 'and'.
This indicates that the Addressee (Addee) or Salutation (Sal) is incorrect, as the Addressee or Salutation should be a variation of "Mr. and Mrs. John Doe".
Read csv
for each row in csv:
    # test case 1
    if (MnFirst and MnLast are not BLANK)
       AND (SpFirst and SpLast are not BLANK)
       AND (neither MnDead? nor SpDead? is Yes)
       AND (Addee or Sal does not contain '&' or 'and'):
        output failing row to new csv, tracking which case it failed
    else:
        nothing
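The pseudocode above can be sketched as a plain Python predicate that takes one row as a dict. This is only one reading of the rule: it assumes "blank" means empty or whitespace-only, that the deceased flags hold the text "Yes", and that the row fails when either Addee or Sal lacks a joiner. The helper names (`is_blank`, `is_yes`, `has_joiner`, `fails_case_1`) are made up for illustration, and the column names follow the sample table, with `MnFirst` spelled as in the pseudocode.

```python
import re

# Assumption: a joint name contains "&" or the standalone word "and".
# The word boundary keeps names such as "Anderson" from matching.
JOINER = re.compile(r"&|\band\b", re.IGNORECASE)


def is_blank(value):
    """True for None, empty, or whitespace-only cells."""
    return value is None or str(value).strip() == ""


def is_yes(value):
    """True when a flag cell reads 'Yes' (case-insensitive)."""
    return not is_blank(value) and str(value).strip().lower() == "yes"


def has_joiner(value):
    """True when the text contains '&' or the word 'and'."""
    return not is_blank(value) and bool(JOINER.search(str(value)))


def fails_case_1(row):
    """Return True when the row trips test case 1."""
    main_present = not is_blank(row["MnFirst"]) and not is_blank(row["MnLast"])
    spouse_present = not is_blank(row["SpFirst"]) and not is_blank(row["SpLast"])
    anyone_dead = is_yes(row["MnDead?"]) or is_yes(row["SpDead?"])
    missing_joiner = not has_joiner(row["Addee"]) or not has_joiner(row["Sal"])
    return main_present and spouse_present and not anyone_dead and missing_joiner


# The data example from the question.
sample_row = {
    "MnLast": "Doe", "MnFirst": "John", "MnDead?": "No", "MnInactive?": "No",
    "SpLast": "Doe", "SpFirst": "Jane", "SPInactive?": "No", "SpDead?": "No",
    "Addee": "Mr. John Doe", "Sal": "Mr. John",
}
print(fails_case_1(sample_row))
```

Writing each test as a small function like this keeps the rules readable and lets you check them one at a time before wiring them into the csv loop.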
Read a csv file and run it through several test cases. Then output a new csv with a new column indicating each case a row failed.
So if my data example failed 3 different cases, the new column would show the number of each case it failed.
The csv output would show the following:
CaseFailed | MnLast | MnFirst | MnDead? | MnInactive? | SpLast | SpFirst | SPInactive? | SpDead? | Addee | Sal |
---|---|---|---|---|---|---|---|---|---|---|
1, 5, 8 | Doe | john | No | No | Doe | Jane | No | No | Mr. John Doe | Mr. John |
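One possible way to build a column like CaseFailed in pandas is to compute one boolean mask per test and join the numbers of the masks that are True for each row. The sketch below is not a definitive implementation: the two-row csv text is invented sample data, test 2 is a hypothetical placeholder rule (Sal does not mention the main last name) added only so that one row fails more than one case, and the `tests` dict and `failed_numbers` helper are illustrative names.

```python
import io

import pandas as pd

# Invented sample data; the first row mirrors the example in the question.
csv_text = """MnLast,MnFirst,MnDead?,SpLast,SpFirst,SpDead?,Addee,Sal
Doe,John,No,Doe,Jane,No,Mr. John Doe,Mr. John
Roe,Sam,No,Roe,Pat,No,Mr. and Mrs. Sam Roe,Mr. and Mrs. Roe
"""
# Read everything as text and keep blanks as "" instead of NaN.
df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

joiner = r"&|\band\b"  # assumption: "&" or the standalone word "and"
both_named = (df[["MnFirst", "MnLast", "SpFirst", "SpLast"]] != "").all(axis=1)
nobody_dead = (df["MnDead?"] != "Yes") & (df["SpDead?"] != "Yes")
missing_joiner = (
    ~df["Addee"].str.contains(joiner, case=False, regex=True)
    | ~df["Sal"].str.contains(joiner, case=False, regex=True)
)

# Map each test number to a boolean Series that is True where the row fails.
tests = {
    1: both_named & nobody_dead & missing_joiner,
    # Hypothetical placeholder test: Sal does not mention the main last name.
    2: pd.Series(
        [last not in sal for sal, last in zip(df["Sal"], df["MnLast"])],
        index=df.index,
    ),
}
flags = pd.DataFrame(tests)  # one boolean column per test number


def failed_numbers(flag_row):
    """Join the numbers of the tests whose flag is True for this row."""
    return ", ".join(str(number) for number, hit in flag_row.items() if hit)


df.insert(0, "CaseFailed", flags.apply(failed_numbers, axis=1))

# Keep only rows that failed at least one test and show them as csv text.
failed = df[df["CaseFailed"] != ""]
print(failed.to_csv(index=False))
```

Adding another test then only means adding another entry to `tests`; the code that builds CaseFailed does not change.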
Any help to point me in the right direction would be greatly appreciated.
import pandas as pd
import numpy as np

# Placeholder path; point this at your own input file.
csv_file = "input.csv"
data = pd.read_csv(csv_file, encoding='latin-1')

# Create a column to track failed cases.
data['Failed Test'] = ''
data = data.replace(np.nan, '')
data.insert(0, 'ID', range(0, len(data)))

# Test 1: The spouse shows a deceased date, but marital status is not marked as widowed.
testcase1 = data[(data['SRDeceasedDate'] != '') & (data['MrtlStat'] != 'Widowed')]
ids = testcase1.index.tolist()
for i in ids:
    data.at[i, 'Failed Test'] += ', 1'

# Test 2: Spouse name information is filled in, but marital status shows single.
df = data[(data['SRLastName'] != '') | (data['SRFirstName'] != '')]
testcase2 = df[df['MrtlStat'] == 'single']
ids = testcase2.index.tolist()
for i in ids:
    data.at[i, 'Failed Test'] += ', 2'

# Sort and separate which rows have failed a test.
# .copy() avoids modifying a slice of the original DataFrame.
failed = data[data['Failed Test'] != ''].copy()
passed = data[data['Failed Test'] == ''].copy()
# Drop the leading ', ' left over from the appends above.
failed['Failed Test'] = failed['Failed Test'].str[2:]

# Clean up
del failed["ID"]
del passed["ID"]

# Print results
print(failed['Failed Test'].value_counts())
print("There was a total of", data.shape[0], "rows.",
      "There were", data.shape[0] - failed.shape[0], "rows passed and",
      failed.shape[0], "rows failed at least one test case")

# Output failed rows
failed.to_csv("C:/Users/Output/failed.csv", index=False)
# Output passed rows
passed.to_csv("C:/Users/Output/passed.csv", index=False)
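Note that `value_counts()` on the 'Failed Test' column counts each combination of tests (for example "1, 2") as its own value. If a count per individual test is wanted instead, one option is to split the strings and explode them first. The sketch below uses invented 'Failed Test' values rather than real output; the strip step is there so it works whether or not the entries carry stray spaces.

```python
import pandas as pd

# Invented 'Failed Test' values in the comma-separated format used above.
failed = pd.DataFrame({"Failed Test": ["1", "1, 2", "2", "1, 2"]})

per_test = (
    failed["Failed Test"]
    .str.split(",")   # "1, 2" -> ["1", " 2"]
    .explode()        # one row per failed test number
    .str.strip()      # remove the space left after each comma
    .value_counts()
    .sort_index()
)
print(per_test)
```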