In Python, run each row in a csv through tests and output a new csv showing which test each row failed

In Python, I would like to run a csv through test cases that check for data anomalies while keeping track of each test a row fails.

This is my first big project in Python. I have some Python experience and can do basic one-liners using pandas, like df.drop_duplicates(subset=['UniqueID']), but I am not sure what the right direction would be.

  • Data Example:
MnLast  MnFirst  MnDead?  MnInactive?  SpLast  SpFirst  SPInactive?  SpDead?  Addee         Sal
Doe     John     No       No           Doe     Jane     No           No       Mr. John Doe  Mr. John

The Main (Mn) record isn't blank, the Spouse (Sp) record isn't blank, and neither record is marked deceased, but Addee or Sal doesn't contain '&' or 'and'. This indicates that the Addressee (Addee) or Salutation (Sal) is incorrect, as the Addressee or Salutation should be a variation of "Mr. and Mrs. John Doe".

  • Pseudo code:
Read csv

for each row in csv

  # test case 1
  if [ {( (MnFirst AND MnLast) != BLANK ) AND ( (SpLast AND SpFirst) != BLANK )} AND
  (( SpDead? AND MnDead?) != Yes)] AND [(Addee OR Sal) does not contain ('&' or 'and')]
  
     output failing row to new csv tracking what case it failed

  else 

      nothing
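
For reference, the condition in test case 1 could be written with pandas boolean masks rather than a row-by-row loop. This is only a minimal sketch under some assumptions: the column names are taken from the data example (with MnFirst as the first-name column), blank cells have already been read as empty strings, the deceased flags hold 'Yes' or 'No', and a row fails when either Addee or Sal lacks a joiner. The function name fails_case_1 and the second sample row are made up for illustration.

```python
import pandas as pd


def fails_case_1(df: pd.DataFrame) -> pd.Series:
    """Return a boolean Series that is True for rows failing test case 1."""
    # Both the main and the spouse record have a first and a last name.
    main_present = (df["MnFirst"] != "") & (df["MnLast"] != "")
    spouse_present = (df["SpFirst"] != "") & (df["SpLast"] != "")
    # Neither person is marked deceased.
    both_alive = (df["MnDead?"] != "Yes") & (df["SpDead?"] != "Yes")
    # A joint addressee/salutation should contain '&' or the whole word 'and'.
    joiner = r"&|\band\b"
    addee_ok = df["Addee"].str.contains(joiner, case=False, regex=True)
    sal_ok = df["Sal"].str.contains(joiner, case=False, regex=True)
    # Fail when both people are present and alive but either field lacks a joiner.
    return main_present & spouse_present & both_alive & ~(addee_ok & sal_ok)


# First row is the data example from the question; second row is illustrative.
rows = pd.DataFrame({
    "MnLast": ["Doe", "Doe"],
    "MnFirst": ["John", "John"],
    "MnDead?": ["No", "No"],
    "SpLast": ["Doe", "Doe"],
    "SpFirst": ["Jane", "Jane"],
    "SpDead?": ["No", "No"],
    "Addee": ["Mr. John Doe", "Mr. and Mrs. John Doe"],
    "Sal": ["Mr. John", "Mr. and Mrs. Doe"],
})
case_1_failed = fails_case_1(rows)
print(rows[case_1_failed])
```

The word-boundary pattern keeps names such as "Anderson" from counting as a joiner. If only one of Addee or Sal needs to contain the joiner, the last line of the function would use `~(addee_ok | sal_ok)` instead.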
  • My goal:

Read a csv file and run it through several test cases. Then output a new csv with a new column indicating each case a row failed. So if my Data Example failed 3 different cases, the new column would show the numbers corresponding to the cases it failed. The csv output would show the following:

CaseFailed  MnLast  MnFirst  MnDead?  MnInactive?  SpLast  SpFirst  SPInactive?  SpDead?  Addee         Sal
1, 5, 8     Doe     John     No       No           Doe     Jane     No           No       Mr. John Doe  Mr. John

Any help pointing me in the right direction would be greatly appreciated.

import pandas as pd
import numpy as np

# Placeholder path; point this at your own input file.
csv_file = "input.csv"

data = pd.read_csv(csv_file, encoding='latin-1')

# Add a column that accumulates the numbers of the failed test cases.
data['Failed Test'] = ''
# Treat missing values as empty strings so the comparisons below work.
data = data.replace(np.nan, '')
data.insert(0, 'ID', range(0, len(data)))

# Test 1: The spouse shows a deceased date, but marital status is not marked as widowed.
testcase1 = data[(data['SRDeceasedDate'] != '') & (data['MrtlStat'] != 'Widowed')]
for i in testcase1.index:
    data.at[i, 'Failed Test'] += ', 1'

# Test 2: Spouse name information is filled in, but marital status shows single.
df = data[(data['SRLastName'] != '') | (data['SRFirstName'] != '')]
testcase2 = df[df['MrtlStat'] == 'single']
for i in testcase2.index:
    data.at[i, 'Failed Test'] += ', 2'

# Separate the rows that failed at least one test from those that passed.
# .copy() avoids modifying a view of data (SettingWithCopyWarning).
failed = data[data['Failed Test'] != ''].copy()
passed = data[data['Failed Test'] == ''].copy()
# Strip the leading ', ' left over from the concatenation above.
failed['Failed Test'] = failed['Failed Test'].str[2:]

# Clean up the helper column.
del failed["ID"]
del passed["ID"]

# Print results
print(failed['Failed Test'].value_counts())
print(f"There was a total of {data.shape[0]} rows. "
      f"{passed.shape[0]} rows passed and "
      f"{failed.shape[0]} rows failed at least one test case.")

# Output failed rows
failed.to_csv("C:/Users/Output/failed.csv", index=False)

# Output passed rows
passed.to_csv("C:/Users/Output/passed.csv", index=False)
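
With several test cases, the per-test blocks above could be collapsed into a table of test functions that each return a boolean mask. The sketch below is one possible arrangement, not the only way to do it. It reuses the two tests and column names from the code above; the names TESTS, tag_failures, spouse_deceased_not_widowed and spouse_named_but_single are hypothetical, and the sample rows are made up for illustration.

```python
import pandas as pd


def spouse_deceased_not_widowed(d: pd.DataFrame) -> pd.Series:
    # Test 1: spouse has a deceased date but marital status is not 'Widowed'.
    return (d["SRDeceasedDate"] != "") & (d["MrtlStat"] != "Widowed")


def spouse_named_but_single(d: pd.DataFrame) -> pd.Series:
    # Test 2: spouse name is filled in but marital status is 'single'.
    has_spouse_name = (d["SRLastName"] != "") | (d["SRFirstName"] != "")
    return has_spouse_name & (d["MrtlStat"] == "single")


# Hypothetical registry: test number -> function that is True for failing rows.
TESTS = {1: spouse_deceased_not_widowed, 2: spouse_named_but_single}


def tag_failures(data: pd.DataFrame, tests=TESTS) -> pd.DataFrame:
    """Return a copy of data with a leading 'Failed Test' column such as '1, 2'."""
    out = data.fillna("")
    # One boolean column per test number.
    masks = pd.DataFrame({num: test(out) for num, test in tests.items()})
    # For each row, join the numbers of the tests whose mask is True.
    labels = [
        ", ".join(str(num) for num, hit in zip(masks.columns, flags) if hit)
        for flags in masks.to_numpy()
    ]
    out.insert(0, "Failed Test", labels)
    return out


# Illustrative rows only; real data would come from pd.read_csv.
sample = pd.DataFrame({
    "SRLastName": ["Doe", "", "Roe"],
    "SRFirstName": ["Jane", "", "Ann"],
    "SRDeceasedDate": ["2020-01-01", "", ""],
    "MrtlStat": ["single", "Married", "single"],
})
tagged = tag_failures(sample)
failed = tagged[tagged["Failed Test"] != ""]
passed = tagged[tagged["Failed Test"] == ""]
print(failed)
```

Adding a test then means writing one more function and giving it a number in TESTS; the code that builds the column and splits the rows stays unchanged.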
