
In Python, run each row in a csv through tests and output a new csv showing which test each row failed

In Python, I want to run a csv through test cases to check for data anomalies, while keeping track of each test a row fails.

This is my first big project in Python. I have Python experience and can do basic one-liners using pandas, like df.drop_duplicates(subset=['UniqueID']), but I am not sure what the right direction would be.

  • Data example:
MnLast MnFirst MnDead? MnInactive? SpLast SpFirst SPInactive? SpDead? Addee Sal
Doe John Doe Mr. John Doe Mr. John

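The Main (Mn) record is not blank, the Spouse (Sp) record is not blank, and neither record is marked as deceased, but the Addee or Sal has no "&" or "and". This indicates that the addressee (Addee) or the salutation (Sal) is incorrect, because either one should be some variation of "Mr. and Mrs. John Doe".

  • Pseudocode: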
Read csv

for each row in csv

  # test case 1
  if [ {( (MnFirst AND MnLast) != BLANK ) AND ( (SpLast AND SpFirst) != BLANK )} AND
  ( (SpDead? AND MnDead?) != Yes )] AND [ (Addee OR Sal) does not contain ('&' or 'and') ]

     output failing row to new csv, tracking which case it failed

  else 

      nothing
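
The pseudocode for test case 1 could be sketched with pandas boolean masks roughly as follows. This is only a sketch under several assumptions: the column names are taken from the data example, blank cells are empty strings, a deceased flag holds the string "Yes", and a row is flagged when either Addee or Sal lacks a joiner. The function name fails_case_1 and the sample rows are made up for illustration.

```python
import pandas as pd


def fails_case_1(df: pd.DataFrame) -> pd.Series:
    """Return a boolean Series that is True where a row fails test case 1."""
    d = df.fillna("").astype(str)

    # Both the main and the spouse record must have a first and last name.
    main_present = (d["MnFirst"].str.strip() != "") & (d["MnLast"].str.strip() != "")
    spouse_present = (d["SpFirst"].str.strip() != "") & (d["SpLast"].str.strip() != "")

    # Assumption: a deceased person is marked with the literal string "Yes".
    both_alive = (d["MnDead?"] != "Yes") & (d["SpDead?"] != "Yes")

    # "&" or the whole word "and" (so a name like "Sandra" does not count).
    joiner = r"&|\band\b"
    addee_ok = d["Addee"].str.contains(joiner, case=False, regex=True)
    sal_ok = d["Sal"].str.contains(joiner, case=False, regex=True)

    # Assumption: the row fails if either field is missing the joiner.
    return main_present & spouse_present & both_alive & ~(addee_ok & sal_ok)


# Made-up sample rows, not real data.
rows = pd.DataFrame(
    {
        "MnLast": ["Doe", "Doe", "Roe"],
        "MnFirst": ["John", "John", "Ann"],
        "MnDead?": ["", "", ""],
        "SpLast": ["Doe", "Doe", ""],
        "SpFirst": ["Jane", "Jane", ""],
        "SpDead?": ["", "", ""],
        "Addee": ["Mr. John Doe", "Mr. and Mrs. John Doe", "Ms. Ann Roe"],
        "Sal": ["Mr. John", "Mr. and Mrs. Doe", "Ms. Roe"],
    }
)
mask = fails_case_1(rows)
print(rows[mask])
```

Working on whole columns this way avoids an explicit loop over rows; each further test case can be written as another function that returns a mask of the same shape.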
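  • My goal:

Read the csv file and run it through several test cases (there are a few). Then output a new csv with a new column indicating each case that failed. So if my data example failed 3 different cases, the new column would show a number corresponding to each failed case. The csv output would show the following: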

Case Failed MnLast MnFirst MnDead? MnInactive? SpLast SpFirst SPInactive? SpDead? Addee Sal
1, 5, 8 Doe John Doe Mr. John Doe Mr. John
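
One way the goal above might be approached is to keep the test cases in a dictionary keyed by case number and build the new column from the resulting masks. The sketch below is an assumption-laden illustration, not a definitive design: label_failures, the two sample tests, and the inline csv text are all hypothetical, and the csv is read from a string so the example runs without any files.

```python
import io

import pandas as pd


def label_failures(df, tests):
    """Return a copy of df with a leading 'Case Failed' column.

    `tests` maps a case number to a function that takes the DataFrame and
    returns a boolean Series (True means the row fails that case).
    """
    masks = pd.DataFrame({number: test(df) for number, test in sorted(tests.items())})
    labels = [
        ", ".join(str(number) for number, hit in zip(masks.columns, row) if hit)
        for row in masks.to_numpy()
    ]
    out = df.copy()
    out.insert(0, "Case Failed", labels)
    return out


JOINER = r"&|\band\b"  # "&" or the whole word "and"

# Hypothetical test cases; real ones would follow the actual business rules.
tests = {
    1: lambda d: (d["SpFirst"] != "") & ~d["Addee"].str.contains(JOINER, case=False),
    2: lambda d: (d["SpFirst"] != "") & ~d["Sal"].str.contains(JOINER, case=False),
}

# Made-up input; keep_default_na=False reads blank cells as empty strings.
csv_text = (
    "MnLast,MnFirst,SpLast,SpFirst,Addee,Sal\n"
    "Doe,John,Doe,Jane,Mr. John Doe,Mr. John\n"
    "Roe,Ann,,,Ms. Ann Roe,Ms. Roe\n"
)
data = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

result = label_failures(data, tests)

# Write only the failing rows; a file path could be passed instead of a buffer.
buffer = io.StringIO()
result[result["Case Failed"] != ""].to_csv(buffer, index=False)
print(buffer.getvalue())
```

Adding a test then only means adding one entry to the dictionary, and the numbers in the new column always match the dictionary keys.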

Any help pointing me in the right direction would be greatly appreciated.

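import pandas as pd

csv_file = "input.csv"  # placeholder path; point this at your own csv

data = pd.read_csv(csv_file, encoding='latin-1')

# Create a column to track failed cases.
data['Failed Test'] = ''
# Treat missing values as empty strings so the comparisons below work.
data = data.fillna('')
data.insert(0, 'ID', range(len(data)))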

# Test 1: The spouse shows a deceased date, but marital status is not marked as widowed.
testcase1 = (data['SRDeceasedDate'] != '') & (data['MrtlStat'] != 'Widowed')
data.loc[testcase1, 'Failed Test'] += ', 1'

# Test 2: Spouse name information is filled in, but marital status shows single.
has_spouse_name = (data['SRLastName'] != '') | (data['SRFirstName'] != '')
testcase2 = has_spouse_name & (data['MrtlStat'] == 'single')
data.loc[testcase2, 'Failed Test'] += ', 2'

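# Separate the rows that failed a test from the rows that passed.
# .copy() avoids pandas' SettingWithCopyWarning when the slices are modified.
failed = data[data['Failed Test'] != ''].copy()
passed = data[data['Failed Test'] == ''].copy()
# Drop the leading ", " so the column reads "1, 2" rather than ", 1, 2".
failed['Failed Test'] = failed['Failed Test'].str[2:]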

# Clean up
del failed["ID"]
del passed["ID"]

# Print results
print(failed['Failed Test'].value_counts())
print(f"There were a total of {data.shape[0]} rows: "
      f"{data.shape[0] - failed.shape[0]} passed and "
      f"{failed.shape[0]} failed at least one test case.")

# Output failed rows
failed.to_csv("C:/Users/Output/failed.csv", index=False)

# Output passed rows
passed.to_csv("C:/Users/Output/passed.csv", index=False)

