How do I write a function in Python that runs the same processing on 20 different CSV files?

Question

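I have a directory containing 19 CSV files, each of which holds a list of student registration numbers and their names. There are two separate files, quiz1 and quiz2, which contain every student who took that quiz, with their name and total marks. The marks obtained in each quiz have to be split into separate columns by score range, along with a "noofpresent" column showing attendance for that particular quiz.

My task is to parse all of these files and build a dataframe that basically looks like the following. The picture shows 5 of the 19 batches.

I have already filled in the relevant fields for Batch4, as shown in the picture, but I realized that repeating this process for the other 18 files would be crazy.

How can I write a program or function that does all of this for the remaining 18 batches, for both quizzes? I just need to understand the logic for automating the remaining 18 files.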
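A minimal sketch of the automation step, assuming the batch files in `d:\a2\batchwiselist` are named by batch number (`1.csv`, `2.csv`, ..., like `9.csv` in the code below) and that any other file in that folder can be ignored. `find_batch_files` is a hypothetical helper, not part of the original code; it uses the standard-library `pathlib` to list the files instead of hard-coding one path per batch:

```python
from pathlib import Path


def find_batch_files(folder):
    """Return {batch_number: path} for every '<number>.csv' in folder.

    Assumption: batch files are named by their batch number (1.csv, 2.csv, ...);
    files whose name is not a number (e.g. notes.csv) are skipped.
    """
    files = {}
    for path in Path(folder).glob("*.csv"):
        if path.stem.isdigit():
            files[int(path.stem)] = path
    # Sort numerically so that 10.csv comes after 9.csv, not after 1.csv
    return dict(sorted(files.items()))


# Usage with the folder from the question:
# for batch_no, path in find_batch_files(r"d:\a2\batchwiselist").items():
#     print(batch_no, path)
```

Each returned path can then be fed to the same per-batch code, so adding a 20th batch only means dropping `20.csv` into the folder.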

Example for Batch 9:

This is the code I need to replicate for each of the 19 batches:

import numpy as np
import pandas as pd

spath = 'd:\\a2\\studentlist.csv'
q1path = 'd:\\a2\\quiz\\quiz1.csv'
q2path = 'd:\\a2\\quiz\\quiz2.csv'
b1path = 'd:\\a2\\batchwiselist\\1.csv'
b9path = 'd:\\a2\\batchwiselist\\9.csv'
tpath = 'd:\\a2\\testcasestudent.txt'

# the final dataframe that needs to be created and filled up eventually
idx = pd.MultiIndex.from_product([['batch1', 'batch2', 'batch3', 'batch4', 'batch9'], ['quiz1', 'quiz2']])
cols = ['noofpresent', 'lesserthan50', 'between50and60', 'between60and70', 'between70and80', 'greaterthan80']
statdf = pd.DataFrame('-', idx, cols)


# ============ BATCH 9 ============

# ----------- QUIZ 1 -----------

# Master list of students in Batch 9
b9 = pd.read_csv(b9path, usecols=['studentName', 'admissionNumber'])
# Rename so the column matches 'Firstname' in quiz1.csv
b9.rename(columns={'studentName': 'Firstname'}, inplace=True)

# Master list of all who attended Quiz1
q1 = pd.read_csv(q1path, usecols=['Firstname', 'Grade/10.00', 'State'], na_values=['-', 'In progress', np.nan])
q1.dropna(inplace=True)
# Multiply the grades by 10 to mark against 100 instead of 10
q1['Grade/10.00'] = q1['Grade/10.00'] * 10

# Keep only the quiz1 rows whose Firstname appears in the Batch 9 list
q1b9 = q1.loc[q1['Firstname'].isin(b9['Firstname'])].reset_index(drop=True)
#print(q1b9)

# Students whose grades are less than 50
lt50 = q1b9.loc[q1b9['Grade/10.00'] < 50]
# Number of Batch 9 students who got < 50 in quiz1
out9q1 = lt50['Grade/10.00'].count()
# print(out9q1)

# Similar process for quiz2 below for batch9.
# ----------- QUIZ 2 -----------

# Master list of all who attended Quiz2
q2 = pd.read_csv(q2path, usecols=['Firstname', 'Grade/10.00', 'State'], na_values=['-', 'In progress', np.nan])
q2.dropna(inplace=True)
q2['Grade/10.00'] = q2['Grade/10.00'] * 10

# Keep only the quiz2 rows whose Firstname appears in the Batch 9 list
q2b9 = q2.loc[q2['Firstname'].isin(b9['Firstname'])].reset_index(drop=True)

# Number of Batch 9 students who got < 50 in quiz2
lt50 = q2b9.loc[q2b9['Grade/10.00'] < 50]
out9q2 = lt50['Grade/10.00'].count()
# print(out9q2)

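The code above counts, for each quiz separately, the Batch 9 students who scored less than 50. I did similar processing for Batch4. I need to generalize this so that one function can do the same for all the remaining (17-18) batches.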
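The `< 50` count above generalizes to every column of `statdf` with `pd.cut`. A sketch, assuming the quiz frame has already been cleaned as in the code above (NaN rows dropped, grades multiplied by 10) and assuming half-open bands: `between50and60` is taken to mean 50 <= grade < 60, and a grade of exactly 80 is counted in `greaterthan80`; the question does not say how boundary scores should be handled. `noofpresent` is taken to be the number of batch students who have a grade in that quiz. `quiz_stats` and `COLS` are hypothetical names:

```python
import numpy as np
import pandas as pd

COLS = ['noofpresent', 'lesserthan50', 'between50and60',
        'between60and70', 'between70and80', 'greaterthan80']


def quiz_stats(batch, quiz, grade_col='Grade/10.00'):
    """Count attendance and score bands for one batch in one quiz.

    batch: DataFrame with a 'Firstname' column (studentName renamed, as above)
    quiz:  cleaned quiz DataFrame with 'Firstname' and grades on a 0-100 scale
    Returns a dict with one entry per name in COLS.
    """
    # Grades of the quiz rows whose Firstname belongs to this batch
    grades = quiz.loc[quiz['Firstname'].isin(batch['Firstname']), grade_col]
    # Assumed half-open bands: [-inf, 50), [50, 60), [60, 70), [70, 80), [80, inf)
    bands = pd.cut(grades, bins=[-np.inf, 50, 60, 70, 80, np.inf],
                   right=False, labels=COLS[1:])
    stats = {'noofpresent': len(grades)}
    for label in COLS[1:]:
        stats[label] = int((bands == label).sum())
    return stats
```

Called once per (batch, quiz) pair, the returned dict holds exactly the values for one row of `statdf`, so the per-batch work reduces to a loop over batches and quizzes.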

Answer 1

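In the code below, I generate all the CSV paths, load the files one by one, perform all the processing, and then save the resulting dataframes in a list, e.g. [[batch1_q1_result, batch1_q2_result], [batch2_q1_result, batch2_q2_result], ...]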

import numpy as np
import pandas as pd


def doAll(baseBatchPath, numberOfBatches):
    batchResultListAll = []  # this will store the resulting dataframes
    spath = 'd:\\a2\\studentlist.csv'
    q1path = 'd:\\a2\\quiz\\quiz1.csv'
    q2path = 'd:\\a2\\quiz\\quiz2.csv'
    tpath = 'd:\\a2\\testcasestudent.txt'
    # the final dataframe that needs to be created and filled up eventually,
    # one (quiz1, quiz2) row pair per batch
    batchNames = ['batch' + str(i) for i in range(1, numberOfBatches + 1)]
    idx = pd.MultiIndex.from_product([batchNames, ['quiz1', 'quiz2']])
    cols = ['noofpresent', 'lesserthan50', 'between50and60', 'between60and70', 'between70and80', 'greaterthan80']
    statdf = pd.DataFrame('-', idx, cols)

    # Master list of all who attended Quiz1 (read once, reused for every batch)
    q1 = pd.read_csv(q1path, usecols=['Firstname', 'Grade/10.00', 'State'], na_values=['-', 'In progress', np.nan])
    q1.dropna(inplace=True)
    q1['Grade/10.00'] = q1['Grade/10.00'] * 10
    # Master list of all who attended Quiz2
    q2 = pd.read_csv(q2path, usecols=['Firstname', 'Grade/10.00', 'State'], na_values=['-', 'In progress', np.nan])
    q2.dropna(inplace=True)
    q2['Grade/10.00'] = q2['Grade/10.00'] * 10

    # generate each batch file path (1.csv ... numberOfBatches.csv) and process it
    for batchId in range(1, numberOfBatches + 1):
        batchCsvPath = baseBatchPath + str(batchId) + ".csv"
        # Master list of students in this batch
        batch = pd.read_csv(batchCsvPath, usecols=['studentName', 'admissionNumber'])
        batch.rename(columns={'studentName': 'Firstname'}, inplace=True)

        # Keep only the quiz1 rows whose Firstname appears in this batch
        q1batch = q1.loc[q1['Firstname'].isin(batch['Firstname'])].reset_index(drop=True)
        # Number of students in this batch who got < 50 in quiz1
        outBatchq1 = (q1batch['Grade/10.00'] < 50).sum()

        # do the same for quiz 2
        q2batch = q2.loc[q2['Firstname'].isin(batch['Firstname'])].reset_index(drop=True)
        outBatchq2 = (q2batch['Grade/10.00'] < 50).sum()

        # record the counts in the summary table
        statdf.loc[('batch' + str(batchId), 'quiz1'), 'lesserthan50'] = outBatchq1
        statdf.loc[('batch' + str(batchId), 'quiz2'), 'lesserthan50'] = outBatchq2

        # finally save the resulting DFs for later use
        batchResultListAll.append([q1batch, q2batch])

    return statdf, batchResultListAll


# call the function using the base path and the number of batch csv files
statdf, batchResults = doAll("d:\\a2\\batchwiselist\\", 19)
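As an alternative to filtering each batch inside the loop, all batch lists can be stacked into one frame tagged with the batch name, joined to each quiz once, and counted with `groupby`. A sketch, assuming the batch and quiz frames are already loaded and cleaned as in `doAll` (studentName renamed to Firstname, grades multiplied by 10) and passed in as dicts; `summary_table` is a hypothetical name, and only `noofpresent` and `lesserthan50` are filled in here to keep it short:

```python
import pandas as pd


def summary_table(batches, quizzes, grade_col='Grade/10.00'):
    """Build a (batch, quiz) MultiIndex table of attendance and < 50 counts.

    batches: dict like {'batch1': df, ...}, each df having a 'Firstname' column
    quizzes: dict like {'quiz1': df, ...}, cleaned, grades on a 0-100 scale
    """
    # One long frame of every student, tagged with the batch they belong to
    students = pd.concat(
        [df[['Firstname']].assign(batch=name) for name, df in batches.items()],
        ignore_index=True)
    parts = []
    for quiz_name, quiz in quizzes.items():
        # Inner join keeps only students who have a grade in this quiz
        joined = students.merge(quiz[['Firstname', grade_col]], on='Firstname')
        joined['below50'] = joined[grade_col] < 50
        stats = joined.groupby('batch').agg(
            noofpresent=('Firstname', 'count'),
            lesserthan50=('below50', 'sum'))
        # Batches with nobody present in this quiz still get a row of zeros
        stats = stats.reindex(list(batches), fill_value=0)
        stats['quiz'] = quiz_name
        parts.append(stats)
    table = (pd.concat(parts)
             .rename_axis('batch')
             .reset_index()
             .set_index(['batch', 'quiz']))
    # Batch-major row order, like statdf in the question
    order = pd.MultiIndex.from_product([list(batches), list(quizzes)],
                                       names=['batch', 'quiz'])
    return table.reindex(order)
```

One difference from the `isin` filter: `merge` produces one row per matching pair, so a first name that appears twice in a batch or a quiz is counted more than once. Matching on first names alone is fragile in both versions.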

Answer 2

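Create a list containing all the CSV file paths, then use a for loop to parse them all. Obviously, you will have to adjust your code so that every place where a CSV path is hard-coded uses the loop variable `file` instead, like this: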

csv_files = ['file1.csv', 'file2.csv']
for file in csv_files:
    pass  # YOUR CODE GOES HERE, using `file` wherever a path was hard-coded
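Filled in for this question, the loop above could look like the following: the per-batch code moves into a function that takes the path as a parameter, and the loop collects one result per file. A sketch, assuming `q1` and `q2` were loaded and cleaned once beforehand as in the question, and that `csv_files` is listed in batch order; `process_batch` and `process_all` are hypothetical names, and only the `< 50` counts are computed:

```python
import pandas as pd


def process_batch(batch_path, q1, q2):
    """Return the number of students in one batch file scoring < 50 in each quiz.

    q1, q2: quiz DataFrames already cleaned as in the question
    (NaN rows dropped, grades multiplied by 10).
    """
    batch = pd.read_csv(batch_path, usecols=['studentName', 'admissionNumber'])
    batch = batch.rename(columns={'studentName': 'Firstname'})
    counts = {}
    for quiz_name, quiz in (('quiz1', q1), ('quiz2', q2)):
        # Quiz rows belonging to students of this batch
        in_batch = quiz.loc[quiz['Firstname'].isin(batch['Firstname'])]
        counts[quiz_name] = int((in_batch['Grade/10.00'] < 50).sum())
    return counts


def process_all(csv_files, q1, q2):
    # Assumption: csv_files[0] is batch 1, csv_files[1] is batch 2, and so on
    return {'batch' + str(i): process_batch(path, q1, q2)
            for i, path in enumerate(csv_files, start=1)}
```

Because `pd.read_csv` accepts either a path or an open file-like object, the same function works for the files on disk and for quick in-memory tests.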

2 answers

Answer 1: score 2, accepted, 2019-08-09 20:19:25

Answer 2: score 0, 2019-08-09 12:43:23
