Iterating through a CSV file and creating a table

I'm trying to read in a .csv file and extract specific columns so that I can output a single table that essentially performs a 'GROUP BY' on one column and aggregates certain other columns of interest (as you would in SQL), but I'm not very familiar with how to do this easily in Python.

The CSV file has the following form:

age,education,balance,approved
30,primary,1850,yes
54,secondary,800,no
24,tertiary,240,yes

I've tried importing the csv module and reading the file, parsing the three columns I care about and iterating through them to put them into three separate lists. I'm not too familiar with packages, or with how to get these into a data frame or matrix with 3 columns so that I can then iterate through them to compute all of the aggregated output fields (see the expected results below).

import csv

with open('loans.csv', newline='') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')

    next(readCSV)  # skip the header row

    education = []
    balance = []
    loan_approved = []

    # column order: 0 = age, 1 = education, 2 = balance, 3 = approved
    for row in readCSV:
        educat = row[1]
        bal = row[2]
        approve = row[3]

        education.append(educat)
        balance.append(bal)
        loan_approved.append(approve)

    print(education)
    print(balance)
    print(loan_approved)
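One thing to watch with this approach: `csv.reader` returns every field as a string, so `min`/`max` on `balance` would compare text rather than numbers (`'1850'` sorts before `'240'`). A minimal sketch using `csv.DictReader`, which consumes the header row itself and keys each row by column name, and converting `balance` to `int` as it is read. The inline `SAMPLE` string (the question's three rows) and `io.StringIO` are assumptions used only so the snippet runs on its own.

```python
import csv
import io

# Sample rows from the question; swap io.StringIO(SAMPLE) for
# open('loans.csv', newline='') to read the real file.
SAMPLE = """age,education,balance,approved
30,primary,1850,yes
54,secondary,800,no
24,tertiary,240,yes
"""

education, balance, loan_approved = [], [], []

with io.StringIO(SAMPLE) as csvfile:
    # DictReader reads the header row and maps column names to field values
    for row in csv.DictReader(csvfile):
        education.append(row['education'])
        balance.append(int(row['balance']))  # convert so min/max compare numerically
        loan_approved.append(row['approved'])

print(education)      # ['primary', 'secondary', 'tertiary']
print(balance)        # [1850, 800, 240]
print(loan_approved)  # ['yes', 'no', 'yes']
print(min(balance))   # 240 (min of the raw strings would be '1850')
```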

The output would be a 4x7 table: four rows (grouped by education level) and the following seven column headers:

Education|#Applicants|Min Bal|Max Bal|#Approved|#Rejected|%Apps Approved
Primary  ...
Secondary  ...
Tertiary ...
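Even without pandas, this table can be built in a single pass by keeping per-level tallies in a dictionary, which is roughly what SQL's GROUP BY does. A sketch assuming the sample file's header (`age,education,balance,approved`) and that approvals are spelled `yes`; any other value is counted as rejected. The `summarize` helper is hypothetical, rows come out in alphabetical order of education level, and the inline `SAMPLE` data stands in for `loans.csv`.

```python
import csv
import io

# Sample rows from the question; swap io.StringIO(SAMPLE) for
# open('loans.csv', newline='') to process the real file.
SAMPLE = """age,education,balance,approved
30,primary,1850,yes
54,secondary,800,no
24,tertiary,240,yes
"""

HEADERS = ('Education', '#Applicants', 'Min Bal', 'Max Bal',
           '#Approved', '#Rejected', '%Apps Approved')


def summarize(rows):
    """Hypothetical helper: GROUP BY education and aggregate, SQL-style."""
    stats = {}
    for row in rows:
        bal = int(row['balance'])  # csv yields strings; compare balances as numbers
        s = stats.setdefault(row['education'],
                             {'n': 0, 'min': bal, 'max': bal, 'yes': 0})
        s['n'] += 1
        s['min'] = min(s['min'], bal)
        s['max'] = max(s['max'], bal)
        if row['approved'] == 'yes':  # assumption: anything else counts as rejected
            s['yes'] += 1
    table = []
    for level in sorted(stats):  # alphabetical order of education levels
        s = stats[level]
        table.append((level.capitalize(), s['n'], s['min'], s['max'],
                      s['yes'], s['n'] - s['yes'],
                      round(100.0 * s['yes'] / s['n'], 1)))
    return table


with io.StringIO(SAMPLE) as f:
    table = summarize(csv.DictReader(f))

print('|'.join(HEADERS))
for r in table:
    print('|'.join(str(v) for v in r))
```

Keeping only running counts and min/max per level means the whole file never has to be held in memory as separate lists.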

It's much simpler to use pandas instead. For instance, you can read only the columns you care about instead of all of them:

import pandas as pd

# column names must match the CSV header exactly ('approved', not 'loan_approved')
df = pd.read_csv('loans.csv', usecols=['education', 'balance', 'approved'])
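Two details matter here: `read_csv` needs the file path as its first argument, and every name in `usecols` must appear in the header (the sample file's column is `approved`). Unlike `csv.reader`, `read_csv` also infers column types, so `balance` comes back as integers and `min`/`max` work numerically. A self-contained check, assuming the question's sample rows with `io.StringIO` standing in for `loans.csv`:

```python
import io

import pandas as pd

# Sample rows from the question, used in place of 'loans.csv'
SAMPLE = """age,education,balance,approved
30,primary,1850,yes
54,secondary,800,no
24,tertiary,240,yes
"""

# Only the three listed columns are loaded; 'age' is skipped
df = pd.read_csv(io.StringIO(SAMPLE), usecols=['education', 'balance', 'approved'])

print(list(df.columns))                           # ['education', 'balance', 'approved']
print(pd.api.types.is_integer_dtype(df['balance']))  # True
print(df['balance'].min())                        # 240
```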

Now, to group by education level, you can find all the unique entries in that column and select the rows for each one:

groupby_education = {}
# one sub-DataFrame per distinct education level
for level in df['education'].unique():
    groupby_education[level] = df.loc[df['education'] == level]

print(groupby_education)
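The dictionary above splits the rows by level but does not aggregate them yet. pandas' `groupby` combined with named aggregation (available since pandas 0.25) can compute the requested columns in one call. A sketch assuming the sample file's column names and `yes`/`no` approval values; the helper column `is_approved` and the output column names are introduced here for illustration and can be renamed to match the headers in the question.

```python
import io

import pandas as pd

# Sample rows from the question, used in place of 'loans.csv'
SAMPLE = """age,education,balance,approved
30,primary,1850,yes
54,secondary,800,no
24,tertiary,240,yes
"""

df = pd.read_csv(io.StringIO(SAMPLE), usecols=['education', 'balance', 'approved'])

# Helper column: 1 for an approved application, 0 otherwise (assumes 'yes'/'no')
df['is_approved'] = df['approved'].eq('yes').astype(int)

# GROUP BY education with one named aggregate per output column
summary = df.groupby('education').agg(
    applicants=('balance', 'size'),
    min_bal=('balance', 'min'),
    max_bal=('balance', 'max'),
    approved=('is_approved', 'sum'),
)
summary['rejected'] = summary['applicants'] - summary['approved']
summary['pct_approved'] = 100 * summary['approved'] / summary['applicants']

print(summary)
```

`groupby` sorts the group keys by default, so the rows come out as primary, secondary, tertiary.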

I hope this helps. Let me know if you still need help. Cheers!

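Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.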

 
© 2020-2024 STACKOOM.COM