
Calculate aggregated values from Excel CSV file

I have a finished Excel sheet with 20 rows, each containing a name, date of birth, class year (fresh, soph, junior, senior), and GPA.

I need to write a program that calculates the following:

  • Number of students per year
  • Average age per year
  • Average GPA per year

Do you have any suggestions on how to achieve these results?

Stack Overflow is generally used to get answers to specific questions, rather than general programming advice.

My high-level advice is to read the data from the Excel sheet into a list of dictionaries. From there you can iterate over the list of dictionaries to find the information you're looking for.
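The list-of-dictionaries approach could look roughly like the sketch below. It assumes the sheet has been exported to CSV with the column headers Name, DOB, Class Year and GPA; the sample rows and the helper names `read_rows` and `gpa_stats_by_year` are hypothetical and only there for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; a real program would open the exported CSV file instead.
SAMPLE_CSV = """Name,DOB,Class Year,GPA
Ann,2008-09-22,Fresh,1.0
Bob,2009-09-20,Soph,2.0
Cid,2008-09-20,Fresh,4.0
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per student."""
    return list(csv.DictReader(io.StringIO(text)))


def gpa_stats_by_year(rows):
    """Return {class year: (student count, average GPA)}."""
    gpas = defaultdict(list)
    for row in rows:
        # csv.DictReader yields strings, so convert the GPA before averaging.
        gpas[row["Class Year"]].append(float(row["GPA"]))
    return {
        year: (len(values), sum(values) / len(values))
        for year, values in gpas.items()
    }


rows = read_rows(SAMPLE_CSV)
stats = gpa_stats_by_year(rows)
for year, (count, average) in stats.items():
    print(f"{year}: students={count}, average GPA={average:.2f}")
```

For a file on disk, `csv.DictReader(open(path, newline=""))` could replace the `io.StringIO` wrapper. The same grouping pattern works for average age once each date of birth has been converted to a number.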

Alternatively, you can use Pandas, although it is generally used for large data sets and computationally expensive operations.

There are a few different questions here that are worth answering because they may be applicable to more people.

Specifically, the questions are:

  1. Read Excel data in Python
  2. Calculate a count based on a column
  3. Calculate averages based on a column
  4. Calculate age based on date of birth (this will depend a bit on how you've formatted the DOB)
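For the last item, a minimal standard-library sketch might look like this. It assumes the date of birth arrives as text in ISO format (`YYYY-MM-DD`); the helper names `parse_dob` and `age_in_years` are hypothetical, and the `fmt` argument would need to change if the sheet stores dates differently.

```python
from datetime import date, datetime


def parse_dob(text, fmt="%Y-%m-%d"):
    """Convert a date-of-birth string to a date; fmt is an assumed format."""
    return datetime.strptime(text, fmt).date()


def age_in_years(dob, today):
    """Whole years completed between dob and today."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


# A fixed reference date keeps the result reproducible; use date.today() in practice.
reference = date(2022, 9, 21)
print(age_in_years(parse_dob("2008-09-22"), reference))  # birthday is the next day
print(age_in_years(parse_dob("2008-09-20"), reference))  # birthday was the day before
```

This returns whole years, which differs from a fractional age obtained by dividing elapsed days by the length of a year; either can be averaged per class year.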

1 - Use the pandas module.

2-4 - See below:

import pandas

# Read the worksheet into a DataFrame (adjust the path to your own file).
data = pandas.read_excel(r'D:\User Files\Downloads\73802797-file.xlsx')
print('\nExcel data')
print(data)

print('\nNumber of students per year')
# https://stackoverflow.com/questions/22391433/count-the-frequency-that-a-value-occurs-in-a-dataframe-column
print(data['Class Year'].value_counts())

print('\nAverage age per year')
# https://stackoverflow.com/questions/31490816/calculate-datetime-difference-in-years-months-etc-in-a-new-pandas-dataframe-c
# Fractional age in years: elapsed time divided by the mean Gregorian year (365.2425 days).
# This is the same length as numpy.timedelta64(1, 'Y'), a unit that newer pandas releases may reject.
# Assumes the 'DOB' column was parsed as datetimes.
data['Age'] = (pandas.Timestamp.now() - data['DOB']) / pandas.Timedelta(days=365.2425)
print(data['Age'])
print(data.groupby('Class Year')['Age'].mean())

print('\nAverage GPA per year')
# https://stackoverflow.com/questions/30482071/how-to-calculate-mean-values-grouped-on-another-column-in-pandas
print(data.groupby('Class Year')['GPA'].mean())

Running this produces the following output:


Excel data
        Name        DOB Class Year  GPA
0  Redeye438 2008-09-22      Fresh    1
1  Redeye439 2009-09-20       Soph    2
2  Redeye440 2010-09-22     Junior    3
3  Redeye441 2011-09-20     Senior    4
4  Redeye442 2008-09-20      Fresh    4
5  Redeye443 2009-09-22       Soph    3

Number of students per year
Soph      2
Fresh     2
Junior    1
Senior    1
Name: Class Year, dtype: int64

Average age per year
Class Year
Fresh     14.000320
Junior    11.998910
Senior    11.005050
Soph      13.000984
Name: Age, dtype: float64

Average GPA per year
Class Year
Fresh     2.5
Junior    3.0
Senior    4.0
Soph      2.5
Name: GPA, dtype: float64
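As an optional variation, the three per-year results could be collected in a single table with one `groupby(...).agg(...)` call using named aggregation. The sketch below is not part of the original answer: it uses hypothetical in-memory rows instead of the Excel file and a fixed reference date so that the ages are reproducible.

```python
import pandas

# Hypothetical in-memory rows standing in for the Excel sheet.
data = pandas.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "DOB": pandas.to_datetime(
        ["2008-09-22", "2009-09-20", "2008-09-20", "2009-09-22"]
    ),
    "Class Year": ["Fresh", "Soph", "Fresh", "Soph"],
    "GPA": [1.0, 2.0, 4.0, 3.0],
})

# Fixed reference date (an assumption); pandas.Timestamp.now() would be used in practice.
reference = pandas.Timestamp("2022-09-21")
data["Age"] = (reference - data["DOB"]) / pandas.Timedelta(days=365.2425)

# One row per class year, one column per aggregate.
summary = data.groupby("Class Year").agg(
    students=("Name", "count"),
    average_age=("Age", "mean"),
    average_gpa=("GPA", "mean"),
)
print(summary)
```

The output column names `students`, `average_age` and `average_gpa` are illustrative and can be renamed freely.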
