Indexing instances of different values in a python/pandas dataframe

I'm fairly new to Python and I'm trying to automate a task that counts instances of multiple values (student absence types) and then writes them back out on a single line per student. If I have a single value, I can accomplish that with:

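import pandas as pd

# Load the Cognos export; the first row holds the column names
df = pd.read_csv('attendanceUAnumbersLISTONLY.csv', header=0)

# Count how many rows (absences) each student ID has
nf = df['StudentId'].value_counts()
print(nf)

# Write one "StudentId,count" line per student, without a header row
nf.to_csv('studentua.csv', index=True, header=False)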

The dataframe I'm pulling is a Cognos report that simply shows a student ID number for each instance of an unexcused absence. The underlying dataset looks like:

StudentID  AbsenceType
123456     UA
123456     UA
654321     UA

I ultimately want the output to be:

StudentID  Count
123456     2
654321     1

The code above will do that. But if I want to pull values besides UA and put those into a different column of the output, that's where I'm stuck. So if I have values of P (present), I want to export them in a new column that I can import into another system:

StudentID  UA  P
123456     2   7
654321     1   8

I can't get my head around how to do that.

What about first using groupby on both columns, then size to get the number of occurrences, unstack to pivot the AbsenceType index level into columns, and fillna to put a 0 wherever no occurrences are found:

df.groupby(['StudentId', 'AbsenceType']).size().unstack(level=1).fillna(0)
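
To see what that one-liner produces, here is a minimal sketch on made-up data. The sample rows are hypothetical and only mimic the shape of the report in the question; the column names StudentId and AbsenceType are taken from the code above. It also shows two variations that are not part of the original answer: passing fill_value=0 to unstack (which keeps the counts as integers, whereas fillna(0) leaves them as floats when gaps exist) and pd.crosstab, which should give the same table in one call.

```python
import pandas as pd

# Hypothetical sample rows, shaped like the report described in the question.
df = pd.DataFrame({
    "StudentId": [123456, 123456, 654321, 123456, 654321, 654321],
    "AbsenceType": ["UA", "UA", "UA", "P", "P", "P"],
})

# Count rows per (student, absence type), then pivot the absence type
# level into columns. fill_value=0 fills missing combinations with 0.
counts = (
    df.groupby(["StudentId", "AbsenceType"])
      .size()
      .unstack(level=1, fill_value=0)
)

# Alternative with the same intent: a cross-tabulation of the two columns.
alt = pd.crosstab(df["StudentId"], df["AbsenceType"])

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = counts.to_csv()
print(csv_text)
```

Passing a file name to to_csv instead (as the question does with studentua.csv) would write the same one-line-per-student table to disk, with StudentId as the first column and one column per absence type.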
