
Python Pandas DataFrame pivot_table bizarre values

"Bizarre" is such an emotionally charged word.

Assume that I have 5 students: A, B, C, D, and E. Each of these students grades two of their peers on a writing assignment. The data is as follows:

import pandas as pd

peer_review = pd.DataFrame({
    'Student': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'E', 'E'],
    'Assessor': ['B', 'C', 'A', 'D', 'D', 'D', 'B', 'D', 'D', 'D', 'A', 'A', 'A', 'E', 'C', 'E'],
    'Score': [72, 53, 92, 100, 2, 90, 75, 50, 50, 47, 97, 86, 41, 17, 47, 29]})
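
As a quick sanity check on this data, the following sketch counts how many rows exist for each (Student, Assessor) pair and keeps only the pairs that occur more than once. The names `pair_counts` and `repeated` are just illustrative; they are not part of the original code.

```python
import pandas as pd

peer_review = pd.DataFrame({
    'Student': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'E', 'E'],
    'Assessor': ['B', 'C', 'A', 'D', 'D', 'D', 'B', 'D', 'D', 'D', 'A', 'A', 'A', 'E', 'C', 'E'],
    'Score': [72, 53, 92, 100, 2, 90, 75, 50, 50, 47, 97, 86, 41, 17, 47, 29]})

# Number of rows for each (Student, Assessor) pair.
pair_counts = peer_review.groupby(['Student', 'Assessor']).size()

# Keep only the pairs where the same assessor scored the same student more than once.
repeated = pair_counts[pair_counts > 1]
print(repeated)
```

With this data, three pairs should show up with three scores each: (B, D), (C, D) and (D, A). These are the cells that hold multiple values in the pivot table.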

Now, in some cases an assessor graded the student's assignment more than once. Maybe the student turned it in and revised it several times. Maybe the assessor was drunk and didn't remember that he had already graded this student's assignment. In any case, I would like to be able to see a list of all scores that each assessor gave to each student. I tried to do this as follows:

peer_review.pivot_table(
    index='Student',
    columns='Assessor',
    values='Score',
    aggfunc=identity)

I can already hear you asking: what is the "identity" function? It's this:

def identity(x):
    return x

However, when I run the pivot_table function repeatedly, it gives me different answers each time for the cells that have multiple values.

So, here are the questions:

  1. What is the significance of the numbers that seem to change randomly as I run the pivot_table function repeatedly?
  2. How do I fix the identity function so that it returns a simple list of all the scores when an assessor graded the same assignment more than once?

------------------UPDATE #1:------------------

I found that it is a pandas Series object that is being passed to the identity function. I changed the identity function to this:

def identity(x):
    return x.values

This still gives me the bizarre random numbers. Realizing that x.values is a numpy.ndarray, I then tried this:

def identity(x):
    return x.values.tolist()

This results in a ValueError exception ("Function does not reduce").
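
The error message suggests that pivot_table wants the aggregation function to reduce each group to a single value. As a hedged sketch of an aggfunc that does reduce, the hypothetical helper `join_scores` below (not part of the original code) collapses each group's scores into one comma-separated string. It produces strings rather than true lists, so it only illustrates the "reduce to one value" requirement.

```python
import pandas as pd

peer_review = pd.DataFrame({
    'Student': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'E', 'E'],
    'Assessor': ['B', 'C', 'A', 'D', 'D', 'D', 'B', 'D', 'D', 'D', 'A', 'A', 'A', 'E', 'C', 'E'],
    'Score': [72, 53, 92, 100, 2, 90, 75, 50, 50, 47, 97, 86, 41, 17, 47, 29]})

def join_scores(scores):
    # Reduce a Series of scores to a single string, e.g. "100, 2, 90".
    return ', '.join(str(s) for s in scores)

table = peer_review.pivot_table(
    index='Student',
    columns='Assessor',
    values='Score',
    aggfunc=join_scores)
print(table)
```

Cells with no scores come out as NaN, and cells with several scores hold one string listing them in their original row order.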

------------------UPDATE #2:------------------

The workaround proposed by ZJS works perfectly. I'm still wondering why pivot_table failed me.

This will work every time...

groups = peer_review.groupby(['Assessor', 'Student'])  # group rows by (Assessor, Student) pairs
peer_review = groups['Score'].apply(list)              # collect each group's scores into a list
peer_review = peer_review.unstack('Student')           # move the Student index level to the columns

I'm still investigating why pivot_table doesn't work.
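
For reference, here is a self-contained sketch of the same groupby approach, arranged with students as rows and assessors as columns like the pivot_table attempt in the question. The result name `scores` is only illustrative, and the original DataFrame is left untouched rather than overwritten.

```python
import pandas as pd

peer_review = pd.DataFrame({
    'Student': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'E', 'E'],
    'Assessor': ['B', 'C', 'A', 'D', 'D', 'D', 'B', 'D', 'D', 'D', 'A', 'A', 'A', 'E', 'C', 'E'],
    'Score': [72, 53, 92, 100, 2, 90, 75, 50, 50, 47, 97, 86, 41, 17, 47, 29]})

scores = (
    peer_review
    .groupby(['Student', 'Assessor'])['Score']  # one group per (Student, Assessor) pair
    .apply(list)                                # each group's scores become a Python list
    .unstack('Assessor')                        # assessors become the columns
)
print(scores)
```

Each populated cell holds a list, even when there is only one score, and pairs that never occur are NaN. For example, `scores.loc['B', 'D']` holds the three scores that assessor D gave to student B.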
