
Does groupby in Python result in a column or dataframe?

Let's say I have a dataframe like this:

age Late
1     1
2     5
3     48
4     46
5     6

... ...

I want to replace all values in Late that are 46 or 48 with the median of Late. I believe the command is:

trainDF.groupby('Late').transform(getmedian)

However, is the result the whole dataframe trainDF, or is it just the Late column?

That is, does the below make sense?

trainDF=trainDF.groupby('Late').transform(getmedian)

Or does the below make sense?

newLate = trainDF.groupby('Late').transform(getmedian)

I tried

newLate = trainDF.groupby('Late').transform(getmedian)
newLate.max()

which prints out Unnamed: 0

and trainDF['newLate'].max()

prints out KeyError: 'newLate'

If I try

trainDF=trainDF.groupby('Late').transform(getmedian)

and print out

trainDF['Late'].max()

it says KeyError: 'Late'

What am I supposed to do to store the new dataframe that replaces the 46 and 48 values in Late? I ask because I eventually want to verify that this works by printing the max value of the modified Late column and making sure it shows 6 (or any number less than 46), not 48.

You're asking a lot of questions here, but I'll address what seems to be the main one:

I want to replace all values in Late that are 46 or 48 with the median of Late.

You can do that this way:

>>> import pandas as pd
>>> df = pd.DataFrame({'age': [1,2,3,4,5],
                       'Late': [1,5,48,46,6]})
>>> df.loc[df['Late'].isin([46, 48]), 'Late'] = df['Late'].median()
>>> df
   Late  age
0     1    1
1     5    2
2     6    3
3     6    4
4     6    5
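To confirm the replacement does what the question asks (a max of 6 rather than 48), here is a minimal sketch using the same toy data. It uses Series.mask as a non-mutating alternative to the .loc assignment above; the names median_late and new_late are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"age": [1, 2, 3, 4, 5], "Late": [1, 5, 48, 46, 6]})

# Compute the median once, before anything is replaced.
# Sorted values are 1, 5, 6, 46, 48, so the median is 6.
median_late = df["Late"].median()

# Series.mask returns a new Series in which values are replaced
# wherever the condition is True; df itself is left unchanged.
new_late = df["Late"].mask(df["Late"].isin([46, 48]), median_late)

print(new_late.max())    # max of the modified column
print(df["Late"].max())  # max of the original, untouched column
```

Assigning the result back with df["Late"] = new_late stores it in the dataframe, after which df["Late"].max() can be checked the same way.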

Here is a good one-liner for you:

trainDF.loc[(trainDF["Late"] == 48) | (trainDF["Late"] == 46), "Late"] = trainDF["Late"].median()

Bear in mind that groupby doesn't really apply in your case, and that it returns a DataFrame-ish object (a DataFrameGroupBy), not a single column.
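As a rough illustration of what groupby and transform hand back, the sketch below reuses the toy data from the question. The getmedian function from the question is not shown anywhere, so the built-in "median" aggregation stands in for it here as an assumption.

```python
import pandas as pd

df = pd.DataFrame({"age": [1, 2, 3, 4, 5], "Late": [1, 5, 48, 46, 6]})

# groupby on a DataFrame returns a lazy grouping object, not a column.
grouped = df.groupby("Late")
print(type(grouped).__name__)

# transform returns a DataFrame with the same index as df,
# but the grouping column ("Late") is not among its columns.
result = grouped.transform("median")
print(type(result).__name__)
print(list(result.columns))
```

Because the grouping column is left out of the transformed result, assigning that result back to trainDF drops Late, which is consistent with the KeyError: 'Late' reported in the question.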
