Selecting columns from a dataframe with conditions
I was learning to use the pandas library today and I came across an error that I couldn't fully understand. This is the dataframe I was using.
name kda combat econ
0 Austin 1.45 270.0 67
8 Austin 1.70 300.0 90
4 Justin 1.36 230.0 50
11 Justin 1.50 270.0 60
1 Kevin 1.40 230.0 55
6 Kevin 1.00 100.0 120
3 Matt 1.00 180.0 65
9 Matt 1.40 280.0 70
2 Nick 2.10 360.0 87
7 Nick 2.50 340.0 88
5 Will 1.20 185.0 45
10 Will 1.60 260.0 75
I was trying to get the name and kda columns for the players whose average combat score is greater than 250, which I tried to achieve by doing
temp = df.groupby('name').mean()
temp = temp[temp['combat'] > 250]
print(temp['name', 'kda'])
but it returned this KeyError instead
KeyError: "['name'] not in index"
Could someone explain why I can't grab columns from these temporary dataframes? Or did I do something wrong in my code? Luckily my friend helped me out and I could do it with
temp = df.loc[df['combat'] > 250, ['name','kda']]
print(temp.groupby('name').mean())
This did the trick and gave
kda
name
Austin 1.575
Justin 1.500
Matt 1.400
Nick 2.300
Will 1.600
Thank you in advance.
When you do a groupby("col_name"), the default behaviour is for pandas to set col_name as the index of the result.
In your case, name becomes the dataframe index rather than a column, which is why selecting it as a column raises a KeyError.
You can use
temp = df.groupby('name').mean()
temp = temp[temp['combat'] > 250]
print(temp['kda'])
to get your desired result (it will return a Series indexed by name).
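To make the point about the index concrete, here is a minimal sketch that rebuilds the dataframe from the question and inspects the grouped result. The variable names means, high and kda are illustrative and not from the original post.

```python
import pandas as pd

# Data taken from the table in the question
df = pd.DataFrame({
    'name': ['Austin', 'Austin', 'Justin', 'Justin', 'Kevin', 'Kevin',
             'Matt', 'Matt', 'Nick', 'Nick', 'Will', 'Will'],
    'kda': [1.45, 1.70, 1.36, 1.50, 1.40, 1.00,
            1.00, 1.40, 2.10, 2.50, 1.20, 1.60],
    'combat': [270.0, 300.0, 230.0, 270.0, 230.0, 100.0,
               180.0, 280.0, 360.0, 340.0, 185.0, 260.0],
    'econ': [67, 90, 50, 60, 55, 120, 65, 70, 87, 88, 45, 75],
})

means = df.groupby('name').mean()

# 'name' is now the index, not a column
print(means.index.name)
print(list(means.columns))

# Filter on the average combat score, then pick a remaining column
high = means[means['combat'] > 250]
kda = high['kda']  # a Series whose index holds the names
print(kda)
```

Because the names live in the index, printing the Series still shows them next to each kda value.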
Another option would be to use as_index=False with groupby
groupby('col_name', as_index=False)
This will return a dataframe with 'name' as a regular column, and your first solution will work as long as you select the columns with a list, i.e. temp[['name', 'kda']].
Have a look at the intermediate steps and you'll see what's going on.
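A small sketch of the as_index=False variant, printing the intermediate steps. It uses the data from the question; the variable names means, high and result are illustrative.

```python
import pandas as pd

# Data taken from the table in the question
df = pd.DataFrame({
    'name': ['Austin', 'Austin', 'Justin', 'Justin', 'Kevin', 'Kevin',
             'Matt', 'Matt', 'Nick', 'Nick', 'Will', 'Will'],
    'kda': [1.45, 1.70, 1.36, 1.50, 1.40, 1.00,
            1.00, 1.40, 2.10, 2.50, 1.20, 1.60],
    'combat': [270.0, 300.0, 230.0, 270.0, 230.0, 100.0,
               180.0, 280.0, 360.0, 340.0, 185.0, 260.0],
    'econ': [67, 90, 50, 60, 55, 120, 65, 70, 87, 88, 45, 75],
})

# Step 1: group without moving 'name' into the index
means = df.groupby('name', as_index=False).mean()
print(means.columns.tolist())  # 'name' is listed among the columns

# Step 2: keep only groups whose average combat score exceeds 250
high = means[means['combat'] > 250]
print(high)

# Step 3: select several columns by passing a list (double brackets)
result = high[['name', 'kda']]
print(result)
```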
Alternative answer.
.reset_index() can be used after .groupby(), as in the code below. Also, when printing, you need [[]] instead of [] if more than one column is selected.
# Import libraries
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
    'name': ['Austin', 'Austin', 'Justin', 'Justin', 'Kevin', 'Kevin',
             'Matt', 'Matt', 'Nick', 'Nick', 'Will', 'Will'],
    # kda values for Kevin and Matt aligned with the table in the question
    'kda': [1.45, 1.70, 1.36, 1.50, 1.40, 1.00, 1.00, 1.40, 2.10, 2.50, 1.20, 1.60],
    'combat': [270.0, 300.0, 230.0, 270.0, 230.0, 100.0, 180, 280, 360, 340, 185, 260],
    'econ': [67, 90, 50, 60, 55, 120, 65, 70, 87, 88, 45, 75]
})
# Groupby (copy pasted code from question and modified)
temp = df.groupby('name').mean().reset_index()
temp = temp[temp['combat'] > 250]
print(temp[['name', 'kda']])
Output
name kda
0 Austin 1.575
4 Nick 2.300