Selecting columns from a dataframe with conditions

I was learning to use the pandas library today and came across an error that I couldn't fully understand. This is the dataframe I was using.

      name   kda  combat  econ
0   Austin  1.45   270.0    67
8   Austin  1.70   300.0    90
4   Justin  1.36   230.0    50
11  Justin  1.50   270.0    60
1    Kevin  1.40   230.0    55
6    Kevin  1.00   100.0   120
3     Matt  1.00   180.0    65
9     Matt  1.40   280.0    70
2     Nick  2.10   360.0    87
7     Nick  2.50   340.0    88
5     Will  1.20   185.0    45
10    Will  1.60   260.0    75

I was trying to get the name and kda columns for the players whose average combat score is greater than 250, which I tried to do with

temp = df.groupby('name').mean()
temp = temp[temp['combat'] > 250]
print(temp['name', 'kda'])

but it raised this KeyError instead

KeyError: "['name'] not in index"

Could someone explain why I can't grab columns from these temporary dataframes? Or did I do something wrong in my code? Luckily my friend helped me out, and I could do it with

temp = df.loc[df['combat'] > 250, ['name','kda']]
print(temp.groupby('name').mean())

This did the trick and gave

          kda
name         
Austin  1.575
Justin  1.500
Matt    1.400
Nick    2.300
Will    1.600

Thank you in advance

When you do a groupby("col_name"), the default behaviour is for pandas to set col_name as the index of the result.

In your case, name becomes the dataframe index, so it is no longer available as a column.
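A minimal sketch of that behaviour, assuming the question's table is retyped by hand (the original row labels are not reproduced):

```python
import pandas as pd

# Data retyped from the question's table; row labels are not preserved.
df = pd.DataFrame({
    'name': ['Austin', 'Austin', 'Justin', 'Justin', 'Kevin', 'Kevin',
             'Matt', 'Matt', 'Nick', 'Nick', 'Will', 'Will'],
    'kda': [1.45, 1.70, 1.36, 1.50, 1.40, 1.00, 1.00, 1.40, 2.10, 2.50, 1.20, 1.60],
    'combat': [270.0, 300.0, 230.0, 270.0, 230.0, 100.0,
               180.0, 280.0, 360.0, 340.0, 185.0, 260.0],
    'econ': [67, 90, 50, 60, 55, 120, 65, 70, 87, 88, 45, 75],
})

temp = df.groupby('name').mean()

# The grouping key has moved out of the columns and into the index.
print(temp.index.name)          # name
print(list(temp.columns))       # ['kda', 'combat', 'econ']
print('name' in temp.columns)   # False
```

Because 'name' is not among the columns any more, asking for it by column label fails.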

You can use

temp = df.groupby('name').mean()
temp = temp[temp['combat'] > 250]
print(temp['kda'])

to get your desired result (it will return a Series).
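The names are not lost in that Series: they are carried in its index. A short sketch, again assuming the question's data retyped by hand:

```python
import pandas as pd

# Data retyped from the question's table; row labels are not preserved.
df = pd.DataFrame({
    'name': ['Austin', 'Austin', 'Justin', 'Justin', 'Kevin', 'Kevin',
             'Matt', 'Matt', 'Nick', 'Nick', 'Will', 'Will'],
    'kda': [1.45, 1.70, 1.36, 1.50, 1.40, 1.00, 1.00, 1.40, 2.10, 2.50, 1.20, 1.60],
    'combat': [270.0, 300.0, 230.0, 270.0, 230.0, 100.0,
               180.0, 280.0, 360.0, 340.0, 185.0, 260.0],
    'econ': [67, 90, 50, 60, 55, 120, 65, 70, 87, 88, 45, 75],
})

temp = df.groupby('name').mean()
temp = temp[temp['combat'] > 250]

kda = temp['kda']            # a Series indexed by name
print(list(kda.index))       # ['Austin', 'Nick']
print(kda.loc['Nick'])       # look up one player's mean kda by name
```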

Another option would be to use as_index=False with groupby

groupby('col_name', as_index=False)

This will return a dataframe with 'name' as a regular column, so your first solution will work, provided the two columns are selected with a list: temp[['name', 'kda']].
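Put together, the as_index=False variant might look like this (a sketch using the question's data retyped by hand; the name result is only illustrative):

```python
import pandas as pd

# Data retyped from the question's table; row labels are not preserved.
df = pd.DataFrame({
    'name': ['Austin', 'Austin', 'Justin', 'Justin', 'Kevin', 'Kevin',
             'Matt', 'Matt', 'Nick', 'Nick', 'Will', 'Will'],
    'kda': [1.45, 1.70, 1.36, 1.50, 1.40, 1.00, 1.00, 1.40, 2.10, 2.50, 1.20, 1.60],
    'combat': [270.0, 300.0, 230.0, 270.0, 230.0, 100.0,
               180.0, 280.0, 360.0, 340.0, 185.0, 260.0],
    'econ': [67, 90, 50, 60, 55, 120, 65, 70, 87, 88, 45, 75],
})

# as_index=False keeps 'name' as an ordinary column in the result.
temp = df.groupby('name', as_index=False).mean()
temp = temp[temp['combat'] > 250]

# A list of labels inside [] selects several columns at once.
result = temp[['name', 'kda']]
print(result)
```

With this data, only Austin and Nick have a mean combat score above 250 (Justin's mean is exactly 250, so the strict comparison excludes him).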

Have a look at the intermediate steps and you'll see what's going on.
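One way to inspect the intermediate steps, sketched with the question's data retyped by hand (the names by_name, mean_then_filter and filter_then_mean are illustrative). It also suggests that the workaround in the question answers a slightly different question: it drops individual rows with combat <= 250 before averaging, instead of filtering on each player's average.

```python
import pandas as pd

# Data retyped from the question's table; row labels are not preserved.
df = pd.DataFrame({
    'name': ['Austin', 'Austin', 'Justin', 'Justin', 'Kevin', 'Kevin',
             'Matt', 'Matt', 'Nick', 'Nick', 'Will', 'Will'],
    'kda': [1.45, 1.70, 1.36, 1.50, 1.40, 1.00, 1.00, 1.40, 2.10, 2.50, 1.20, 1.60],
    'combat': [270.0, 300.0, 230.0, 270.0, 230.0, 100.0,
               180.0, 280.0, 360.0, 340.0, 185.0, 260.0],
    'econ': [67, 90, 50, 60, 55, 120, 65, 70, 87, 88, 45, 75],
})

# Step 1: per-player means; 'name' is the index here.
by_name = df.groupby('name').mean()
print(by_name)

# Step 2: keep players whose *average* combat exceeds 250.
mean_then_filter = by_name[by_name['combat'] > 250]
print(mean_then_filter['kda'])

# The workaround from the question: keep *rows* with combat > 250, then average.
filter_then_mean = df.loc[df['combat'] > 250, ['name', 'kda']].groupby('name').mean()
print(filter_then_mean)
```

The first approach keeps only Austin and Nick, while the second keeps every player who has at least one game above 250.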

Alternative answer.

.reset_index() can be used after .groupby(), as in the code below. Also, when printing, you need [[]] instead of [] if more than one column is selected.

# Import libraries
import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'name': ['Austin','Austin','Justin','Justin','Kevin','Kevin',
            'Matt','Matt','Nick','Nick','Will','Will'],
    'kda': [1.45,1.70,1.36,1.50,1.40,1.00,1.00,1.40,2.10,2.50,1.20,1.60],
    'combat':[270.0,300.0,230.0,270.0,230.0,100.0,180,280,360,340,185,260],
    'econ':[67,90,50,60,55,120,65,70,87,88,45,75]
})

# Groupby (copy pasted code from question and modified)
temp = df.groupby('name').mean().reset_index()
temp = temp[temp['combat'] > 250]
print(temp[['name', 'kda']])

Output

     name    kda
0  Austin  1.575
4    Nick  2.300
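To illustrate the [[]] versus [] point, here is a small sketch on the same data (retyped by hand; the variable names are illustrative). A single label returns a Series, a list of labels returns a DataFrame, and two labels without the inner list are treated as one tuple key, which is not a column in this dataframe.

```python
import pandas as pd

# Data retyped from the question's table; row labels are not preserved.
df = pd.DataFrame({
    'name': ['Austin', 'Austin', 'Justin', 'Justin', 'Kevin', 'Kevin',
             'Matt', 'Matt', 'Nick', 'Nick', 'Will', 'Will'],
    'kda': [1.45, 1.70, 1.36, 1.50, 1.40, 1.00, 1.00, 1.40, 2.10, 2.50, 1.20, 1.60],
    'combat': [270.0, 300.0, 230.0, 270.0, 230.0, 100.0,
               180.0, 280.0, 360.0, 340.0, 185.0, 260.0],
    'econ': [67, 90, 50, 60, 55, 120, 65, 70, 87, 88, 45, 75],
})

temp = df.groupby('name').mean().reset_index()

one_col = temp['kda']              # single label -> Series
two_cols = temp[['name', 'kda']]   # list of labels -> DataFrame

# Without the inner list, pandas looks for a column labelled ('name', 'kda').
raised = False
try:
    temp['name', 'kda']
except KeyError as err:
    raised = True
    print('KeyError:', err)
```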
