简体   繁体   中英

Pandas : How to get mean of the column from the filtered values of other column?

 Name      Runs   Score         
 Richard   2000    2
 Richard   3000    3
 Richard   7000    8
 Richard   8000    7
 Richard   2000    4
 Richard   9000    7.5

lessthan1 = str(len(df[(df['Score']>=0) & (df['Score']<1)]))
oneto5 = str(len(df[(df['Score']>=1) & (df['Score']<5)]))
fiveto10 = str(len(df[(df['Score']>=5) & (df['Score']<10)]))

For example in category >=1 to <5..am expecting (2000+3000+2000)/31000

Looks like you can use pd.cut to bin the score data and use that to group the runs, eg:

df.groupby(pd.cut(df['Score'], [0, 1, 5, 10]))['Runs'].mean()

Gives you:

Score
(0, 1]             NaN
(1, 5]     2333.333333
(5, 10]    8000.000000
Name: Runs, dtype: float64

You could try using boolean masking as follows,

lessthan1 = df.loc[(df['Score'] >= 0) & (df['Score'] <1), 'Runs'].mean()
oneto5 = df.loc[(df['Score']>=1) & (df['Score']<5), 'Runs'].mean()
fiveto10 = df.loc[(df['Score']>=5) & (df['Score']<10), 'Runs'].mean()

print('Less than 1:', lessthan1)
print('One to Five:', oneto5)
print('Five to Ten:', fiveto10)
import pandas as pd
array = [
    ["Richard",   2000,    2],
    ["Richard",   3000,    3],
    ["Richard",   7000,    8],
    ["Richard",   8000,    7],
    ["Richard",   2000,    4],
    ["Richard",   9000,    7.5]]
df = pd.DataFrame(array, columns=["Name", "Runs", "Score"])
print(df[(df['Score'] >= 0) & (df['Score'] < 1)]["Runs"].mean())
print(df[(df['Score'] >= 1) & (df['Score'] < 5)]["Runs"].mean())
print(df[(df['Score'] >= 5) & (df['Score'] < 10)]["Runs"].mean())

nan

2333.3333333333335

8000.0

You need to select just the Runs column and then use the .mean() method on that after you implement your boolean selection

Also, you can use pandas apply functions with lambdas:

df.loc[df['Score'].apply(lambda x: x < 1 and x > 0), 'Runs'].mean()
df.loc[df['Score'].apply(lambda x: x >= 1 and x < 5), 'Runs'].mean()
df.loc[df['Score'].apply(lambda x: x >= 5 and x < 10), 'Runs'].mean()

Result:

5166.666666666667
nan
2333.3333333333335

To obtain mean averaged over total sum in score you have to change mean() by sum() and divide by total sum() . For example

df.loc[df['Score'].apply(lambda x: x >= 1 and x < 5), 'Runs'].sum()/df['Runs'].sum()

Result:

0.22580645161290322 = (2000 + 3000 + 2000)/31000

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM