Using the .loc method, I am selecting values in one column of a pandas DataFrame based on the values in another column of the same DataFrame.
The code snippet is below for reference:
var1 = output_df['Player'].loc[output_df['Team']=='India'].reset_index(drop=True)
var2 = output_df['Player'].loc[output_df['Team']=='Australia'].reset_index(drop=True)
var3 = output_df['Player'].loc[output_df['Team']=='Algeria'].reset_index(drop=True)
Update
There may be any number of teams in my DataFrame, but I want the top players from selected teams only. That's why I enter the team names manually in the code. I may also need the top performer, the second-best performer, and so on, so I cannot fetch the values from a column with a join statement.
Now I have three variables of type pandas.core.series.Series.
I have already sorted this DataFrame in descending order of another column, "Score".
My requirement is to fetch the top-scoring player from each team and create an output variable that joins all the player names with ','.
I tried the command below to get the desired output:
Final = var1[0]+','+var2[0]+','+var3[0]
This produces the expected output, but it fails if any of the variables is empty. For example, if my DataFrame has no top-scoring player from Algeria, var3 is empty, and the command above ends with an "out of bounds" error.
Is there a way to make that command work, or a similar command that ignores the empty variable and joins the remaining ones with a separator in between?
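A minimal sketch of a direct fix that keeps the three-variable approach: take the k-th element only from Series that are long enough, and join whatever is left. The helper name nth_or_none and the player names are hypothetical, chosen only for illustration.

```python
import pandas as pd

def nth_or_none(s, k=0):
    """Return the k-th value of a Series (0 = top), or None if the Series is too short."""
    return s.iloc[k] if len(s) > k else None

# Illustrative stand-ins for var1, var2, var3 (each already sorted by Score);
# var3 is empty because there is no Algeria player.
var1 = pd.Series(["Humphrey", "Tim"])
var2 = pd.Series(["Bob", "Jane"])
var3 = pd.Series([], dtype=object)

# Skip the Series that have no k-th element instead of indexing into them.
top = [nth_or_none(s, 0) for s in (var1, var2, var3)]
Final = ",".join(p for p in top if p is not None)
print(Final)

second = [nth_or_none(s, 1) for s in (var1, var2, var3)]
print(",".join(p for p in second if p is not None))
```

The same helper with k=1, k=2, ... covers the second-best and later performers mentioned in the update.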
Update
The logic I get here will be used to build sentences from words based on their POS tags (noun, adjective, verb, and so on). var1 will store nouns sorted in descending order of some score, var2 will store adjectives in the same order, and so on.
When building a sentence I will concatenate these variables, e.g. top-performing noun + top-performing adjective + top-performing verb. The second sentence will be formed from the second-best noun + second-best adjective, and so on. I don't have a code snippet for that yet; the Team/Player code above stands in for it.
I hope this update makes the question clearer.
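For the sentence-building use case, a sketch under two assumptions: each POS Series is already sorted by descending score, and the i-th sentence takes the i-th word from every Series that still has one. itertools.zip_longest pads the shorter Series with a fill value, which is then skipped. The word lists are invented.

```python
from itertools import zip_longest

import pandas as pd

# Hypothetical POS Series, each already sorted by descending score.
nouns = pd.Series(["cat", "dog", "bird"])
adjectives = pd.Series(["quick", "lazy"])
verbs = pd.Series(["runs"])

# Sentence i = i-th noun + i-th adjective + i-th verb, dropping the None padding
# that zip_longest adds once a Series runs out of words.
sentences = [
    " ".join(w for w in words if w is not None)
    for words in zip_longest(nouns, adjectives, verbs, fillvalue=None)
]
print(sentences)  # ['cat quick runs', 'dog lazy', 'bird']
```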
I think you need concat with apply, removing NaNs with dropna:
import pandas as pd
var1 = pd.Series(list('abcd'))
var2 = pd.Series(list('rftyru'))
var3 = pd.Series(list('de'))
print (pd.concat([var1, var2, var3], axis=1))
0 1 2
0 a r d
1 b f e
2 c t NaN
3 d y NaN
4 NaN r NaN
5 NaN u NaN
Final = (pd.concat([var1, var2, var3], axis=1)
.apply(lambda x: ', '.join(x.dropna()), axis=1))
print (Final)
0 a, r, d
1 b, f, e
2 c, t
3 d, y
4 r
5 u
dtype: object
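Applied to the original question, a sketch assuming var3 is empty (no player from Algeria): the empty Series just contributes an all-NaN column, so each row still joins the players that exist. The data is illustrative.

```python
import pandas as pd

var1 = pd.Series(["Humphrey", "Tim"])  # India, sorted by Score
var2 = pd.Series(["Bob", "Jane"])      # Australia, sorted by Score
var3 = pd.Series([], dtype=object)     # Algeria: no players at all

# Row i holds the i-th best player of every team; NaNs from short Series are dropped.
Final = (pd.concat([var1, var2, var3], axis=1)
           .apply(lambda x: ', '.join(x.dropna()), axis=1))
print(Final.iloc[0])  # top performers
print(Final.iloc[1])  # second-best performers
```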
But it is better to use groupby with sort_values and GroupBy.head to get the top N (e.g. 2) players. To filter teams, use boolean indexing:
# slightly modified data from another answer
df = pd.DataFrame([['Tim', 'India', 100],
['Bob', 'Australia', 50],
['John', 'Algeria', 123],
['Sarah', 'Algeria', 456],
['Jane', 'Australia', 9]],
columns=["Player", "Team", "Score"])
df1 = df[df['Team'].isin(['Algeria','India','Australia'])]
df1 = df1.sort_values('Score', ascending=False).groupby('Team').head(2)
print (df1)
Player Team Score
3 Sarah Algeria 456
2 John Algeria 123
0 Tim India 100
1 Bob Australia 50
4 Jane Australia 9
df1 = (df.sort_values('Score', ascending=False)
.groupby('Team')['Player']
.apply(lambda x: ', '.join(x.head(2)))
.reset_index())
print (df1)
Team Player
0 Algeria Sarah, John
1 Australia Bob, Jane
2 India Tim
For the second-best player per team, use GroupBy.nth:
df1 = df.sort_values('Score', ascending=False).groupby('Team', as_index=False).nth(1)
print (df1)
Player Team Score
2 John Algeria 123
4 Jane Australia 9
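A sketch that combines these ideas into the exact output the question asks for: pick the k-th best player per team (k=0 is the top, k=1 the second, ...), keep the teams in the order you list them, and skip teams that have no such player. It uses GroupBy.cumcount instead of nth, since cumcount's output shape does not depend on the pandas version. The function name players_at_rank and the extra team 'Brazil' are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([['Tim', 'India', 100],
                   ['Bob', 'Australia', 50],
                   ['John', 'Algeria', 123],
                   ['Sarah', 'Algeria', 456],
                   ['Jane', 'Australia', 9]],
                  columns=["Player", "Team", "Score"])

def players_at_rank(frame, teams, k=0, sep=", "):
    """Join the k-th best player (0 = top) of each team in `teams`, in that order."""
    ranked = frame.sort_values("Score", ascending=False)
    # Position of each row within its team after sorting: 0 = best, 1 = second, ...
    ranked = ranked.assign(pos=ranked.groupby("Team").cumcount())
    picked = ranked.loc[ranked["pos"] == k].set_index("Team")["Player"]
    # reindex restores the requested team order; teams without a k-th player become NaN
    return sep.join(picked.reindex(teams).dropna())

teams = ['India', 'Australia', 'Algeria', 'Brazil']
print(players_at_rank(df, teams, k=0))  # Tim, Bob, Sarah
print(players_at_rank(df, teams, k=1))  # Jane, John
```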
Instead of filtering for each team separately, you can use the pandas groupby function, with some pre-filtering.
Since you want all the 1st players in each team, all the 2nd, etc. in separate lists, you can rank players within each team, then group by the rank.
So first calculate each player's rank within their team, then keep only the teams you are interested in, then group by rank and join the names together.
import pandas as pd
output_df = pd.DataFrame([['Tim', 'India', 100],
['Bob', 'Australia', 50],
['John', 'Algeria', 123],
['Sarah', 'Algeria', 456],
['Jane', 'Australia', 9],
['Humphrey', 'India', 200]],
columns=["Player", "Team", "Score"])
# rank only the Score column; method="first" breaks ties so ranks stay whole numbers
output_df['Team Rank'] = output_df.groupby("Team")["Score"].rank(ascending=False, method="first").astype(int)
interested_teams = output_df[output_df['Team'].isin(['India', 'Australia'])]
players_by_rank = interested_teams.groupby("Team Rank")["Player"].agg(", ".join)
print(players_by_rank)
This gives the players by rank:
Team Rank
1 Bob, Humphrey
2 Tim, Jane
You can get a specific rank with .loc. So for the second-ranked players, use:
players_by_rank.loc[2]
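If a rank does not exist (for example, third-ranked players when every team has at most two), .loc raises a KeyError, which is the same "missing variable" problem as in the question. A sketch of a safer lookup with Series.get, which returns a default instead; the data repeats the example above.

```python
import pandas as pd

output_df = pd.DataFrame([['Tim', 'India', 100],
                          ['Bob', 'Australia', 50],
                          ['John', 'Algeria', 123],
                          ['Sarah', 'Algeria', 456],
                          ['Jane', 'Australia', 9],
                          ['Humphrey', 'India', 200]],
                         columns=["Player", "Team", "Score"])

# Rank within each team by Score; method="first" keeps ranks integral on ties.
output_df['Team Rank'] = (output_df.groupby("Team")["Score"]
                          .rank(ascending=False, method="first").astype(int))
interested_teams = output_df[output_df['Team'].isin(['India', 'Australia'])]
players_by_rank = interested_teams.groupby("Team Rank")["Player"].agg(", ".join)

print(players_by_rank.get(2, ""))        # Tim, Jane
print(repr(players_by_rank.get(3, "")))  # '' because no team has a third player
```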