
Python - How to combine / concatenate / join pandas series variables ignoring the empty variable

Using the .loc method, I select values from one column of a pandas DataFrame based on the values in another column of the same DataFrame.

The code snippet is given below for reference:

var1 = output_df['Player'].loc[output_df['Team']=='India'].reset_index(drop=True)
var2 = output_df['Player'].loc[output_df['Team']=='Australia'].reset_index(drop=True)
var3 = output_df['Player'].loc[output_df['Team']=='Algeria'].reset_index(drop=True)

Update

There may be any number of teams in my DataFrame, but I want the top players from selected teams only, which is why I enter the team names manually in the code. I may also need the top performer, the 2nd top performer and so on, so I cannot simply join all the values of a column.

This gives me 3 variables of type "pandas.core.series.Series".

I have already sorted the DataFrame in descending order by another column called "Score".

My requirement is to fetch the top-scoring player from each team and build an output variable that combines all the player names, separated by ','.

I tried the command below to get the desired output:

Final = var1[0]+','+var2[0]+','+var3[0]

This produces the expected output, but it fails if any of the variables is empty. For example, if my DataFrame has no player from Algeria, var3 will be empty, and the command above ends with an "Out of bounds" error.

Is there a way to run the previous command, or a similar one, that ignores the empty variable but still combines the remaining variables with a separator in between?

Update

The logic I get here will be used for framing sentences from words based on their POS tags (noun, adjective, verb and so on). var1 will store nouns arranged in descending order by some score, var2 will store adjectives arranged in the same way, and so on.

Finally, when framing a string / sentence, I will concatenate these variables. Example: top-performing-noun + top-performing-adjective + top-performing-verb. The second sentence will be formed from 2nd-top-performing-noun + 2nd-top-performing-adjective and so on. Right now I do not have a code snippet for this; it is being adapted from the Team-Player code.

I hope this update makes the question clearer.

I think you need concat, then apply with dropna to remove the NaNs before joining:

var1 = pd.Series(list('abcd'))
var2 = pd.Series(list('rftyru'))
var3 = pd.Series(list('de'))

print (pd.concat([var1, var2, var3], axis=1))

     0  1    2
0    a  r    d
1    b  f    e
2    c  t  NaN
3    d  y  NaN
4  NaN  r  NaN
5  NaN  u  NaN

Final = (pd.concat([var1, var2, var3], axis=1)
          .apply(lambda x: ', '.join(x.dropna()), axis=1))
print (Final)

0    a, r, d
1    b, f, e
2       c, t
3       d, y
4          r
5          u
dtype: object
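If only a single position is needed (for example just the top player from each team), the same idea can be sketched without building the combined frame. The helper name join_nth and the sample Series below are assumptions made for illustration; they are not part of the original code.

```python
import pandas as pd

def join_nth(series_list, n=0, sep=","):
    """Join the n-th element of each Series, skipping any Series that is too short."""
    # len(s) > n guards against empty or short Series, so .iloc[n] never fails
    picked = [s.iloc[n] for s in series_list if len(s) > n]
    return sep.join(picked)

# Hypothetical data: the third Series is empty, like var3 in the question
var1 = pd.Series(["Tim"])
var2 = pd.Series(["Bob", "Jane"])
var3 = pd.Series([], dtype=object)

top = join_nth([var1, var2, var3])        # "Tim,Bob"
second = join_nth([var1, var2, var3], 1)  # "Jane"
```

Positional access with .iloc is used here so the check does not depend on the index labels of each Series.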

But it is better to use groupby with sort_values and GroupBy.head to get the top players per team, e.g. the top 2.

To filter the teams, use boolean indexing:

# slightly modified data from another answer
df = pd.DataFrame([['Tim', 'India', 100],
                   ['Bob', 'Australia', 50],
                   ['John', 'Algeria', 123],
                   ['Sarah', 'Algeria', 456],
                   ['Jane', 'Australia', 9]],
                         columns=["Player", "Team", "Score"])


df1 = df[df['Team'].isin(['Algeria','India','Australia'])]
df1 = df1.sort_values('Score', ascending=False).groupby('Team').head(2)
print (df1)
  Player       Team  Score
3  Sarah    Algeria    456
2   John    Algeria    123
0    Tim      India    100
1    Bob  Australia     50
4   Jane  Australia      9

df1 = (df.sort_values('Score', ascending=False)
        .groupby('Team')['Player']
        .apply(lambda x: ', '.join(x.head(2)))
        .reset_index())
print (df1)

        Team       Player
0    Algeria  Sarah, John
1  Australia    Bob, Jane
2      India          Tim

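For the second-best player in each team, use GroupBy.nth (positions are zero-based, so nth(1) selects the second row of each group; India has only one player, so it does not appear):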

df1 = df.sort_values('Score', ascending=False).groupby('Team', as_index=False).nth(1)
print (df1)
  Player       Team  Score
2   John    Algeria    123
4   Jane  Australia      9
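To get the single comma-separated string asked for in the question, one option is to number the rows within each team and join the players at the wanted position. This is only a sketch: the helper name kth_players is hypothetical, and it uses GroupBy.cumcount rather than nth so that the result is a plain boolean filter on the sorted frame.

```python
import pandas as pd

df = pd.DataFrame([["Tim", "India", 100],
                   ["Bob", "Australia", 50],
                   ["John", "Algeria", 123],
                   ["Sarah", "Algeria", 456],
                   ["Jane", "Australia", 9]],
                  columns=["Player", "Team", "Score"])

def kth_players(frame, teams, k, sep=","):
    """Join the k-th best player (zero-based) of every selected team."""
    # Keep only the selected teams and sort by score, best first
    subset = frame[frame["Team"].isin(teams)].sort_values("Score", ascending=False)
    # 0 for the best player of a team, 1 for the second best, ...
    position = subset.groupby("Team").cumcount()
    # Teams with fewer than k + 1 players simply contribute nothing
    return sep.join(subset.loc[position == k, "Player"])

# "Brazil" is not in the data; it is skipped without an error
teams = ["India", "Australia", "Algeria", "Brazil"]
first = kth_players(df, teams, 0)   # "Sarah,Tim,Bob"
second = kth_players(df, teams, 1)  # "John,Jane"
third = kth_players(df, teams, 2)   # ""
```

The names come out in descending score order, not in the order the teams were listed.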

Instead of filtering separately for each team, you can use the pandas groupby function, with some pre-filtering.

Since you want all the 1st players of each team, all the 2nd players, etc. in separate lists, you can rank the players within each team and then group by the rank.

So, first calculate the rank within each team, then filter to the teams you are interested in, then group by rank and concatenate the names.

import pandas as pd
output_df = pd.DataFrame([['Tim', 'India', 100],
                          ['Bob', 'Australia', 50],
                          ['John', 'Algeria', 123],
                          ['Sarah', 'Algeria', 456],
                          ['Jane', 'Australia', 9],
                          ['Humphrey', 'India', 200]],
                         columns=["Player", "Team", "Score"])

# Rank only the Score column within each team (1 = highest score)
output_df['Team Rank'] = output_df.groupby("Team")['Score'].rank(ascending=False).astype(int)

interested_teams = output_df[output_df['Team'].isin(['India', 'Australia'])]

# Join the player names that share the same rank
players_by_rank = interested_teams.groupby("Team Rank")['Player'].apply(", ".join)

print(players_by_rank)

This prints the players grouped by rank:

Team Rank
1    Bob, Humphrey
2        Tim, Jane

You can get a specific rank by using .loc. So for the second-ranked players, use

players_by_rank.loc[2]
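Note that .loc raises a KeyError when no selected team has a player at the requested rank. A minimal sketch of a more forgiving lookup, assuming an empty string is an acceptable fallback, uses Series.get with a default value:

```python
import pandas as pd

output_df = pd.DataFrame([["Tim", "India", 100],
                          ["Bob", "Australia", 50],
                          ["John", "Algeria", 123],
                          ["Sarah", "Algeria", 456],
                          ["Jane", "Australia", 9],
                          ["Humphrey", "India", 200]],
                         columns=["Player", "Team", "Score"])

# Rank players inside each team by score (1 = best)
output_df["Team Rank"] = (output_df.groupby("Team")["Score"]
                          .rank(ascending=False)
                          .astype(int))

interested_teams = output_df[output_df["Team"].isin(["India", "Australia"])]
players_by_rank = interested_teams.groupby("Team Rank")["Player"].apply(", ".join)

# Series.get returns the default instead of raising when the label is absent
second_ranked = players_by_rank.get(2, "")  # "Tim, Jane"
fifth_ranked = players_by_rank.get(5, "")   # "" because nobody is ranked 5th
```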

