Why isn't "sort_values" working properly?

Question

I'm trying to print the values in target_playlist, ordered by the percentuali column. To sort them I used target_playlist.sort_values('percentuali', inplace=True, ascending=False). Before the sort_values call, the result of:

print("{}".format(target_playlist['percentuali'][i]))

is:

0.7010264012452779
0.19662758090847976
0.6508863154849628
0.557740362863367
0.47418798688188313
0.6634307395184526
0.17661982395954637
0.6334661569944786
0.5226247859195567
0.37647399781797003
0.6107562358792401
0.10866013071895426
0.6259167928556538
0.5107723732317271
0.5107723732317271
0.440188723891383
0.473270990299173
0.5807994015581672
0.45540535868625753
0.4156854080449265
0.5659237264842225
0.5942257114281826
0.5763053500588216
0.43676171660260443
0.6947640279542424
0.37155299947773396
0.6055124707313475
0.6642522917728619
0.6339323841512609
0.6836084778718268
0.4585485761594801
0.7687767193517359
0.7739306342996543
0.6792746883779797
0.5688985142793829
0.5763507447689178
0.6265388222033668
0.5262211637961803
0.631776719351736
0.7016345319242638
0.6549247063300238
0.6218895455057429
0.3926510809451985
0.5081035167373568
0.6149459682682933
0.44069739392952245
0.46799465192894985
0.69161263493496
0.5534053586862575
0.6968509819258842
0.4988988577428972
0.5059165111353879
0.7355655050414504
0.6792746883779797
0.4401208506283063
0.49320548887003335
0.5112768045242271
0.7361528565218765
0.2329438202247191
0.6123902228073447
0.49864712823852325
0.6909989415739581
0.6754433860184025
0.566520509644565
0.37663089180304893
0.6529677236233883
0.6089596366830047
0.7687767193517359
0.6101347817993262
0.7559795411177228

But when I print the values after calling sort_values, I get:

Titolo: Possibili Scenari,  Artista:  Cesare Cremonini,  Probabilita: 0.7559795411177228 
Titolo: Shallow,  Artista:  Lady Gaga,  Probabilita: 0.7559795411177228 
Titolo: To the Trees,  Artista:  An Early Bird,  Probabilita: 0.7559795411177228 
Titolo: If You Wanna Love Somebody - Acoustic,  Artista:  Tom Odell,  Probabilita: 0.7559795411177228 
Titolo: Happier - Acoustic,  Artista:  Ed Sheeran,  Probabilita: 0.7559795411177228 
Titolo: Lie With Me,  Artista:  Josiah and the Bonnevilles,  Probabilita: 0.7559795411177228 
Titolo: Jubilee Road,  Artista:  Tom Odell,  Probabilita: 0.7559795411177228 
Titolo: I'll Never Love Again - Film Version,  Artista:  Lady Gaga,  Probabilita: 0.7559795411177228 
Titolo: Rise - Acoustic,  Artista:  Jonas Blue,  Probabilita: 0.7559795411177228 
Titolo: Hold My Girl,  Artista:  George Ezra,  Probabilita: 0.7559795411177228 
Titolo: Love Someone,  Artista:  Lukas Graham,  Probabilita: 0.7559795411177228 
Titolo: Angels,  Artista:  Tom Walker,  Probabilita: 0.7559795411177228 
Titolo: These Days (feat. Jess Glynne, Macklemore & Dan Caplen) - Acoustic,  Artista:  Rudimental,  Probabilita: 0.7559795411177228 
Titolo: Just For Tonight - Acoustic,  Artista:  James Bay,  Probabilita: 0.7559795411177228 
Titolo: Perfect,  Artista:  Ed Sheeran,  Probabilita: 0.7559795411177228 
Titolo: No Roots,  Artista:  Joshua Hyslop,  Probabilita: 0.7559795411177228 
Titolo: Slide,  Artista:  James Bay,  Probabilita: 0.7559795411177228 
Titolo: Be Your Man,  Artista:  Rhys Lewis,  Probabilita: 0.7559795411177228 
Titolo: No Matter What,  Artista:  Calum Scott,  Probabilita: 0.7559795411177228 
Titolo: Woes,  Artista:  Tom Rosenthal,  Probabilita: 0.7559795411177228 
Titolo: Barbed Wire (Acoustic),  Artista:  Tom Grennan,  Probabilita: 0.7559795411177228 
Titolo: Stay Awake with Me,  Artista:  Dan Owen,  Probabilita: 0.7559795411177228 
Titolo: Spent So Long,  Artista:  Jamie Harrison,  Probabilita: 0.7559795411177228 
Titolo: Tummy,  Artista:  Tamino,  Probabilita: 0.7559795411177228 
Titolo: LOVISA,  Artista:  FELIX SANDMAN,  Probabilita: 0.7559795411177228 
Titolo: Girl - Acoustic,  Artista:  SYML,  Probabilita: 0.7559795411177228 
Titolo: Party Of One (feat. Sam Smith),  Artista:  Brandi Carlile,  Probabilita: 0.7559795411177228 
Titolo: Electricity - Acoustic,  Artista:  Silk City,  Probabilita: 0.7559795411177228 
Titolo: Leftovers,  Artista:  Dennis Lloyd,  Probabilita: 0.7559795411177228 
Titolo: Hand That You Hold,  Artista:  Dan Owen,  Probabilita: 0.7559795411177228 
Titolo: Company (feat. Molly Hammar),  Artista:  Paul Rey,  Probabilita: 0.7559795411177228 
Titolo: Too Good At Goodbyes - Edit,  Artista:  Sam Smith,  Probabilita: 0.7559795411177228 
Titolo: Need You Now - Acoustic,  Artista:  Dean Lewis,  Probabilita: 0.7559795411177228 
Titolo: Such A Simple Thing,  Artista:  Ray LaMontagne,  Probabilita: 0.7559795411177228 
Titolo: Acoustic,  Artista:  Billy Raffoul,  Probabilita: 0.7559795411177228 
Titolo: Don’t Matter To Me,  Artista:  Drake,  Probabilita: 0.7559795411177228 
Titolo: when the party's over,  Artista:  Billie Eilish,  Probabilita: 0.7559795411177228 
Titolo: Someone You Loved,  Artista:  Lewis Capaldi,  Probabilita: 0.7559795411177228 
Titolo: Collide,  Artista:  Tom Speight,  Probabilita: 0.7559795411177228 
Titolo: Fading Into Grey - Acoustic,  Artista:  Billy Lockett,  Probabilita: 0.7559795411177228 
Titolo: Never Let You Go (feat. John Newman) - Acoustic Version,  Artista:  Kygo,  Probabilita: 0.7559795411177228 
Titolo: T-Shirts,  Artista:  James Smith,  Probabilita: 0.7559795411177228 
Titolo: In My Head,  Artista:  Peter Manos,  Probabilita: 0.7559795411177228 
Titolo: Where Were You In The Morning?,  Artista:  Shawn Mendes,  Probabilita: 0.7559795411177228 
Titolo: come out and play,  Artista:  Billie Eilish,  Probabilita: 0.7559795411177228 
Titolo: Tear Me Down,  Artista:  Paul Rey,  Probabilita: 0.7559795411177228 
Titolo: Come As You Are,  Artista:  Imaginary Future,  Probabilita: 0.7559795411177228 
Titolo: Consequences - orchestra,  Artista:  Camila Cabello,  Probabilita: 0.7559795411177228 
Titolo: All I Am - Acoustic,  Artista:  Jess Glynne,  Probabilita: 0.7559795411177228

This is the part of the program I'm working on:

import tkinter as tk                
from tkinter import font  as tkfont 
import pandas as pd 
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import spotipy
import spotipy.util as util
from numpy import integer
from tkinter import Radiobutton
sp = spotipy.Spotify() 
from spotipy.oauth2 import SpotifyClientCredentials 
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
import itertools
import threading
import time
import sys
from operator import itemgetter, attrgetter, methodcaller

 target_playlist = pd.DataFrame(newPlaylist_features)

    if(algoritmo_scelto==1):
        pred = c.predict(target_playlist[features])
        p = c.predict_proba(target_playlist[features])
    if(algoritmo_scelto==2):
        pred = knn.predict(target_playlist[features])
        p = knn.predict_proba(target_playlist[features])
    if(algoritmo_scelto==3):
        pred = forest.predict(target_playlist[features])
        p = forest.predict_proba(target_playlist[features])
    if(algoritmo_scelto==4):
        pred = k_means.predict(target_playlist[features])
        p = k_means.predict_proba(target_playlist[features])

    likedSongs = 0
    i = 0

    for prediction in pred:
        target_playlist['percentuali'] = p[i][1]
        print("{}".format(target_playlist['percentuali'][i]))
        i = i +1


    target_playlist.sort_values('percentuali', inplace=True, ascending=False)

    i=0
    for prediction in pred:

        if(prediction == 1):
            print ("Titolo: " + target_playlist["song_title"][i] + ",  Artista:  "+ target_playlist["artist"][i] + ",  Probabilita: {} ".format(target_playlist["percentuali"][i]))
            likedSongs= likedSongs + 1
        i = i +1

Where am I wrong?

Answer 1

In this loop, you are setting the "target_playlist['percentuali']" Series to a single value:

i = 0

for prediction in pred:
    target_playlist['percentuali'] = p[i][1]
    print("{}".format(target_playlist['percentuali'][i]))
    i = i +1

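The assignment target_playlist['percentuali'] = p[i][1] applies the single value p[i][1] to every row, so each pass through the loop overwrites the entire column. Once the loop finishes, every row holds the value from the last iteration.

As shown in this example: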

>>> for i in [0, 1, 2]:
...     print(i)
...     df['this'] = i
...
0
1
2
>>> df
   id   col_1  col_2  col_3  this
0   1    blue     15   True    2
1   2     red     25  False    2
2   3  orange     35  False    2
3   4  yellow     24   True    2
4   5   green     12   True    2

Fix:

I don't know what kind of object p is, but you should turn the results into a pd.Series. You can replace that whole loop with something like this:

target_playlist['percentuali'] = pd.Series(item[1] for item in p)
print(target_playlist['percentuali'])
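If p is the NumPy array that scikit-learn's predict_proba normally returns (an assumption, since the question does not show its type), the column can also be filled with a plain slice. A minimal sketch, using a made-up three-row DataFrame and hand-written probabilities in place of the real playlist and classifier:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real playlist and classifier output.
target_playlist = pd.DataFrame({
    "song_title": ["A", "B", "C"],
    "artist": ["X", "Y", "Z"],
})
# Shape (n_rows, 2): column 0 = P(class 0), column 1 = P(class 1).
p = np.array([[0.3, 0.7], [0.8, 0.2], [0.35, 0.65]])

# One probability per row, assigned by position.
target_playlist["percentuali"] = p[:, 1]

# Sorting now has distinct values to work with.
ranked = target_playlist.sort_values("percentuali", ascending=False)
print(ranked)
```

Assigning an array fills the column by position, whereas a pd.Series is aligned on index labels, which only gives the same result when target_playlist has the default 0..n-1 index.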

After you call sort_values on your DataFrame, the values still won't print in descending order, because you are looking the rows up by their index labels (e.g. 0, 1, 2). Sorting reorders the rows but leaves their labels unchanged.

A quick fix is to reset the index, as in this example:

>>> df.sort_values('col_2', inplace=True, ascending=False)
>>> df
   id   col_1  col_2  col_3
2   3  orange     35  False
1   2     red     25  False
3   4  yellow     24   True
0   1    blue     15   True
4   5   green     12   True
>>> df['col_2'][0]
15
>>> df.reset_index(inplace=True)
>>> df['col_2'][0]
35
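The lookup df['col_2'][0] is label-based, which is why it keeps returning the row labelled 0 after sorting. As a rough sketch of two alternatives: positional access with .iloc, and reset_index(drop=True), which discards the old labels instead of keeping them as an extra 'index' column. The DataFrame below is a cut-down copy of the one in the example above:

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": ["blue", "red", "orange", "yellow", "green"],
    "col_2": [15, 25, 35, 24, 12],
})
sorted_df = df.sort_values("col_2", ascending=False)

by_label = sorted_df["col_2"].loc[0]      # row whose label is 0
by_position = sorted_df["col_2"].iloc[0]  # first row in sorted order

# drop=True throws the old labels away rather than adding an 'index' column.
renumbered = sorted_df.reset_index(drop=True)
first_after_reset = renumbered["col_2"].loc[0]

print(by_label, by_position, first_after_reset)
```

Here by_label is 15 while by_position and first_after_reset are both 35.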

Looping over dataframe rows

Instead of referencing by the index, you can loop through the rows like so:

# iterrows() yields rows in the DataFrame's current (sorted) order
for _, row in target_playlist.iterrows():
    print("Title: {}, Artist: {}, Probability: {}".format(
        row['song_title'], row['artist'], row['percentuali']
    ))

Answer 2

Besides the issue pointed out by foxy, it could very well be that the largest elements all have the same probability, since class 1 is assigned to every element whose probability exceeds a given threshold. If you remove the if prediction == 1 check, you will see all the predictions in order of decreasing probability.
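One way to compare the two views, sketched with made-up data: store the predictions as a column before sorting, so each prediction stays attached to its row, then filter afterwards. The column name pred and the sample values are assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical playlist with probabilities already filled in per row.
target_playlist = pd.DataFrame({
    "song_title": ["A", "B", "C", "D"],
    "percentuali": [0.7, 0.2, 0.65, 0.4],
})
pred = [1, 0, 1, 0]  # hypothetical classifier output, one entry per row

# Keep predictions in the frame so they move with their rows when sorting.
target_playlist["pred"] = pred
ranked = target_playlist.sort_values("percentuali", ascending=False)

# All rows, by decreasing probability, then only the predicted-liked ones.
liked = ranked[ranked["pred"] == 1]
liked_songs = len(liked)
for row in liked.itertuples(index=False):
    print("Titolo: {},  Probabilita: {}".format(row.song_title, row.percentuali))
```

Filtering the sorted frame replaces both the manual counter and the separate loop over pred.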

Also, be careful with the manual counter in this loop:

i=0
for prediction in pred:
    if(prediction == 1):
        print ("Titolo: " + target_playlist["song_title"][i] + ",  Artista:  "+ target_playlist["artist"][i] + ",  Probabilita: {} ".format(target_playlist["percentuali"][i]))
        likedSongs= likedSongs + 1
    i = i +1   # must run on every iteration so that i stays in step with pred

You can easily avoid this kind of mistake with enumerate:

for i, prediction in enumerate(pred):
    # i now advances automatically on every iteration
    if prediction == 1:
        likedSongs = likedSongs + 1
