
Modifying a dataframe column gives unexpected results

I have a dataframe like the one below:

[image of the dataframe, not reproduced here]

The actual data has 120000 rows covering 20000 users; this is just one user. For every user, I need to make sure the predictions consist of three "1"s and three "0"s.

I wrote the following function to do that:

import numpy as np
import pandas as pd

def check_prediction_quality(df):
    df_n = df.copy()
    unique = df_n['userID'].unique()
    for i in range(len(unique)):
        ex_df = df[df['userID']== unique[i]]
        v = ex_df['prediction'].tolist()
        v_bool = [i == 0 for i in v]

        if sum(v_bool) != 3:
            if sum(v_bool) > 3:
                res = [i for i,val in enumerate(v_bool) if val]
                diff = sum(v_bool) - 3
                for i in range(diff):
                    idx = np.random.choice(res,1)[0]
                    v[idx] = float(1)
                    res.remove(idx)
            elif sum(v_bool) < 3:
                res = [i for i,val in enumerate(v_bool) if not val]
                diff = 3 - sum(v_bool)
                for i in range(diff):
                    idx = np.random.choice(res,1)[0]
                    v[idx] = float(0)
                    res.remove(idx)
        
        for j in range(len(v)):
            df_n.loc[(0+i*6)+j:(6+i*6)+j,'prediction'] = v[j]
    return df_n

However, when I check whether the numbers of "0" and "1" are equal, it turns out they are not. I am not sure what I did wrong.

sum([i == 0 for i in df['prediction']]) 

This should be six for the example below, but when I run it on my 120000-row dataframe, I do not get 60000 of each.

data = {'userID': [199810,199810,199810,199810,199810,199810,199812,199812,199812,199812,199812,199812],
'trackID':[1,2,3,4,5,6,7,8,9,10,11,12], 
'prediction':[0,0,0,0,1,1,1,1,1,1,0,0]
}
df = pd.DataFrame(data = data)
df

Much appreciated!

When working with pandas DataFrames, you should assign the post-processed DataFrame back to the original variable.

df = pd.DataFrame(np.array(...))
# reassignment:
df.loc[:,3:5] = df.loc[:,3:5]*10  # multiplies the columns labelled 3 to 5 by 10
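Applied to the function in the question, three details look like they could explain the mismatch. The check inspects `df`, while the function returns a new frame (`df_n`), so the result has to be assigned back. The inner `for i in range(diff)` loops reuse the name `i` of the outer loop, so by the time the values are written back `i` may no longer identify the current user. And a `.loc` label slice includes both endpoints, so each write covers seven rows rather than one. The following is only a sketch of one way around this, not the original function: `balance_predictions`, `n_zeros` and `seed` are hypothetical names, and it assumes the index is unique, `prediction` holds only 0 and 1, and every user has enough rows of each kind to flip.

```python
import numpy as np
import pandas as pd


def balance_predictions(df, n_zeros=3, seed=None):
    """Return a copy of df in which every user has exactly n_zeros zeros.

    Hypothetical helper: assumes a unique index, a 'prediction' column
    containing only 0 and 1, and enough rows per user to flip.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Iterate over the untouched input and write into the copy.
    for user_id, group in df.groupby('userID'):
        zero_rows = group.index[group['prediction'] == 0]
        one_rows = group.index[group['prediction'] != 0]
        surplus = len(zero_rows) - n_zeros
        if surplus > 0:
            # Too many zeros: flip randomly chosen zeros to 1.
            flip = rng.choice(zero_rows, size=surplus, replace=False)
            out.loc[flip, 'prediction'] = 1
        elif surplus < 0:
            # Too few zeros: flip randomly chosen ones to 0.
            flip = rng.choice(one_rows, size=-surplus, replace=False)
            out.loc[flip, 'prediction'] = 0
    return out


# Sample data from the question.
data = {
    'userID': [199810] * 6 + [199812] * 6,
    'trackID': list(range(1, 13)),
    'prediction': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
}
df = pd.DataFrame(data)

# Assign the result back, otherwise df keeps the old predictions.
df = balance_predictions(df, seed=0)

# Count zeros per user instead of over the whole frame.
zeros_per_user = (df['prediction'] == 0).groupby(df['userID']).sum()
print(zeros_per_user)
```

Writing back through the row labels of each group avoids computing positions such as `i*6 + j`, so the sketch does not depend on the rows being sorted by user or on the index running from 0 upwards.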

Actually, never mind. I found out I don't have to modify the "0" and "1".
