argsort() positive and negative values separately and add the result as a new pandas column

I have a dataframe with a column, 'col', that contains both positive and negative numbers. I would like to rank the positive and the negative numbers separately, excluding 0 so that it does not skew the ranking. My issue is that the code below also updates the 'col' column. I must be keeping a reference to it, but I'm not sure where.

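import random
import numpy as np
import pandas as pd

data = {'col': [random.randint(-1000, 1000) for _ in range(100)]}
df = pd.DataFrame(data)

# positions and values of the positive and negative entries (0 is excluded)
pos_idx = np.where(df.col > 0)[0]
neg_idx = np.where(df.col < 0)[0]
p = df[df.col > 0].col.values
n = df[df.col < 0].col.values
# double argsort gives 0-based ranks, scaled here to 0-100
p_rank = np.round(p.argsort().argsort() / (len(p) - 1) * 100, 1)
n_rank = np.round((n * -1).argsort().argsort() / (len(n) - 1) * 100, 1)
# write the ranks back into the positive and negative slots
pc = df.col.values
pc[pc > 0] = p_rank
pc[pc < 0] = n_rank
df['ranking'] = pc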
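
For readers unfamiliar with the idiom, the double `argsort()` above turns values into 0-based ranks, which are then scaled to a 0-100 range. A minimal sketch with made-up values (the array contents and the helper name `pct_rank` are illustrative and not from the original post):

```python
import numpy as np

def pct_rank(values):
    """Scale 0-based ranks to 0-100: the smallest value gets 0, the largest 100."""
    values = np.asarray(values)
    ranks = values.argsort().argsort()   # 0-based rank of each element
    return np.round(ranks / (len(values) - 1) * 100, 1)

p = np.array([30, 10, 20])      # made-up positive values
n = np.array([-5, -50, -20])    # made-up negative values

print(pct_rank(p))        # ranks 100, 0, 50
print(pct_rank(n * -1))   # ranks 0, 100, 50 (the most negative value ranks highest)
```

Note that the division by `len(values) - 1` assumes at least two values in each group.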

I was able to figure it out on my own.

I created a new column of zeros, then used .loc to update the values at their respective index locations.

df['ranking'] = 0.0  # float column, since the ranks are floats
df.loc[df.col > 0, 'ranking'] = p_rank
df.loc[df.col < 0, 'ranking'] = n_rank
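
Put together, the approach above might look like the following. This is only a sketch under assumptions: the sample values are made up, and the ranks are computed the same way as in the question, but on independent copies of the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [30, -5, 0, 10, -50, 20, -20]})  # made-up sample data
original = df['col'].copy()

pos = df['col'] > 0
neg = df['col'] < 0

# compute ranks on independent copies so the source column is never written to
p = df.loc[pos, 'col'].to_numpy(copy=True)
n = df.loc[neg, 'col'].to_numpy(copy=True)
p_rank = np.round(p.argsort().argsort() / (len(p) - 1) * 100, 1)
n_rank = np.round((n * -1).argsort().argsort() / (len(n) - 1) * 100, 1)

df['ranking'] = 0.0               # rows where col == 0 keep a ranking of 0
df.loc[pos, 'ranking'] = p_rank   # boolean mask picks the rows, the label picks the column
df.loc[neg, 'ranking'] = n_rank

print(df)
```

Because the ranks are written only to 'ranking', the 'col' values stay as they were.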

`df.col.values` can return a view of the column's underlying data, so writing into `pc` also writes into `df`. One way to avoid mutating the original dataframe is to replace this line in your code:

pc = df.col.values

with:

pc = df.copy().col.values

So that 'col' keeps its original values (the exact numbers vary from run to run because the data is random):

print(df)
# Output
    col  ranking
0  -492       49
1   884       93
2  -355       36
3   741       77
4  -210       24
..  ...      ...
95  564       57
96  683       63
97 -129       18
98 -413       44
99  810       81

[100 rows x 2 columns]
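
If only the array needs to be independent, copying the whole dataframe is not strictly required. A minimal sketch, assuming a numeric 'col' column and made-up values: `Series.to_numpy(copy=True)` returns an array that does not share memory with the dataframe, so writing into it leaves `df` untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [30, -5, 0, 10]})   # made-up sample data

pc = df['col'].to_numpy(copy=True)            # independent copy of the column's data
shares = np.shares_memory(pc, df['col'].to_numpy())

pc[pc > 0] = 1                                # modifies the copy only

print(shares)               # False
print(df['col'].tolist())   # [30, -5, 0, 10]
```

Note that `pc` inherits the integer dtype of 'col', so float ranks written into it lose their decimal part. Passing `dtype=float` to `to_numpy` is one way to keep them.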
