Create new column with largest number indexes based on values of another column
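I have a DataFrame with two columns: the goods name and the total sales. I need to create another column that ranks the sales as 1, 2, 3, ..., where 1 is the largest value, 2 the second largest, and so on.
I hope you can help me.
My dataframe: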
lst = [['Keyboard1', 1860], ['Keyboard2', 1650], ['Keyboard3', 900], ['Keyboard4', 1230], ['Keyboard5', 1150], ['Keyboard6', 1345],
['Mouse1', 3100], ['Mouse2', 2900], ['Mouse3', 3050], ['Mouse4', 2750], ['Mouse5', 4100], ['Mouse6', 3910]]
df = pd.DataFrame(lst, columns = ['Goods', 'Sales'])
Goods Sales
0 Keyboard1 1860
1 Keyboard2 1650
2 Keyboard3 900
3 Keyboard4 1230
4 Keyboard5 1150
5 Keyboard6 1345
6 Mouse1 3100
7 Mouse2 2900
8 Mouse3 3050
9 Mouse4 2750
10 Mouse5 4100
11 Mouse6 3910
I am trying this code:
import pandas as pd
import numpy as np
df = df.sort_values('Sales', ascending = False)
df['Largest'] = np.arange(len(df))+1
But this numbers the largest values across all goods together, and I need the numbering separately for each type of goods. My result:
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 7
1 Keyboard2 1650 8
5 Keyboard6 1345 9
3 Keyboard4 1230 10
4 Keyboard5 1150 11
2 Keyboard3 900 12
This is the output I need:
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
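You can do it like this:
# strip the trailing digits from each name to get its goods type
# (regex=True is required: recent pandas treats the pattern as a literal string by default)
df['goods_group'] = df['Goods'].str.replace(r'\d+$', '', regex=True)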
# sort by the new column and sales
df = df.sort_values(['goods_group', 'Sales'], ascending=False)
# create largest column
df['largest'] = df.groupby('goods_group').cumcount() + 1
# drop the helper column (the positional axis argument was removed in pandas 2.0)
res = df.drop(columns='goods_group')
print(res)
Output
Goods Sales largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
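As a variation on the approach above, here is a minimal sketch that ranks within each group instead of sorting and counting. It assumes the goods type is simply the name with its trailing digits removed, and it uses method='first' so that tied sales still get distinct numbers (that tie-breaking choice is my assumption, not something the question specifies).

```python
import pandas as pd

lst = [['Keyboard1', 1860], ['Keyboard2', 1650], ['Keyboard3', 900],
       ['Keyboard4', 1230], ['Keyboard5', 1150], ['Keyboard6', 1345],
       ['Mouse1', 3100], ['Mouse2', 2900], ['Mouse3', 3050],
       ['Mouse4', 2750], ['Mouse5', 4100], ['Mouse6', 3910]]
df = pd.DataFrame(lst, columns=['Goods', 'Sales'])

# Group key: the goods name with its trailing digits stripped.
group = df['Goods'].str.replace(r'\d+$', '', regex=True)

# Rank sales within each group, 1 = largest. method='first' breaks ties
# by order of appearance so the numbers stay unique within a group.
df['Largest'] = (
    df['Sales'].groupby(group).rank(ascending=False, method='first').astype(int)
)
print(df)
```

This leaves the rows in their original order. If you also want them displayed as in the question, sort by Sales afterwards.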
You can groupby on Goods with the digits removed:
>>> df = df.sort_values('Sales', ascending=False)
>>> df
Goods Sales
10 Mouse5 4100
11 Mouse6 3910
6 Mouse1 3100
8 Mouse3 3050
7 Mouse2 2900
9 Mouse4 2750
0 Keyboard1 1860
1 Keyboard2 1650
5 Keyboard6 1345
3 Keyboard4 1230
4 Keyboard5 1150
2 Keyboard3 900
>>> df['Largest'] = df.groupby(df['Goods'].replace(r'\d+', '', regex=True)).cumcount() + 1
>>> df
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
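If you need this in more than one place, the two steps above can be wrapped in a small function. This is only a sketch: add_largest and its parameter names are hypothetical, and the demo data is made up for illustration. Note that cumcount numbers rows in their current order, so the sort has to happen first.

```python
import pandas as pd

def add_largest(df, name_col='Goods', value_col='Sales', out_col='Largest'):
    """Hypothetical helper: number rows 1..n within each goods type,
    with 1 assigned to the largest value."""
    # cumcount follows the current row order, so sort descending first.
    out = df.sort_values(value_col, ascending=False).copy()
    # Group key: the name with its digits removed.
    key = out[name_col].replace(r'\d+', '', regex=True)
    out[out_col] = out.groupby(key).cumcount() + 1
    return out

# Made-up demo data, not from the question.
demo = pd.DataFrame({'Goods': ['Pad1', 'Pad2', 'Pen1', 'Pen2', 'Pen3'],
                     'Sales': [10, 30, 7, 9, 8]})
result = add_largest(demo)
print(result)
```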
Try adding these lines to the end of your code:
df['new'] = df['Goods'].str[:-1]
df['Largest'] = df.groupby('new').cumcount() + 1
df = df.drop('new', axis=1)
print(df)
Output:
Goods Sales Largest
10 Mouse5 4100 1
11 Mouse6 3910 2
6 Mouse1 3100 3
8 Mouse3 3050 4
7 Mouse2 2900 5
9 Mouse4 2750 6
0 Keyboard1 1860 1
1 Keyboard2 1650 2
5 Keyboard6 1345 3
3 Keyboard4 1230 4
4 Keyboard5 1150 5
2 Keyboard3 900 6
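One caveat with slicing off the last character: it works for the sample data because every name ends in a single digit. A small sketch of what happens with two-digit suffixes (the names Mouse9, Mouse10, Mouse11 are made up for illustration), next to str.rstrip as one possible alternative:

```python
import pandas as pd

# Hypothetical names with one- and two-digit suffixes.
goods = pd.Series(['Mouse9', 'Mouse10', 'Mouse11'])

# Slicing off one character leaves a digit behind for two-digit suffixes,
# so these rows would fall into two different groups.
by_slice = goods.str[:-1]

# Stripping every trailing digit gives one group name for all rows.
by_rstrip = goods.str.rstrip('0123456789')

print(by_slice.tolist())
print(by_rstrip.tolist())
```

So if the numbering in your real data can go past 9, stripping all trailing digits (or the regex used in the other answers) is the safer grouping key.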