Pandas: find all unique values in one column and normalize all values in another column to their last value
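I want to find all unique values in one column and then normalize the corresponding values in another column to the last value of each group. I would like to do this with the pandas module in Python 3.
Example:
Original dataset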
Fruit | Amount
Orange | 90
Orange | 80
Orange | 10
Apple | 100
Apple | 50
Orange | 20
Orange | 60 --> latest value of Orange. Used to normalize Orange
Apple | 75
Apple | 25
Apple | 40 --> latest value of Apple. Used to normalize Apple
Desired output
A Ratio column in which each Amount is normalized by the last Amount of its unique value in the Fruit column
Fruit | Amount | Ratio
Orange | 90 | 90/60 --> 150%
Orange | 80 | 80/60 --> 133.3%
Orange | 10 | 10/60 --> 16.7%
Apple | 100 | 100/40 --> 250%
Apple | 50 | 50/40 --> 125%
Orange | 20 | 20/60 --> 33.3%
Orange | 60 | 60/60 --> 100%
Apple | 75 | 75/40 --> 187.5%
Apple | 25 | 25/40 --> 62.5%
Apple | 40 | 40/40 --> 100%
Python code attempt
import pandas as pd
filename = r'C:\fruitdata.dat'
df = pd.read_csv(filename, delimiter='|')
print(df)
print(df.loc[df['Fruit '] == 'Orange '])
print(df[df['Fruit '] == 'Orange '].tail(1))
Python output (IPython)
In [1]: df
Fruit Amount
0 Orange 90
1 Orange 80
2 Orange 10
3 Apple 100
4 Apple 50
5 Orange 20
6 Orange 60
7 Apple 75
8 Apple 25
9 Apple 40
In [2]: df.loc[df['Fruit '] == 'Orange ']
Fruit Amount
0 Orange 90
1 Orange 80
2 Orange 10
5 Orange 20
6 Orange 60
In [3]: df[df['Fruit '] == 'Orange '].tail(1)
Out[3]:
Fruit Amount
6 Orange 60
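The trailing spaces in 'Fruit ' and 'Orange ' suggest that the file pads the '|' delimiter with spaces, so the padding ends up in the column names and values. A minimal sketch, assuming that is the file layout (the inline `raw` text is a hypothetical stand-in for the contents of fruitdata.dat), strips the padding while reading so the columns can be addressed as 'Fruit' and 'Amount':

```python
import io
import pandas as pd

# Hypothetical stand-in for fruitdata.dat, assuming the file pads
# the '|' delimiter with spaces on both sides.
raw = """Fruit | Amount
Orange | 90
Orange | 80
Apple | 100
Apple | 40
"""

# A regex separator consumes the whitespace around '|'.
# Regex separators require the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*\|\s*", engine="python")

print(df.columns.tolist())
print(df[df["Fruit"] == "Orange"])
```

With the padding removed, the comparison can use 'Orange' without a trailing space.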
Question
How can I iterate over each unique item in Fruit and normalize all of its values against its last value?
You can use iloc together with groupby:
df.groupby('Fruit').Amount.apply(lambda x: x/x.iloc[-1])
Out[38]:
0 1.500000
1 1.333333
2 0.166667
3 2.500000
4 1.250000
5 0.333333
6 1.000000
7 1.875000
8 0.625000
9 1.000000
Name: Amount, dtype: float64
After assigning it back:
df['New']=df.groupby('Fruit').Amount.apply(lambda x: x/x.iloc[-1])
df
Out[40]:
Fruit Amount New
0 Orange 90 1.500000
1 Orange 80 1.333333
2 Orange 10 0.166667
3 Apple 100 2.500000
4 Apple 50 1.250000
5 Orange 20 0.333333
6 Orange 60 1.000000
7 Apple 75 1.875000
8 Apple 25 0.625000
9 Apple 40 1.000000
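Depending on the pandas version, `groupby(...).apply` may return a result indexed by the group key in addition to the original row label, in which case the assignment above would not align with `df`. A minimal sketch, assuming that is a concern, uses `transform` instead, which returns a result with the same index as the input. The column name `Ratio` follows the desired output in the question, and the DataFrame is rebuilt inline so the snippet runs on its own:

```python
import pandas as pd

# Same data as the question's example.
df = pd.DataFrame({
    "Fruit": ["Orange", "Orange", "Orange", "Apple", "Apple",
              "Orange", "Orange", "Apple", "Apple", "Apple"],
    "Amount": [90, 80, 10, 100, 50, 20, 60, 75, 25, 40],
})

# transform returns a Series aligned to df's index, so it can be
# assigned straight back as a new column.
df["Ratio"] = df.groupby("Fruit")["Amount"].transform(lambda x: x / x.iloc[-1])

print(df)
```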
Without using lambda:
df.Amount/df.groupby('Fruit',sort=False).Amount.transform('last')
Out[46]:
0 1.500000
1 1.333333
2 0.166667
3 2.500000
4 1.250000
5 0.333333
6 1.000000
7 1.875000
8 0.625000
9 1.000000
Name: Amount, dtype: float64
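The desired output in the question shows the ratio as a percentage. A short sketch of one way to get there, assuming a display-only string column is acceptable (the column name `Ratio (%)` is made up for illustration):

```python
import pandas as pd

# Same data as the question's example.
df = pd.DataFrame({
    "Fruit": ["Orange", "Orange", "Orange", "Apple", "Apple",
              "Orange", "Orange", "Apple", "Apple", "Apple"],
    "Amount": [90, 80, 10, 100, 50, 20, 60, 75, 25, 40],
})

# Last Amount of each Fruit, broadcast back to every row of that Fruit.
last = df.groupby("Fruit", sort=False)["Amount"].transform("last")

# Numeric ratio, kept as a float for further calculations.
df["Ratio"] = df["Amount"] / last

# Percentage strings with one decimal place, for display only.
df["Ratio (%)"] = df["Ratio"].map("{:.1%}".format)

print(df)
```

Note that `'last'` picks the last non-null value in each group, whereas `x.iloc[-1]` picks the last row regardless; the two agree on this dataset because it contains no missing values.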